Knowledge System
Overview
NODYN's knowledge system persists information across sessions using local file storage. Knowledge is automatically populated by analyzing agent responses and can be manually managed via tools and CLI commands.
Storage
Context-Scoped Storage
Knowledge is stored as plain text files, scoped by context:
```
~/.nodyn/memory/
  <contextId>/          # SHA-256 hash of project root (CLI) or explicit ID
    knowledge.txt       # knowledge namespace
    methods.txt         # methods namespace
    project-state.txt   # project-state namespace
    learnings.txt       # learnings namespace
  global/               # Fallback when no contextId
    knowledge.txt
    methods.txt
    project-state.txt
    learnings.txt
  user-<userId>/        # User-specific preferences (when NODYN_USER set)
    knowledge.txt
    ...
```

The base directory is `~/.nodyn/memory/`. The context ID is generated by `resolveContext()` — for the CLI it uses `sha256Short(projectRoot)`; for other sources (Telegram, Slack, PWA) it uses the explicit context ID.
Load order: Context-scoped knowledge is loaded first. If no context is detected, it falls back to `global/`.
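As a rough sketch of the context-ID derivation described above (the truncation length and the exact `resolveContext()` signature are assumptions, not the real implementation):

```typescript
import { createHash } from "node:crypto";

// Assumption: a "short" SHA-256 is a truncated hex digest; 12 chars is a guess.
function sha256Short(input: string, length = 12): string {
  return createHash("sha256").update(input).digest("hex").slice(0, length);
}

// Hypothetical resolveContext(): the CLI hashes the project root, while
// other sources (Telegram, Slack, PWA) supply an explicit context ID.
function resolveContext(source: string, projectRootOrId: string): string {
  return source === "cli" ? sha256Short(projectRootOrId) : projectRootOrId;
}

// Knowledge for a CLI session then lands under a hash-named directory:
const ctx = resolveContext("cli", "/home/alice/projects/shop");
console.log(`~/.nodyn/memory/${ctx}/knowledge.txt`);
```

Hashing the project root keeps the ID stable across sessions in the same directory without leaking the path itself.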
Docker Persistence
Knowledge lives at `/home/nodyn/.nodyn/memory/` inside the container, covered by the `~/.nodyn` volume mount:
```
docker run -it --rm \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v ~/.nodyn:/home/nodyn/.nodyn \
  nodyn
```

Namespaces
Knowledge is organized into 4 namespaces:
| Namespace | Purpose | Examples |
|---|---|---|
| knowledge | Key facts, user preferences | "User prefers TypeScript", "Project uses ESM" |
| methods | Patterns and techniques | "Use Promise.allSettled for parallel ops" |
| project-state | Ongoing project state | "Currently refactoring auth module" |
| learnings | Mistakes and lessons | "Avoid using any — use unknown instead" |
Auto-Extraction
After every completed agent turn, NODYN automatically extracts relevant information using the fast model:
```
Agent response → maybeUpdate() → extraction → append to namespaces
```

This is fire-and-forget: it runs asynchronously and never blocks the response. Extraction failures are silently ignored.
The extraction prompt categorizes information into the 4 namespaces. Only namespaces with relevant content are updated. Responses shorter than 50 characters are skipped.
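The fire-and-forget flow can be sketched like this (the `extractKnowledge` stub stands in for the real fast-model call, which this doc does not show):

```typescript
// Stub for the fast-model extraction call (assumption: returns per-namespace lines).
async function extractKnowledge(response: string): Promise<Record<string, string[]>> {
  return { knowledge: [], methods: [], "project-state": [], learnings: [] };
}

function maybeUpdate(response: string): void {
  if (response.length < 50) return; // responses shorter than 50 chars are skipped
  // No await: the agent's turn completes immediately while extraction runs.
  extractKnowledge(response)
    .then((byNamespace) => {
      // Only namespaces with relevant content would be appended to on disk.
      void byNamespace;
    })
    .catch(() => {
      // Extraction failures are silently ignored.
    });
}
```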
Knowledge in System Prompt
Knowledge injection uses the Knowledge Graph:
```
KnowledgeLayer.retrieve() → HyDE + vector + graph expansion + MMR → <relevant_context> block with entity subgraph
```
Knowledge Graph Retrieval
Section titled “Knowledge Graph Retrieval”<relevant_context><scope type="user">[knowledge] (92%)User prefers TypeScript over JavaScript.</scope><scope type="context">[knowledge] (85%)Project uses PostgreSQL 16 for JSONB queries.
[methods] (78%)Use Promise.allSettled for parallel operations.</scope><knowledge_graph>Entities: Thomas Weber (person, 5 mentions), acme-shop.ch (organization, 3 mentions)</knowledge_graph></relevant_context>The <relevant_context> block has cache_control: { type: 'ephemeral' } for efficient prompt caching.
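A sketch of how the cached block could be assembled, following the Anthropic Messages API shape for system content blocks (the `buildSystemPrompt` helper is hypothetical):

```typescript
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: appends the retrieved context as a cacheable block.
function buildSystemPrompt(base: string, relevantContext: string): SystemBlock[] {
  return [
    { type: "text", text: base },
    {
      type: "text",
      text: relevantContext,
      // Marks a prompt-cache breakpoint so the block can be reused across turns.
      cache_control: { type: "ephemeral" },
    },
  ];
}

const system = buildSystemPrompt(
  "You are NODYN.",
  "<relevant_context>...</relevant_context>"
);
```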
Knowledge Graph
Architecture
The Knowledge Graph is an embedded property graph (LadybugDB, a Kuzu fork) that stores memories, entities, and their relationships. It provides entity-aware, graph-augmented retrieval.
```
~/.nodyn/knowledge-graph/   # LadybugDB embedded database
```

Graph Schema:
- Entity nodes: persons, organizations, projects, products, concepts, locations — with `canonical_name`, `aliases[]`, `entity_type`, `embedding`, `mention_count`
- Memory nodes: knowledge entries with `text`, `namespace`, `scope`, `embedding`, `is_active`, `superseded_by`
- Community nodes: clusters of related entities (future use)
- MENTIONS edges: Memory → Entity (which entities a memory references)
- RELATES_TO edges: Entity → Entity (typed: works_for, owns, uses, etc.)
- SUPERSEDES edges: Memory → Memory (contradiction resolution)
- COOCCURS edges: Entity → Entity (co-occurrence frequency)
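The schema above could be modeled with plain types like these (the field names come from the list above; concrete TypeScript types such as `number[]` for embeddings are assumptions):

```typescript
type EntityType = "person" | "organization" | "project" | "product" | "concept" | "location";

interface EntityNode {
  canonical_name: string;
  aliases: string[];
  entity_type: EntityType;
  embedding: number[]; // 384d with the default multilingual-e5-small model
  mention_count: number;
}

interface MemoryNode {
  text: string;
  namespace: "knowledge" | "methods" | "project-state" | "learnings";
  scope: string;
  embedding: number[];
  is_active: boolean;
  superseded_by?: string; // ID of the superseding memory, if contradicted
}

// MENTIONS: Memory → Entity, RELATES_TO: Entity → Entity (typed),
// SUPERSEDES: Memory → Memory, COOCCURS: Entity → Entity.
type EdgeKind = "MENTIONS" | "RELATES_TO" | "SUPERSEDES" | "COOCCURS";

const example: MemoryNode = {
  text: "Project uses ESM",
  namespace: "knowledge",
  scope: "context",
  embedding: [],
  is_active: true,
};
```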
Retrieval Pipeline
```
User query
 │
 ├─ 1. HyDE (optional, Haiku ~$0.001)
 │      Generate hypothetical answer → embed for better semantic match
 │
 ├─ 2. Multi-signal search (parallel)
 │      ├─ Vector search (ANN, top-50) ─── 55% weight
 │      ├─ Full-text search (keywords) ─── 30% weight
 │      └─ Graph expansion ─────────────── 15% boost
 │           Query entities → resolve → 1-2 hop → connected memories
 │
 ├─ 3. Scoring: similarity × scope_weight × namespace_decay
 │      knowledge: 365d half-life, project-state: 21d half-life
 │
 ├─ 4. MMR re-ranking (λ=0.7 relevance, 0.3 diversity)
 │
 └─ 5. Context formatting with entity subgraph
```

Store Pipeline
```
memory_store / maybeUpdate()
 │
 ├─ 1. Embed text (multilingual-e5-small, 384d)
 ├─ 2. Dedup check (cosine > 0.90 → skip)
 ├─ 3. Contradiction detection (knowledge/learnings only)
 │      Vector search > 0.80 → heuristic: negation, number, state change
 │      Contradicted memory → is_active=false, SUPERSEDES edge
 ├─ 4. Create Memory node in graph
 ├─ 5. Entity extraction (regex DE/EN, optional Haiku)
 ├─ 6. Entity resolution (canonical name → alias → create)
 ├─ 7. Create MENTIONS + RELATES_TO + COOCCURS edges
 └─ 8. Parallel: append to flat-file (dual-write for debugging)
```

Entity Extraction
Two-tier approach:
- Tier 1 — Regex (always, zero cost): Persons (`Herr/Frau/Mr. + Name`, `client/Kunde + Name`), Organizations (domain names, `Firma/company + Name`), Technology (`uses/nutzt + Term`), Projects (`project "Name"`, `org/repo`), Locations (`in/aus + Place`)
- Tier 2 — Haiku (~$0.001, optional): Only for the `knowledge`/`methods` namespaces, text > 200 chars, and 0 regex entities found. Also extracts relations between entities.
Entity Resolution
Priority: exact canonical match (case-insensitive) → alias match → normalized match → create new entity. Aliases accumulate: "Thomas", "Herr Weber", "the client from Bern" all resolve to the same entity.
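A sketch of that priority chain (the normalization step here is just trimming and lowercasing, which is an assumption about what "normalized match" means):

```typescript
interface Entity {
  canonical_name: string;
  aliases: string[];
}

const normalize = (s: string) => s.trim().toLowerCase();

// Priority: exact canonical (case-insensitive) → alias → normalized → create new.
function resolveEntity(name: string, entities: Entity[]): Entity {
  const lower = name.toLowerCase();
  const found =
    entities.find((e) => e.canonical_name.toLowerCase() === lower) ??
    entities.find((e) => e.aliases.some((a) => a.toLowerCase() === lower)) ??
    entities.find((e) => normalize(e.canonical_name) === normalize(name));
  if (found) return found;
  const created: Entity = { canonical_name: name, aliases: [] };
  entities.push(created);
  return created;
}

const graph: Entity[] = [
  { canonical_name: "Thomas Weber", aliases: ["Thomas", "Herr Weber", "the client from Bern"] },
];
// "Herr Weber" resolves to the existing entity instead of creating a duplicate.
```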
Contradiction Detection
Only for the `knowledge` and `learnings` namespaces. Finds memories with >0.80 cosine similarity, then applies heuristic checks:
- Negation: “uses X” vs “doesn’t use X” / “nicht mehr”
- Number change: “budget is 5000” vs “budget is 8000”
- State change: “project is active” vs “project is completed”
Contradicted memories: `is_active=false`, SUPERSEDES edge created. The old memory stays in the graph as an audit trail but is excluded from retrieval.
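Two of the three heuristics could look like this sketch (the negation markers and regexes are illustrative, not the production rules):

```typescript
// Illustrative EN/DE negation markers, not the production list.
const NEGATIONS = ["doesn't", "does not", "no longer", "nicht mehr", "kein"];

function hasNegationFlip(a: string, b: string): boolean {
  const neg = (s: string) => NEGATIONS.some((n) => s.toLowerCase().includes(n));
  return neg(a) !== neg(b);
}

function hasNumberChange(a: string, b: string): boolean {
  const nums = (s: string) => s.match(/\d+(?:\.\d+)?/g) ?? [];
  const na = nums(a);
  const nb = nums(b);
  return na.length > 0 && nb.length > 0 && na.join(",") !== nb.join(",");
}

// Applied only to memory pairs above the 0.80 cosine-similarity threshold.
function looksContradictory(a: string, b: string): boolean {
  return hasNegationFlip(a, b) || hasNumberChange(a, b);
}
```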
Embedding Providers
- `OnnxProvider` (default) — `@huggingface/transformers` WASM runtime. Default model: `multilingual-e5-small` (384d, 100 languages, ~118MB). Configurable via the `embedding_model` config. Lazy-loads the pipeline on first call (~800ms cold start). Auto-downloads the model to `~/.cache/huggingface/`.
- `VoyageProvider` — HTTP via Voyage AI (1024 dims). Requires `voyage_api_key`.
- `LocalProvider` — Hash-based deterministic (384 dims). Test-only.
Available ONNX models (`embedding_model` config):
| Model ID | Dimensions | Size | Languages | Use Case |
|---|---|---|---|---|
| multilingual-e5-small (default) | 384 | ~118MB | 100 | Best balance: multilingual + fast |
| all-minilm-l6-v2 | 384 | ~23MB | English | Fastest cold start, English-only |
| bge-m3 | 1024 | ~570MB | 100+ | Highest quality, slowest start |
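Whichever provider is configured, downstream thresholds such as the store pipeline's dedup check (cosine > 0.90) operate on the raw vectors. A minimal cosine-similarity sketch:

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Dedup check from the store pipeline: near-duplicates are skipped.
const isDuplicate = (a: number[], b: number[]) => cosine(a, b) > 0.9;
```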
CLI Commands
```
/knowledge list    # List stored embeddings
/knowledge prune   # Remove stale or duplicate entries
```

Knowledge GC
GC runs automatically every 50 runs:
- Graph GC (`runGraphGc()`): Deletes superseded memories and removes orphan entities (not referenced by any active memory)
- CLI: `/memory gc [dry]` — a dry run previews changes without applying them
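The every-50-runs trigger could be as simple as a modulo check on a run counter (sketch; `runGraphGc()` is named in the text above, but the counter wiring here is an assumption):

```typescript
let runCount = 0;
let gcCalls = 0;

// Stub for the graph GC described above: delete superseded memories,
// remove orphan entities not referenced by any active memory.
async function runGraphGc(): Promise<void> {
  gcCalls += 1;
}

async function afterRun(): Promise<void> {
  runCount += 1;
  if (runCount % 50 === 0) {
    await runGraphGc(); // fires on runs 50, 100, 150, ...
  }
}
```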
DataStore ↔ Knowledge Graph Bridge
Structured data in DataStore tables is automatically linked to the Knowledge Graph:
- On collection create: The table is registered as an Entity (type: `collection`) in the graph
- On record insert: String fields are scanned for entities via regex → `has_data_in` relationships created
- On retrieval: When an entity is found in the graph, related DataStore collections are included as hints in the context (e.g., "Thomas has data in: customers (revenue: 5000)")
- Proactive discovery: The agent suggests creating tables when it notices recurring structured data during collaboration
```
data_store_insert("customers", [{name: "Thomas", company: "acme-shop.ch"}])
  → Entity "Thomas" (person) → has_data_in → "customers" (collection)
  → Entity "acme-shop.ch" (org) → has_data_in → "customers" (collection)
```

Memory Tools
All memory tools sync with the Knowledge Graph when enabled:
- `memory_store`: Stores content → entities extracted → graph write → flat-file dual-write
- `memory_recall`: Reads from the flat-file (graph retrieval happens per-turn via the system prompt)
- `memory_update`: Updates flat-file text → updates the graph Memory node text → re-extracts entities
- `memory_delete`: Removes from the flat-file → deactivates matching Memory nodes in the graph
- `memory_list`: Lists flat-file entries by scope/namespace
- `memory_promote`: Copies to a broader scope (publishes for graph store) → deactivates the source in the graph
DataStore Tools
- `data_store_create`: Set up a table with typed columns. Registers the collection in the graph
- `data_store_insert`: Insert/upsert records. Entities from string fields are indexed in the graph
- `data_store_query`: Filter, sort, aggregate (sum/avg/count/min/max)
- `data_store_delete`: Remove records matching a filter. Requires a filter — no bulk delete
- `data_store_list`: Browse tables and schemas
All tools are available to the agent and sub-agents.
CLI Command
```
/memory            # Show all knowledge
/memory knowledge  # Show only the knowledge namespace
/memory methods    # Show only the methods namespace
```

Disabling Knowledge
Pass `memory: false` in `EngineConfig`:
```ts
const engine = new Engine({ memory: false });
```

Implementation Details
Knowledge Graph (primary path)
- `KnowledgeLayer` (`src/core/knowledge-layer.ts`) — implements `IKnowledgeLayer` from `src/types/index.ts`. Composes KuzuGraph + EntityExtractor + EntityResolver + ContradictionDetector + RetrievalEngine
- `KuzuGraph` (`src/core/knowledge-graph.ts`) — LadybugDB wrapper. DB at `~/.nodyn/knowledge-graph/`. Schema: Entity/Memory/Community nodes, MENTIONS/RELATES_TO/SUPERSEDES/COOCCURS edges
- `RetrievalEngine` (`src/core/retrieval-engine.ts`) — HyDE + vector + graph expansion + MMR. `formatContext()` produces XML output
- `EntityExtractor` (`src/core/entity-extractor.ts`) — Tier 1 regex (DE/EN), Tier 2 optional Haiku
- `EntityResolver` (`src/core/entity-resolver.ts`) — Canonical name resolution, alias merge
- `ContradictionDetector` (`src/core/contradiction-detector.ts`) — Heuristic contradiction checks for knowledge/learnings
- Wiring: The Engine initializes `KnowledgeLayer` at startup and routes knowledge retrieval and the `memoryStore` channel through it
Flat-file storage (dual-write)
- Class: `Memory` (`src/core/memory.ts`)
- Interface: `IMemory` (`src/types/index.ts`) — includes a `hasContent()` check
- Extraction model: Fast tier via the beta messages API
- Cache: Unified `Map<string, string>` keyed by `${scopeType}:${scopeId}:${namespace}`
- Scope delegation: Base CRUD methods delegate to scoped variants via `_defaultScope()`
- Namespaces: `ALL_NAMESPACES` constant from `src/types/index.ts`
Shared infrastructure
- Embedding providers: `src/core/embedding.ts` — `OnnxProvider` (model registry: multilingual-e5-small default), `VoyageProvider`, `LocalProvider`
- Embedding queue: The Engine uses bounded concurrency (max 3 parallel) for stores, with failures logged
- Observability: `nodyn:memory:store` (every write), `nodyn:knowledge:graph` (graph events), `nodyn:knowledge:entity` (entity events)
- Docker: The HF model cache is persisted via the `nodyn-hf-cache` volume at `/home/nodyn/.cache/huggingface`