CozoDB Memory MCP Server
Local-first memory for Claude & AI agents with hybrid search, Graph-RAG, and time-travel – all in a single binary, no cloud, no Docker.
Table of Contents
- Quick Start
- Key Features
- Positioning & Comparison
- Performance & Benchmarks
- Architecture
- Installation
- Start / Integration
- Configuration & Backends
- Data Model
- MCP Tools
- Production Monitoring
- Technical Highlights
- Optional: HTTP API Bridge
- Development
- User Preference Profiling
- Troubleshooting
- Roadmap
- Contributing
- License
Quick Start
Option 1: Install via npm (Recommended)
# Install globally
npm install -g cozo-memory
# Or run directly with npx (no installation needed)
npx cozo-memory
Option 2: Build from Source
git clone https://github.com/tobs-code/cozo-memory
cd cozo-memory
npm install && npm run build
npm run start
Now you can add the server to your MCP client (e.g. Claude Desktop).
Key Features
🔍 Hybrid Search (since v0.7) - Combines semantic search (HNSW), full-text search (FTS), and graph signals via Reciprocal Rank Fusion (RRF)
🕸️ Graph-RAG & Graph-Walking (v1.7/v2.0) - Hierarchical retrieval with community detection and summarization; recursive traversals using optimized Datalog algorithms
🧠 Agentic Retrieval Layer (v2.0) - Auto-routing engine that analyzes query intent via local LLM to select the optimal search strategy (Vector, Graph, or Community)
🧠 Multi-Level Memory (v2.0) - Context-aware memory system with built-in session and task management
🎯 Tiny Learned Reranker (v2.0) - Integrated Cross-Encoder model (ms-marco-MiniLM-L-6-v2) for ultra-precise re-ranking of top search results
🎯 Multi-Vector Support (since v1.7) - Dual embeddings per entity: content-embedding for context, name-embedding for identification
⚡ Semantic Caching (since v0.8.5) - Two-level cache (L1 memory + L2 persistent) with semantic query matching
⏱️ Time-Travel Queries - Version all changes via CozoDB Validity; query any point in history
🔗 Atomic Transactions (since v1.2) - Multi-statement transactions ensuring data consistency
📊 Graph Algorithms (since v1.3/v1.6) - PageRank, Betweenness Centrality, HITS, Community Detection, Shortest Path
🏗️ Hierarchical GraphRAG (v2.0) - Automatic generation of thematic "Community Summaries" using local LLMs to enable global "Big Picture" reasoning
🧹 Janitor Service - LLM-backed automatic cleanup with hierarchical summarization, observation pruning, and automated session compression
🧠 Fact Lifecycle Management (v2.1) - Native "soft-deletion" via CozoDB Validity retraction; invalidated facts are hidden from current views but preserved in history for audit trails
👤 User Preference Profiling - Persistent user preferences with automatic 50% search boost
🔍 Near-Duplicate Detection - Automatic LSH-based deduplication to avoid redundancy
🧠 Inference Engine - Implicit knowledge discovery with multiple strategies
🏠 100% Local - Embeddings via ONNX/Transformers; no external services required
📦 Export/Import (since v1.8) - Export to JSON, Markdown, or Obsidian-ready ZIP; import from Mem0, MemGPT, Markdown, or native format
📄 PDF Support (since v1.9) - Direct PDF ingestion with text extraction via pdfjs-dist; supports file path and content parameters
🕐 Dual Timestamp Format (since v1.9) - All timestamps returned in both Unix microseconds and ISO 8601 format for maximum flexibility
Detailed Features
- Hybrid Search (v0.7 Optimized): Combination of semantic search (HNSW), Full-Text Search (FTS), and graph signals, merged via Reciprocal Rank Fusion (RRF).
- Full-Text Search (FTS): Native CozoDB v0.7 FTS indices with stemming, stopword filtering, and robust query sanitizing (stripping of + - * / \ ( ) ? .) for maximum stability.
- Near-Duplicate Detection (LSH): Automatically detects very similar observations via MinHash-LSH (CozoDB v0.7) to avoid redundancy.
- Recency Bias: Older content is dampened during fusion (except for explicit keyword searches), so currently relevant content tends to rank higher.
- Graph-RAG & Graph-Walking (v1.7 Optimized): Advanced retrieval method combining semantic vector seeds with recursive graph traversals. Now uses an optimized Graph-Walking algorithm via Datalog, using HNSW index lookups for precise distance calculations during traversal.
- Multi-Vector Support (v1.7): Each entity now has two specialized vectors:
- Content-Embedding: Represents the content context (observations).
- Name-Embedding: Optimized for identification via name/label. This significantly improves accuracy when entering graph walks.
- Semantic & Persistent Caching (v0.8.5): Two-level caching system:
- L1 Memory Cache: Ultra-fast in-memory LRU cache (< 0.1ms).
- L2 Persistent Cache: Storage in CozoDB for restart resistance.
- Semantic Matching: Detects semantically similar queries via vector distance.
- Janitor TTL: Automatic cleanup of outdated cache entries by the Janitor service.
- Time-Travel: Changes are versioned via CozoDB Validity; historical queries are possible.
- JSON Merge Operator (++): Uses the v0.7 merge operator for efficient, atomic metadata updates.
- Multi-Statement Transactions (v1.2): Supports atomic transactions across multiple operations using CozoDB block syntax { ... }. Guarantees that related changes (e.g., create entity + add observation + link relationship) are executed fully or not at all.
- Graph Metrics & Ranking Boost (v1.3 / v1.6): Integrates advanced graph algorithms:
- PageRank: Calculates the "importance" of knowledge nodes for ranking.
- Betweenness Centrality: Identifies central bridge elements in the knowledge network.
- HITS (Hubs & Authorities): Distinguishes between information sources (Authorities) and pointers (Hubs).
- Connected Components: Detects isolated knowledge islands and subgraphs.
- These metrics are automatically used in hybrid search (advancedSearch) and graphRag.
- Native CozoDB Operators (v1.5): Now uses explicit :insert, :update, and :delete operators instead of generic :put (upsert) calls. Increases data safety through strict validation of database states (e.g., an error when trying to "insert" an existing entity).
- Advanced Time-Travel Analysis (v1.5): Extends relationship history with time range filters (since/until) and automatic diff summaries to analyze changes (additions/removals) over specific periods.
- Graph Features (v1.6): Native integration of Shortest Path (Dijkstra) with path reconstruction, Community Detection (LabelPropagation), and advanced centrality measures.
- Graph Evolution: Tracks the temporal development of relationships (e.g., role change from "Manager" to "Consultant") via CozoDB Validity queries.
- Bridge Discovery: Identifies "bridge entities" connecting different communities – ideal for creative brainstorming.
- Inference: Implicit suggestions and context extension (e.g., transitive expertise rule).
- Conflict Detection (Application-Level & Triggers): Automatically detects contradictions in metadata (e.g., "active" vs. "discontinued" / archived: true). Uses robust logic in the app layer to ensure data integrity before writing.
- Data Integrity (Trigger Concept): Prevents invalid states like self-references in relationships (self-loops) directly at creation.
- Hierarchical Summarization: The Janitor condenses old fragments into "Executive Summary" nodes to preserve the "Big Picture" long-term.
- User Preference Profiling: A specialized global_user_profile entity stores persistent preferences (likes, work style), which receive a 50% score boost in every search.
- Fact Lifecycle Management (v2.1): Uses CozoDB's native Validity retraction mechanism to manage the lifecycle of information. Instead of destructive deletions, facts are invalidated by asserting a [timestamp, false] record (see the sketch after this list). This ensures:
  - Auditability: You can always "time-travel" back to see what the system knew at any given point.
  - Consistency: All standard retrieval (Search, Graph-RAG, Inference) uses the @ "NOW" filter to automatically exclude retracted facts.
  - Atomic Retraction: Invalidation can be part of a multi-statement transaction, allowing for clean "update" patterns (invalidate old + insert new).
- All Local: Embeddings via Transformers/ONNX; no external embedding service required.
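To make the Validity mechanics concrete, here is a minimal illustrative sketch. The queries are simplified, the column names follow the Data Model section below, and the exact CozoScript syntax (especially the timestamp literal) is an assumption to check against the CozoDB documentation.
// Illustrative CozoScript snippets, kept as TypeScript string constants.

// Current view: the @ "NOW" validity filter hides retracted facts.
const currentView = `
?[name, type] := *entity{id: $id, name, type, @ "NOW"}
`;

// Time-travel: pass a Unix-microsecond timestamp instead of "NOW" to see
// what the system knew at that point in time (here: 2025-01-01 UTC).
const viewAtNewYear2025 = `
?[name, type] := *entity{id: $id, name, type, @ 1735689600000000}
`;

// Retraction (invalidate_observation / invalidate_relation) does not delete rows:
// it asserts a new record whose validity is [timestamp, false], so the fact
// disappears from "@ NOW" views but stays available to historical queries.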
Positioning & Comparison
Most "Memory" MCP servers fall into two categories:
- Simple Knowledge Graphs: CRUD operations on triples, often only text search.
- Pure Vector Stores: Semantic search (RAG), but little understanding of complex relationships.
This server fills the gap in between ("Sweet Spot"): A local, database-backed memory engine combining vector, graph, and keyword signals.
Comparison with other solutions
| Feature | CozoDB Memory (This Project) | Official Reference (@modelcontextprotocol/server-memory) | mcp-memory-service (Community) | Database Adapters (Qdrant/Neo4j) |
|---|---|---|---|---|
| Backend | CozoDB (Graph + Vector + Relational) | JSON file (memory.jsonl) | SQLite / Cloudflare | Specialized DB (only Vector or Graph) |
| Search Logic | Agentic (Auto-Route): Hybrid + Graph + Summaries | Keyword only / Exact Graph Match | Vector + Keyword | Mostly only one dimension |
| Inference | Yes: Built-in engine for implicit knowledge | No | No ("Dreaming" is consolidation) | No (Retrieval only) |
| Community | Yes: Hierarchical Community Summaries | No | No | Only clustering (no summary) |
| Time-Travel | Yes: Queries at any point in time (Validity) | No (current state only) | History available, no native DB feature | No |
| Maintenance | Janitor: LLM-backed cleanup | Manual | Automatic consolidation | Mostly manual |
| Deployment | Local (Node.js + Embedded DB) | Local (Docker/NPX) | Local or Cloud | Often requires external DB server |
The core advantage is Intelligence and Traceability: By combining an Agentic Retrieval Layer with Hierarchical GraphRAG, the system can answer both specific factual questions and broad thematic queries with much higher accuracy than pure vector stores.
Performance & Benchmarks
Benchmarks on a standard developer laptop (Windows, Node.js 20+, CPU-only):
| Metric | Value | Note |
|---|---|---|
| Graph-Walking (Recursive) | ~130 ms | Vector Seed + Recursive Datalog Traversal |
| Graph-RAG (Breadth-First) | ~335 ms | Vector Seeds + 2-Hop Expansion |
| Hybrid Search (Cache Hit) | < 0.1 ms | v0.8+ Semantic Cache |
| Hybrid Search (Cold) | ~35 ms | FTS + HNSW + RRF Fusion |
| Vector Search (Raw) | ~51 ms | Pure semantic search as reference |
| FTS Search (Raw) | ~12 ms | Native Full-Text Search Performance |
| Ingestion | ~102 ms | Per Op (Write + Embedding + FTS/LSH Indexing) |
| RAM Usage | ~1.7 GB | Primarily due to local Xenova/bge-m3 model |
Running Benchmarks
You can test performance on your system with the integrated benchmark tool:
npm run benchmark
This tool (src/benchmark.ts) performs the following tests:
- Initialization: Cold start duration of the server incl. model loading.
- Ingestion: Mass import of test entities and observations (throughput).
- Search Performance: Latency measurement for Hybrid Search vs. Raw Vector Search.
- RRF Overhead: Determination of additional computation time for fusion logic.
Running Evaluation Suite (RAG Quality)
To evaluate the quality and recall of different retrieval strategies (Search vs. Graph-RAG vs. Graph-Walking), use the evaluation suite:
npm run eval
This tool compares strategies using a synthetic dataset and measures Recall@K, MRR, and Latency.
| Method | Recall@10 | Avg Latency | Best For |
|---|---|---|---|
| Graph-RAG | 1.00 | ~32 ms | Deep relational reasoning |
| Graph-RAG (Reranked) | 1.00 | ~36 ms | Maximum precision for relational data |
| Graph-Walking | 1.00 | ~50 ms | Associative path exploration |
| Hybrid Search | 1.00 | ~89 ms | Broad factual retrieval |
| Reranked Search | 1.00 | ~20 ms* | Ultra-precise factual search (Warm cache) |
Architecture
graph TB
Client[MCP Client<br/>Claude Desktop, etc.]
Server[MCP Server<br/>FastMCP + Zod Schemas]
Services[Memory Services]
Embeddings[Embeddings<br/>ONNX Runtime]
Search[Hybrid Search<br/>RRF Fusion]
Cache[Semantic Cache<br/>L1 + L2]
Inference[Inference Engine<br/>Multi-Strategy]
DB[(CozoDB SQLite<br/>Relations + Validity<br/>HNSW Indices<br/>Datalog/Graph)]
Client -->|stdio| Server
Server --> Services
Services --> Embeddings
Services --> Search
Services --> Cache
Services --> Inference
Services --> DB
style Client fill:#e1f5ff,color:#000
style Server fill:#fff4e1,color:#000
style Services fill:#f0e1ff,color:#000
style DB fill:#e1ffe1,color:#000
Graph-Walking Visualization
graph LR
Start([Query: What is Alice working on?])
V1[Vector Search<br/>Find: Alice]
E1[Alice<br/>Person]
E2[Project X<br/>Project]
E3[Feature Flags<br/>Technology]
E4[Bob<br/>Person]
Start --> V1
V1 -.semantic similarity.-> E1
E1 -->|works_on| E2
E2 -->|uses_tech| E3
E1 -->|colleague_of| E4
E4 -.semantic: also relevant.-> E2
style Start fill:#e1f5ff,color:#000
style V1 fill:#fff4e1,color:#000
style E1 fill:#ffe1e1,color:#000
style E2 fill:#e1ffe1,color:#000
style E3 fill:#f0e1ff,color:#000
style E4 fill:#ffe1e1,color:#000
Installation
Prerequisites
- Node.js 20+ (recommended)
- RAM: 1.7 GB minimum (for default bge-m3 model)
- Model download: ~600 MB
- Runtime memory: ~1.1 GB
- For lower-spec machines, see Embedding Model Options below
- CozoDB native dependency is installed via cozo-node
Via npm (Easiest)
# Install globally
npm install -g cozo-memory
# Or use npx without installation
npx cozo-memory
From Source
git clone https://github.com/tobs-code/cozo-memory
cd cozo-memory
npm install
npm run build
Windows Quickstart
npm install
npm run build
npm run start
Notes:
- On first start, @xenova/transformers downloads the embedding model (may take time).
- Embeddings are processed on the CPU.
Embedding Model Options
CozoDB Memory supports multiple embedding models via the EMBEDDING_MODEL environment variable:
| Model | Size | RAM | Dimensions | Best For |
|---|---|---|---|---|
| Xenova/bge-m3 (default) | ~600 MB | ~1.7 GB | 1024 | High accuracy, production use |
| Xenova/all-MiniLM-L6-v2 | ~80 MB | ~400 MB | 384 | Low-spec machines, development |
| Xenova/bge-small-en-v1.5 | ~130 MB | ~600 MB | 384 | Balanced performance |
Configuration Options:
Option 1: Using .env file (Easiest for beginners)
# Copy the example file
cp .env.example .env
# Edit .env and set your preferred model
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
Option 2: MCP Server Config (For Claude Desktop / Kiro)
{
"mcpServers": {
"cozo-memory": {
"command": "npx",
"args": ["cozo-memory"],
"env": {
"EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
}
}
}
}
Option 3: Command Line
# Use lightweight model for development
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run start
Download Model First (Recommended):
# Set model in .env or via command line, then:
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run download-model
Note: Changing models requires re-embedding existing data. The model is downloaded once on first use.
Start / Integration
MCP Server (stdio)
The MCP server runs over stdio (for Claude Desktop etc.). Start:
npm run start
Default database path: memory_db.cozo.db in project root (created automatically).
CLI Tool
CozoDB Memory includes a full-featured CLI for all operations:
# System operations
cozo-memory system health
cozo-memory system metrics
cozo-memory system reflect
# Entity operations
cozo-memory entity create -n "MyEntity" -t "person" -m '{"age": 30}'
cozo-memory entity get -i <entity-id>
cozo-memory entity delete -i <entity-id>
# Observations
cozo-memory observation add -i <entity-id> -t "Some note"
# Relations
cozo-memory relation create --from <id1> --to <id2> --type "knows" -s 0.8
# Search
cozo-memory search query -q "search term" -l 10
cozo-memory search context -q "context query"
cozo-memory search agentic -q "agentic query"
# Graph operations
cozo-memory graph explore -s <entity-id> -h 3
cozo-memory graph pagerank
cozo-memory graph communities
cozo-memory graph summarize
# Session & Task management
cozo-memory session start -n "My Session"
cozo-memory session stop -i <session-id>
cozo-memory task start -n "My Task" -s <session-id>
cozo-memory task stop -i <task-id>
# Export/Import
cozo-memory export json -o backup.json --include-metadata --include-relationships --include-observations
cozo-memory export markdown -o notes.md
cozo-memory export obsidian -o vault.zip
cozo-memory import file -i data.json -f cozo
# All commands support -f json or -f pretty for output formatting
TUI (Terminal User Interface)
Interactive TUI with mouse support powered by Python Textual:
# Install Python dependencies (one-time)
pip install textual
# Launch TUI
npm run tui
# or directly:
cozo-memory-tui
TUI Features:
- 🖱️ Full mouse support (click buttons, scroll, select inputs)
- ⌨️ Keyboard shortcuts (q=quit, h=help, r=refresh)
- 📊 Interactive menus for all operations
- 🎨 Rich terminal UI with colors and animations
- 📋 Real-time results display
- 🔍 Forms for entity creation, search, graph operations
- 📤 Export/Import wizards
Claude Desktop Integration
Using npx (Recommended)
{
"mcpServers": {
"cozo-memory": {
"command": "npx",
"args": ["cozo-memory"]
}
}
}
Using global installation
{
"mcpServers": {
"cozo-memory": {
"command": "cozo-memory"
}
}
}
Using local build
{
"mcpServers": {
"cozo-memory": {
"command": "node",
"args": ["C:/Path/to/cozo-memory/dist/index.js"]
}
}
}
Configuration & Backends
The system supports various storage backends. SQLite is used by default as it requires no extra installation and offers the best balance of performance and simplicity for most use cases.
Changing Backend (e.g., to RocksDB)
RocksDB offers advantages for very large datasets (millions of entries) and write-intensive workloads due to better parallelism and data compression.
To change the backend, set the DB_ENGINE environment variable before starting:
PowerShell:
$env:DB_ENGINE="rocksdb"; npm run dev
Bash:
DB_ENGINE=rocksdb npm run dev
| Backend | Status | Recommendation |
|---|---|---|
| SQLite | Active (Default) | Standard for desktop/local usage. |
| RocksDB | Prepared & Tested | For high-performance or very large datasets. |
| MDBX | Not supported | Requires manual build of cozo-node from source. |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DB_ENGINE | sqlite | Database backend: sqlite or rocksdb |
| EMBEDDING_MODEL | Xenova/bge-m3 | Embedding model (see Embedding Model Options) |
| PORT | 3001 | HTTP API bridge port (if using npm run bridge) |
Data Model
CozoDB Relations (simplified) – all write operations create new Validity entries (Time-Travel):
- entity: id, created_at: Validity ⇒ name, type, embedding(1024), name_embedding(1024), metadata(Json)
- observation: id, created_at: Validity ⇒ entity_id, text, embedding(1024), metadata(Json)
- relationship: from_id, to_id, relation_type, created_at: Validity ⇒ strength(0..1), metadata(Json)
- entity_community: entity_id ⇒ community_id (key-value mapping from LabelPropagation)
- memory_snapshot: snapshot_id ⇒ counts + metadata + created_at(Int)
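For orientation, the same model can be sketched as TypeScript types. This is illustrative only (not the server's actual source types); Validity is shown as a plain Unix-microsecond timestamp.
// Illustrative shapes of the stored records.
interface EntityRow {
  id: string;
  created_at: number;                  // Validity timestamp (Unix microseconds)
  name: string;
  type: string;
  embedding: number[];                 // 1024-dim content embedding
  name_embedding: number[];            // 1024-dim name embedding
  metadata: Record<string, unknown>;
}

interface ObservationRow {
  id: string;
  created_at: number;
  entity_id: string;
  text: string;
  embedding: number[];
  metadata: Record<string, unknown>;
}

interface RelationshipRow {
  from_id: string;
  to_id: string;
  relation_type: string;
  created_at: number;
  strength: number;                    // 0..1
  metadata: Record<string, unknown>;
}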
MCP Tools
The interface is reduced to 4 consolidated tools. The concrete operation is always chosen via action.
| Tool | Purpose | Key Actions |
|---|---|---|
| mutate_memory | Write operations | create_entity, update_entity, delete_entity, add_observation, create_relation, start_session, stop_session, start_task, stop_task, run_transaction, add_inference_rule, ingest_file, invalidate_observation, invalidate_relation |
| query_memory | Read operations | search, advancedSearch, context, entity_details, history, graph_rag, graph_walking, agentic_search (Multi-Level Context support) |
| analyze_graph | Graph analysis | explore, communities, pagerank, betweenness, hits, shortest_path, bridge_discovery, semantic_walk, infer_relations |
| manage_system | Maintenance | health, metrics, export_memory, import_memory, snapshot_create, snapshot_list, snapshot_diff, cleanup, reflect, summarize_communities, clear_memory |
mutate_memory (Write)
Actions:
- create_entity: { name, type, metadata? }
- update_entity: { id, name?, type?, metadata? }
- delete_entity: { entity_id }
- add_observation: { entity_id?, entity_name?, entity_type?, text, metadata? }
- create_relation: { from_id, to_id, relation_type, strength?, metadata? }
- start_session: { name?, metadata? } (New v2.0): Starts a new session context (metadata can include user_id, project, etc.)
- stop_session: { session_id } (New v2.0): Closes/archives an active session.
- start_task: { name, session_id?, metadata? } (New v2.0): Starts a specific task within a session.
- stop_task: { task_id } (New v2.0): Marks a task as completed.
- run_transaction: { operations: Array<{ action, params }> } (New v1.2): Executes multiple operations atomically.
- add_inference_rule: { name, datalog }
- ingest_file: { format, file_path?, content?, entity_id?, entity_name?, entity_type?, chunking?, metadata?, observation_metadata?, deduplicate?, max_observations? }
  - format options: "markdown", "json", "pdf" (New v1.9)
  - file_path: Optional path to a file on disk (alternative to the content parameter)
  - content: File content as a string (required if file_path is not provided)
  - chunking options: "none", "paragraphs" (future: "semantic")
- invalidate_observation: { observation_id } (New v2.1): Retracts an observation using Validity [now, false].
- invalidate_relation: { from_id, to_id, relation_type } (New v2.1): Retracts a relationship using Validity [now, false].
Important Details:
- run_transaction supports create_entity, add_observation, and create_relation. Parameters are automatically suffixed to avoid collisions (an example payload follows the snippets below).
- create_relation rejects self-references (from_id === to_id). strength is optional and defaults to 1.0.
- add_observation additionally provides inferred_suggestions (suggestions from the Inference Engine).
- add_observation performs deduplication (exact + semantic via LSH). If duplicates are found, it returns status: "duplicate_detected" with existing_observation_id and an estimated similarity.
- update_entity uses the JSON Merge Operator ++ (v0.7) to merge existing metadata with new values instead of overwriting them.
- add_inference_rule validates Datalog code upon saving. Invalid syntax or missing required columns result in an error.
Examples:
{ "action": "create_entity", "name": "Alice", "type": "Person", "metadata": { "role": "Dev" } }
{ "action": "add_observation", "entity_id": "ENTITY_ID", "text": "Alice is working on the feature flag system." }
Example (Duplicate):
{ "action": "add_observation", "entity_id": "ENTITY_ID", "text": "Alice is working on the feature flag system." }
{ "status": "duplicate_detected", "existing_observation_id": "OBS_ID", "similarity": 1 }
{ "action": "create_relation", "from_id": "ALICE_ID", "to_id": "PROJ_ID", "relation_type": "works_on", "strength": 1.0 }
Custom Datalog Rules (Inference):
- Inference rules are stored via the action "add_inference_rule".
- The Datalog query must return a result set with exactly these 5 columns: from_id, to_id, relation_type, confidence, reason.
- $id is the placeholder for the entity ID for which inference is started.
Example (Transitive Manager ⇒ Upper Manager):
{
"action": "add_inference_rule",
"name": "upper_manager",
"datalog": "?[from_id, to_id, relation_type, confidence, reason] :=\n *relationship{from_id: $id, to_id: mid, relation_type: \"manager_of\", @ \"NOW\"},\n *relationship{from_id: mid, to_id: target, relation_type: \"manager_of\", @ \"NOW\"},\n from_id = $id,\n to_id = target,\n relation_type = \"upper_manager_of\",\n confidence = 0.6,\n reason = \"Transitive manager path found\""
}
Bulk Ingestion (Markdown/JSON/PDF):
{
"action": "ingest_file",
"entity_name": "Project Documentation",
"format": "markdown",
"chunking": "paragraphs",
"content": "# Title\n\nSection 1...\n\nSection 2...",
"deduplicate": true,
"max_observations": 50
}
PDF Ingestion via File Path:
{
"action": "ingest_file",
"entity_name": "Research Paper",
"format": "pdf",
"file_path": "/path/to/document.pdf",
"chunking": "paragraphs",
"deduplicate": true
}
query_memory (Read)
Actions:
- search: { query, limit?, entity_types?, include_entities?, include_observations?, rerank? }
- advancedSearch: { query, limit?, filters?, graphConstraints?, vectorOptions?, rerank? }
- context: { query, context_window?, time_range_hours? }
- entity_details: { entity_id, as_of? }
- history: { entity_id }
- graph_rag: { query, max_depth?, limit?, filters?, rerank? } Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
- graph_walking: { query, start_entity_id?, max_depth?, limit? } (v1.7) Recursive semantic graph search. Starts at vector seeds or a specific entity and follows relationships to other semantically relevant entities. Ideal for deeper path exploration.
- agentic_search: { query, limit?, rerank? } (New v2.0): Auto-Routing Search. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (vector_search, graph_walk, or community_summary).
- get_relation_evolution: { from_id, to_id?, since?, until? } (in analyze_graph) Shows the temporal development of relationships including time range filter and diff summary.
Important Details:
- advancedSearch allows precise filtering:
  - filters.entityTypes: List of entity types.
  - filters.metadata: Key-value map for exact metadata matches.
  - graphConstraints.requiredRelations: Only entities having certain relationships.
  - graphConstraints.targetEntityIds: Only entities connected to these target IDs.
- context returns a JSON object with entities, observations, graph connections, and inference suggestions.
- search uses RRF (Reciprocal Rank Fusion) to mix vector and keyword signals.
- graph_rag combines vector search with graph-based traversals (default depth 2) for "structured reasoning". Expansion is bidirectional across all relationship types.
- User Profiling: Results linked to the global_user_profile entity are automatically preferred (boosted).
- time_range_hours filters candidate results within the time window (in hours, can be a float).
- as_of accepts ISO strings or "NOW"; an invalid format results in an error.
- If status contradictions are detected, an optional conflict_flag is attached to entities/observations; context additionally provides conflict_flags as a summary.
Examples:
{ "action": "search", "query": "Feature Flag", "limit": 10 }
{
"action": "advancedSearch",
"query": "Manager",
"filters": { "metadata": { "role": "Lead" } },
"graphConstraints": { "requiredRelations": ["works_with"] }
}
{ "action": "graph_rag", "query": "What is Alice working on?", "max_depth": 2 }
{ "action": "context", "query": "What is Alice working on right now?", "context_window": 20 }
Conflict Detection (Status)
If there are contradictory statements about the status of an entity, a conflict is marked. The system considers temporal consistency:
- Status Contradiction: An entity has both "active" and "inactive" status in the same calendar year.
- Status Change (No Conflict): If statements are from different years (e.g., 2024 "discontinued", 2025 "active"), this is interpreted as a legitimate change and not marked as a conflict.
The detection uses regex matching on keywords like:
- Active: active, running, ongoing, in operation, continued, not discontinued.
- Inactive: discontinued, cancelled, stopped, shut down, closed, deprecated, archived, ended, abandoned.
Integration in API Responses:
- entities[i].conflict_flag or observations[i].conflict_flag: Flag directly on the match.
- conflict_flags: List of all detected conflicts in the context or search result.
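A minimal sketch of the year-aware status check described above; the keyword lists are taken from this section, while the function and field names are illustrative rather than the server's actual implementation:
// Flags a conflict only when contradictory statuses fall into the same calendar year.
const ACTIVE = /\b(active|running|ongoing|in operation|continued)\b/i;
const INACTIVE = /\b(discontinued|cancelled|stopped|shut down|closed|deprecated|archived|ended|abandoned)\b/i;

type StatusObservation = { text: string; created_at_iso: string };

function hasStatusConflict(observations: StatusObservation[]): boolean {
  const byYear = new Map<number, { active: boolean; inactive: boolean }>();
  for (const obs of observations) {
    const year = new Date(obs.created_at_iso).getUTCFullYear();
    const entry = byYear.get(year) ?? { active: false, inactive: false };
    const negated = /not discontinued/i.test(obs.text);   // "not discontinued" counts as active
    if (negated || ACTIVE.test(obs.text)) entry.active = true;
    if (!negated && INACTIVE.test(obs.text)) entry.inactive = true;
    byYear.set(year, entry);
  }
  // 2024 "discontinued" + 2025 "active" is a legitimate change, not a conflict.
  return [...byYear.values()].some((e) => e.active && e.inactive);
}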
analyze_graph (Analysis)
Actions:
- explore: { start_entity, end_entity?, max_hops?, relation_types? }
  - with end_entity: shortest path (BFS)
  - without end_entity: neighborhood up to max. 5 hops (aggregated by minimal hop count)
- communities: {} Recalculates communities and writes entity_community.
- pagerank: {} Calculates PageRank scores for all entities.
- betweenness: {} Calculates Betweenness Centrality (centrality measure for bridge elements).
- hits: {} Calculates HITS scores (Hubs & Authorities).
- connected_components: {} Identifies isolated subgraphs.
- shortest_path: { start_entity, end_entity } Calculates the shortest path via Dijkstra (incl. distance and path reconstruction).
- bridge_discovery: {} Searches for entities acting as bridges between isolated communities (high betweenness relevance).
- semantic_walk: { start_entity, max_depth?, min_similarity? } (v1.7) Recursive semantic graph search. Starts at an entity and recursively follows paths consisting of explicit relationships AND semantic similarity (vector distance). Finds "associative paths" in the knowledge graph.
- hnsw_clusters: {} Analyzes clusters directly on the HNSW graph (Layer 0). Extremely fast, as no vector calculations are needed.
- infer_relations: { entity_id } Provides suggestions from multiple strategies.
- get_relation_evolution: { from_id, to_id?, since?, until? } Shows the temporal development of relationships including time range filter and diff summary.
Examples:
{ "action": "shortest_path", "start_entity": "ID_A", "end_entity": "ID_B" }
{ "action": "explore", "start_entity": "ENTITY_ID", "max_hops": 3 }
manage_system (Maintenance)
Actions:
- health: {} Returns DB counts + embedding cache stats + performance metrics.
- metrics: {} Returns detailed operation counts, error statistics, and performance data.
- export_memory: { format, includeMetadata?, includeRelationships?, includeObservations?, entityTypes?, since? } Exports memory to various formats.
- import_memory: { data, sourceFormat, mergeStrategy?, defaultEntityType? } Imports memory from external sources.
- snapshot_create: { metadata? }
- snapshot_list: {}
- snapshot_diff: { snapshot_id_a, snapshot_id_b }
- cleanup: { confirm, older_than_days?, max_observations?, min_entity_degree?, model? }
- summarize_communities: { model?, min_community_size? } (New v2.0): Triggers the Hierarchical GraphRAG pipeline. Recomputes communities, generates thematic summaries via LLM, and stores them as CommunitySummary entities.
- reflect: { entity_id?, mode?, model? } Analyzes memory for contradictions and new insights. Supports summary (default) and discovery (autonomous link refinement) modes.
- clear_memory: { confirm }
Janitor Cleanup Details:
- cleanup supports dry_run: with confirm: false, only candidates are listed.
- With confirm: true, the Janitor becomes active:
  - Hierarchical Summarization: Detects isolated or old observations, has them summarized by a local LLM (Ollama), and creates a new ExecutiveSummary node. Old fragments are deleted to reduce noise while preserving knowledge.
  - Automated Session Compression: Automatically identifies inactive sessions, summarizes their activity into a few bullet points, and stores the summary in the User Profile while marking the session as archived.
Before Janitor:
Entity: Project X
├─ Observation 1: "Started in Q1" (90 days old, isolated)
├─ Observation 2: "Uses React" (85 days old, isolated)
├─ Observation 3: "Team of 5" (80 days old, isolated)
└─ Observation 4: "Deployed to staging" (75 days old, isolated)
After Janitor:
Entity: Project X
└─ ExecutiveSummary: "Project X is a React-based application started in Q1
with a team of 5 developers, currently deployed to staging environment."
Self-Improving Memory Loop (reflect)
The reflect service analyzes observations of an entity (or top 5 active entities) to find contradictions, patterns, or temporal developments.
- Modes:
  - summary (default): Generates a textual "Reflective Insight" observation.
  - discovery: Autonomously finds potential relationships using the Inference Engine and validates them via LLM.
    - High-confidence links (>0.8) are created automatically.
    - Medium-confidence links (>0.5) are returned as suggestions.
- Results are persisted as new observations (for summary) or relationships (for discovery) with the metadata field { "kind": "reflection" } or { "source": "reflection" }.
- Text is stored with the prefix Reflective Insight:.
Defaults: older_than_days=30, max_observations=20, min_entity_degree=2, model="demyagent-4b-i1:Q6_K".
Export/Import Details:
- export_memory supports three formats:
  - JSON (format: "json"): Native Cozo format, fully re-importable with all metadata and timestamps.
  - Markdown (format: "markdown"): Human-readable document with entities, observations, and relationships.
  - Obsidian (format: "obsidian"): ZIP archive with wiki-links [[Entity]] and YAML frontmatter, ready for an Obsidian vault.
- import_memory supports four source formats:
  - Cozo (sourceFormat: "cozo"): Import from native JSON export.
  - Mem0 (sourceFormat: "mem0"): Import from Mem0 format (user_id becomes entity).
  - MemGPT (sourceFormat: "memgpt"): Import from MemGPT archival/recall memory.
  - Markdown (sourceFormat: "markdown"): Parse markdown sections as entities with observations.
- Merge strategies: skip (default, skip duplicates), overwrite (replace existing), merge (combine metadata).
- Optional filters: entityTypes (array), since (Unix timestamp in ms), includeMetadata, includeRelationships, includeObservations.
Example Export:
{
"action": "export_memory",
"format": "obsidian",
"includeMetadata": true,
"entityTypes": ["Person", "Project"]
}
Example Import:
{
"action": "import_memory",
"sourceFormat": "mem0",
"data": "{\"user_id\": \"alice\", \"memories\": [...]}",
"mergeStrategy": "skip"
}
Production Monitoring Details:
- health provides comprehensive system status including entity/observation/relationship counts, embedding cache statistics, and performance metrics (last operation time, average operation time, total operations).
- metrics returns detailed operational metrics:
  - Operation Counts: Tracks create_entity, update_entity, delete_entity, add_observation, create_relation, search, and graph_operations.
- Error Statistics: Total errors and breakdown by operation type.
- Performance Metrics: Last operation duration, average operation duration, and total operations executed.
- Delete operations now include detailed logging with verification steps and return statistics about deleted data (observations, outgoing/incoming relations).
Production Monitoring
The system includes comprehensive monitoring capabilities for production deployments:
Metrics Tracking
All operations are automatically tracked with detailed metrics:
- Operation counts by type (create, update, delete, search, etc.)
- Error tracking with breakdown by operation
- Performance metrics (latency, throughput)
Health Endpoint
The health action provides real-time system status:
{ "action": "health" }
Returns:
- Database counts (entities, observations, relationships)
- Embedding cache statistics (hit rate, size)
- Performance metrics (last operation time, average time, total operations)
Metrics Endpoint
The metrics action provides detailed operational metrics:
{ "action": "metrics" }
Returns:
- operations: Count of each operation type
- errors: Total errors and breakdown by operation
- performance: Last operation duration, average duration, total operations
Enhanced Delete Operations
Delete operations include comprehensive logging and verification:
- Detailed step-by-step logging with the [Delete] prefix
- Counts related data before deletion
- Verification after deletion
- Returns statistics: { deleted: { observations: N, outgoing_relations: N, incoming_relations: N } }
Example:
{ "action": "delete_entity", "entity_id": "ENTITY_ID" }
Returns deletion statistics showing exactly what was removed.
Technical Highlights
Dual Timestamp Format (v1.9)
All write operations (create_entity, add_observation, create_relation) return timestamps in both formats:
- created_at: Unix microseconds (CozoDB native format, precise for calculations)
- created_at_iso: ISO 8601 string (human-readable, e.g., "2026-02-28T17:21:19.343Z")
This dual format provides maximum flexibility - use Unix timestamps for time calculations and comparisons, or ISO strings for display and logging.
Example response:
{
"id": "...",
"created_at": 1772299279343000,
"created_at_iso": "2026-02-28T17:21:19.343Z",
"status": "Entity created"
}
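Converting between the two representations is a one-liner (CozoDB uses microseconds, while JavaScript's Date expects milliseconds):
// CozoDB stores Unix microseconds; Date expects milliseconds.
const createdAt = 1772299279343000;                       // from a tool response
const iso = new Date(createdAt / 1000).toISOString();     // "2026-02-28T17:21:19.343Z"
const backToMicros = Date.parse(iso) * 1000;              // ISO string -> microseconds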
Local ONNX Embeddings (Transformers)
Default Model: Xenova/bge-m3 (1024 dimensions).
Embeddings are processed on the CPU to ensure maximum compatibility. They are kept in an LRU cache (1000 entries, 1h TTL). On embedding errors, a zero vector is returned to keep tool calls stable.
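Under the hood this is the standard @xenova/transformers feature-extraction pipeline; a minimal sketch (cache and error handling omitted, model name configurable via EMBEDDING_MODEL):
import { pipeline } from "@xenova/transformers";

// Load the local ONNX model once (downloads on first use), then reuse it.
const extractor = await pipeline("feature-extraction", process.env.EMBEDDING_MODEL ?? "Xenova/bge-m3");

export async function embed(text: string): Promise<number[]> {
  // Mean pooling + normalization yields one fixed-size vector per input (1024 dims for bge-m3).
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}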
Hybrid Search (Vector + Keyword + Graph + Inference) + RRF
The search combines:
- Vector similarity via HNSW indices (~entity:semantic, ~observation:semantic)
- Keyword matching via regex (regex_matches(...))
- Graph signal via PageRank (for central entities)
- Community Expansion: Entities from the community of top seeds are introduced with a boost
- Inference signal: probabilistic candidates (e.g., expert_in) with confidence as score
Fusion: Reciprocal Rank Fusion (RRF) across sources vector, keyword, graph, community, inference.
Temporal Decay (active by default):
- Before RRF fusion, scores are dampened based on age (created_at).
- Half-life: 90 days (exponential decay), with source-specific floors:
  - keyword: no decay (corresponds to "explicitly searched")
  - graph/community: at least 0.6
  - vector: at least 0.2
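A compact sketch of the fusion step: standard reciprocal-rank scoring combined with the 90-day half-life decay and the floors listed above. The RRF constant k and the floor for the inference source are illustrative assumptions, not values taken from the implementation:
// rankings: per source, an ordered list of result IDs (rank 0 = best).
// createdAt: Unix-microsecond timestamp per result, used for temporal decay.
type Source = "vector" | "keyword" | "graph" | "community" | "inference";

const HALF_LIFE_DAYS = 90;
const FLOORS: Record<Source, number> = { keyword: 1.0, graph: 0.6, community: 0.6, vector: 0.2, inference: 0.2 };

function decay(source: Source, createdAtMicros: number, nowMicros: number): number {
  const ageDays = (nowMicros - createdAtMicros) / 1_000_000 / 86_400;
  const factor = Math.pow(0.5, ageDays / HALF_LIFE_DAYS);      // exponential half-life
  return Math.max(FLOORS[source], factor);                      // source-specific floor
}

function rrfFuse(rankings: Record<Source, string[]>, createdAt: Map<string, number>, k = 60): Map<string, number> {
  const now = Date.now() * 1000;
  const scores = new Map<string, number>();
  for (const source of Object.keys(rankings) as Source[]) {
    rankings[source].forEach((id, rank) => {
      // 1 / (k + rank + 1): classic RRF contribution for a 0-based rank.
      const contribution = (1 / (k + rank + 1)) * decay(source, createdAt.get(id) ?? now, now);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return scores;                                                 // higher = better
}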
Uncertainty/Transparency:
- Inference candidates are marked as source: "inference" and provide a short reason (uncertainty hint) in the result.
- In context output, inferred entities additionally carry an uncertainty_hint so an LLM can distinguish "hard fact" from "conjecture".
Tiny Learned Reranker (Cross-Encoder)
For maximum precision, CozoDB Memory integrates a specialized Cross-Encoder Reranker (Phase 2 RAG).
- Model: Xenova/ms-marco-MiniLM-L-6-v2 (local ONNX)
- Mechanism: After initial hybrid retrieval, the top candidates (up to 30) are re-evaluated by the cross-encoder. Unlike bi-encoders (vectors), cross-encoders process query and document simultaneously, capturing deep semantic nuances.
- Latency: Minimal overhead (~4-6ms for top 10 candidates).
- Supported Tools: Available via the rerank: true parameter in search, advancedSearch, graph_rag, and agentic_search.
Inference Engine
Inference uses multiple strategies (non-persisting):
- Co-occurrence: Entity names in observation texts (related_to, confidence 0.7)
- Semantic Proximity: Similar entities via HNSW (similar_to, up to max. 0.9)
- Transitivity: A→B and B→C (potentially_related, confidence 0.5)
- Expertise Rule: Person + works_on + uses_tech ⇒ expert_in (confidence 0.7)
- Query-Triggered Expertise: Search queries with keywords like expert, skill, knowledge, competence automatically trigger a dedicated expert search over the graph network.
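As an illustration of the non-persisting strategy style, a sketch of the transitivity rule (A→B and B→C ⇒ potentially_related, confidence 0.5); the type and function names are illustrative:
type Relation = { from_id: string; to_id: string; relation_type: string };
type Suggestion = { from_id: string; to_id: string; relation_type: string; confidence: number; reason: string };

function inferTransitive(entityId: string, relations: Relation[]): Suggestion[] {
  const suggestions: Suggestion[] = [];
  const outgoing = relations.filter((r) => r.from_id === entityId);
  for (const first of outgoing) {
    for (const second of relations.filter((r) => r.from_id === first.to_id)) {
      if (second.to_id === entityId) continue;               // skip cycles back to the start
      suggestions.push({
        from_id: entityId,
        to_id: second.to_id,
        relation_type: "potentially_related",
        confidence: 0.5,
        reason: `Transitive path via ${first.to_id}`,
      });
    }
  }
  return suggestions;                                          // suggestions only, nothing is persisted here
}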
Optional: HTTP API Bridge
API Bridge
For external tools and UIs, an Express server embeds the MemoryServer directly.
Start:
npm run bridge
Configuration:
- Port via PORT (default: 3001)
Selected Endpoints (Prefix /api):
- GET /entities, POST /entities, GET /entities/:id, DELETE /entities/:id
- POST /observations
- GET /search, GET /context
- GET /health
- GET /snapshots, POST /snapshots
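A quick smoke test against the bridge from TypeScript (assuming the default port 3001 and that the bridge is running; response shapes depend on the bridge):
// Requires Node 18+ (built-in fetch) or a browser environment.
const BASE = "http://localhost:3001/api";

const health = await fetch(`${BASE}/health`).then((res) => res.json());
console.log(health);                        // DB counts, cache stats, performance metrics

const entities = await fetch(`${BASE}/entities`).then((res) => res.json());
console.log(entities);                      // list of stored entities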
Development
Structure
- src/index.ts: MCP server + tool registration + schema setup
- src/embedding-service.ts: Embedding pipeline + LRU cache
- src/hybrid-search.ts: Search strategies + RRF + community expansion
- src/inference-engine.ts: Inference strategies
- src/api_bridge.ts: Express API bridge (for UI)
Scripts (Root)
- npm run build: TypeScript build
- npm run dev: Starts the MCP server via ts-node
- npm run start: Starts dist/index.js (stdio)
- npm run bridge: Builds and starts the API bridge (dist/api_bridge.js)
- npm run benchmark: Runs performance tests
User Preference Profiling (Mem0-Style)
The system maintains a persistent profile of the user (preferences, dislikes, work style) via the specialized entity global_user_profile.
- Benefit: Personalization without manual search ("I know you prefer TypeScript").
- Mechanism: All observations assigned to this entity receive a significant boost in search and context queries.
- Initialization: The profile is automatically created on first start.
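Conceptually, the boost is a simple score multiplier applied during fusion; a sketch (the 1.5 factor corresponds to the 50% boost mentioned above, the helper name is illustrative):
const USER_PROFILE_ID = "global_user_profile";
const PROFILE_BOOST = 1.5;   // "+50%" for profile-linked results

function applyProfileBoost(score: number, linkedEntityId: string): number {
  return linkedEntityId === USER_PROFILE_ID ? score * PROFILE_BOOST : score;
}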
Manual Profile Editing
You can now directly edit the user profile using the edit_user_profile MCP tool:
// View current profile
{ }
// Update metadata
{
metadata: { timezone: "Europe/Berlin", language: "de" }
}
// Add preferences
{
observations: [
{ text: "Prefers TypeScript over JavaScript" },
{ text: "Likes concise documentation" }
]
}
// Clear and reset preferences
{
clear_observations: true,
observations: [{ text: "New preference" }]
}
// Update name and type
{
name: "Developer Profile",
type: "UserProfile"
}
Note: You can still use the implicit method via mutate_memory with action='add_observation' and entity_id='global_user_profile'.
Manual Tests
There are various test scripts for different features:
# Tests edge cases and basic operations
npx ts-node src/test-edge-cases.ts
# Tests hybrid search and context retrieval
npx ts-node src/test-context.ts
# Tests memory reflection (requires Ollama)
npx ts-node test-reflection.ts
# Tests user preference profiling and search boost
npx ts-node test-user-pref.ts
# Tests manual user profile editing
npx ts-node src/test-user-profile.ts
Troubleshooting
Common Issues
First Start Takes Long
- The embedding model download takes 30-90 seconds on first start (Transformers loads ~500MB of artifacts)
- This is normal and only happens once
- Subsequent starts are fast (< 2 seconds)
Cleanup/Reflect Requires Ollama
- If using the cleanup or reflect actions, an Ollama service must be running locally
- Install Ollama from https://ollama.ai
- Pull the desired model: ollama pull demyagent-4b-i1:Q6_K (or your preferred model)
Windows-Specific
- Embeddings are processed on CPU for maximum compatibility
- RocksDB backend requires Visual C++ Redistributable if using that option
Performance Issues
- First query after restart is slower (cold cache)
- Use the health action to check cache hit rates
- Consider the RocksDB backend for datasets > 100k entities
Roadmap
CozoDB Memory is actively developed. Here's what's planned:
Near-Term (v1.x)
- GPU Acceleration - CUDA support for embedding generation (10-50x faster)
- Streaming Ingestion - Real-time data ingestion from logs, APIs, webhooks
- Advanced Chunking - Semantic chunking for ingest_file (paragraph-aware splitting)
- Query Optimization - Automatic query plan optimization for complex graph traversals
- Additional Export Formats - Notion, Roam Research, Logseq compatibility
Mid-Term (v2.x)
- Multi-Modal Embeddings - Image and audio embedding support via CLIP/Whisper
- Distributed Mode - Multi-node deployment with CozoDB clustering
- Real-Time Sync - WebSocket-based live updates for collaborative use cases
- Advanced Inference - Causal reasoning, temporal pattern detection
- Web UI - Optional web interface for memory exploration and visualization
Long-Term
- Federated Learning - Privacy-preserving model updates across instances
- Custom Embedding Models - Fine-tune embeddings on domain-specific data
- Plugin System - Extensible architecture for custom tools and integrations
Community Requests
Have a feature idea? Open an issue with the enhancement label or check Low-Hanging-Fruit.md for quick wins you can contribute.
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for guidelines on:
- Setting up the development environment
- Coding standards and best practices
- Testing and documentation requirements
- Pull request process
License
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
Disclaimer
Single-User, Local-First: This project is designed for a single user and a local installation.