Recall
Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.
Features
- Persistent Memory Storage: Store preferences, decisions, patterns, and session context
- Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
- Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
- Namespace Isolation: Global memories vs project-scoped memories
- Context Generation: Auto-format memories for session context injection
- Deduplication: Content-hash based duplicate detection
Installation
# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall
# Install with uv
uv sync
# Ensure Ollama is running with required models
ollama pull mxbai-embed-large # Required: embeddings for semantic search
ollama pull llama3.2 # Optional: session summarization for auto-capture hook
ollama serve
Usage
Run as MCP Server
uv run python -m recall
CLI Options
uv run python -m recall --help
Options:
--sqlite-path PATH SQLite database path (default: ~/.recall/recall.db)
--chroma-path PATH ChromaDB storage path (default: ~/.recall/chroma_db)
--collection NAME ChromaDB collection name (default: memories)
--ollama-host HOST Ollama server URL (default: http://localhost:11434)
--ollama-model MODEL Embedding model (default: mxbai-embed-large)
--ollama-timeout SECS Request timeout (default: 30)
--log-level LEVEL DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
meta-mcp Configuration
Add Recall to your meta-mcp servers.json:
{
"recall": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/recall",
"python",
"-m",
"recall"
],
"env": {
"RECALL_LOG_LEVEL": "INFO",
"RECALL_OLLAMA_HOST": "http://localhost:11434",
"RECALL_OLLAMA_MODEL": "mxbai-embed-large"
},
"description": "Long-term memory system with semantic search",
"tags": ["memory", "context", "semantic-search"]
}
}
Or for Claude Code / other MCP clients (claude.json):
{
"mcpServers": {
"recall": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/recall",
"python",
"-m",
"recall"
],
"env": {
"RECALL_LOG_LEVEL": "INFO"
}
}
}
}
Environment Variables
| Variable | Default | Description |
|---|---|---|
RECALL_SQLITE_PATH | ~/.recall/recall.db | SQLite database file path |
RECALL_CHROMA_PATH | ~/.recall/chroma_db | ChromaDB persistent storage directory |
RECALL_COLLECTION_NAME | memories | ChromaDB collection name |
RECALL_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
RECALL_OLLAMA_MODEL | mxbai-embed-large | Embedding model name |
RECALL_OLLAMA_TIMEOUT | 30 | Ollama request timeout in seconds |
RECALL_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
RECALL_DEFAULT_NAMESPACE | global | Default namespace for memories |
RECALL_DEFAULT_IMPORTANCE | 0.5 | Default importance score (0.0-1.0) |
RECALL_DEFAULT_TOKEN_BUDGET | 4000 | Default token budget for context |
MCP Tool Examples
memory_store_tool
Store a new memory with semantic indexing. Uses fast daemon path when available (<10ms), falls back to sync embedding otherwise.
{
"content": "User prefers dark mode in all applications",
"memory_type": "preference",
"namespace": "global",
"importance": 0.8,
"metadata": {"source": "explicit_request"}
}
Response (fast path via daemon):
{
"success": true,
"queued": true,
"queue_id": 42,
"namespace": "global"
}
Response (sync path fallback):
{
"success": true,
"queued": false,
"id": "550e8400-e29b-41d4-a716-446655440000",
"content_hash": "a1b2c3d4e5f67890"
}
daemon_status_tool
Check if the recall daemon is running:
{}
Response:
{
"running": true,
"status": {
"pid": 12345,
"store_queue": {"pending_count": 5},
"embed_worker_running": true
}
}
memory_recall_tool
Search memories by semantic similarity:
{
"query": "user interface preferences",
"n_results": 5,
"namespace": "global",
"memory_type": "preference",
"min_importance": 0.5,
"include_related": true
}
Response:
{
"success": true,
"memories": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"content": "User prefers dark mode in all applications",
"type": "preference",
"namespace": "global",
"importance": 0.8,
"created_at": "2024-01-15T10:30:00",
"accessed_at": "2024-01-15T14:22:00",
"access_count": 3
}
],
"total": 1,
"score": 0.92
}
memory_relate_tool
Create a relationship between memories:
{
"source_id": "mem_new_123",
"target_id": "mem_old_456",
"relation": "supersedes",
"weight": 1.0
}
Response:
{
"success": true,
"edge_id": 42
}
memory_context_tool
Generate formatted context for session injection:
{
"query": "coding style preferences",
"project": "myproject",
"token_budget": 4000
}
Response:
{
"success": true,
"context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n",
"token_estimate": 125
}
memory_forget_tool
Delete memories by ID or semantic search:
{
"memory_id": "550e8400-e29b-41d4-a716-446655440000",
"confirm": true
}
Or delete by search:
{
"query": "outdated preferences",
"namespace": "project:oldproject",
"n_results": 10,
"confirm": true
}
Response:
{
"success": true,
"deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"],
"deleted_count": 1
}
Architecture
┌─────────────────────────────────────────────────────────────┐
│ MCP Server (FastMCP) │
│ memory_store │ memory_recall │ memory_relate │ memory_forget │
└───────────────────────────┬─────────────────────────────────┘
│
┌─────────────┴─────────────┐
│ │
┌─────────▼─────────┐ ┌─────────▼─────────┐
│ FAST PATH │ │ SYNC PATH │
│ <10ms │ │ 10-60s │
└─────────┬─────────┘ └─────────┬─────────┘
│ │
┌─────────▼─────────┐ ┌─────────▼─────────┐
│ recall-daemon │ │ HybridStore │
│ (Unix socket) │ │ (Direct Ollama) │
│ │ └─────────┬─────────┘
│ ┌─────────────┐ │ │
│ │ StoreQueue │ │ ┌───────────┼───────────┐
│ │ EmbedWorker │ │ │ │ │
│ └─────────────┘ │ │ │ │
└─────────┬─────────┘ ┌─▼─────┐ ┌───▼───┐ ┌─────▼─────┐
│ │SQLite │ │Chroma │ │ Ollama │
└─────────────►Store │ │ Store │ │ Client │
└───────┘ └───────┘ └───────────┘
The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding (10-60s).
Daemon Setup (macOS)
The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.
Quick Install
# From the recall directory
./hooks/install-daemon.sh
This will:
- Copy hook scripts to
~/.claude/hooks/ - Install the launchd plist to
~/Library/LaunchAgents/ - Start the daemon automatically
Manual Install
# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py
# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs
# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist
# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
Daemon Commands
# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq
# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist
# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
Hooks Configuration
Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.
Development
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest tests/
# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html
# Type checking
uv run mypy src/recall
# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v
Requirements
- Python 3.13+
- Ollama with:
mxbai-embed-largemodel (required for semantic search)llama3.2model (optional, for session auto-capture hook)
- ~500MB disk space for ChromaDB indices
License
MIT