
RLM - Infinite Memory for Claude Code

Your Claude Code sessions forget everything after /compact. RLM fixes that.




The Problem

Claude Code has a context window limit. When it fills up:

  • /compact wipes your conversation history
  • Previous decisions, insights, and context are lost
  • You repeat yourself. Claude makes the same mistakes. Productivity drops.

The Solution

RLM is an MCP server that gives Claude Code persistent memory across sessions:

You: "Remember that the client prefers 500ml bottles"
     → Saved. Forever. Across all sessions.

You: "What did we decide about the API architecture?"
     → Claude searches its memory and finds the answer.

3 lines to install. 14 tools. Zero configuration.


Quick Install

Via PyPI (recommended)

pip install mcp-rlm-server[all]

Via Git

git clone https://github.com/EncrEor/rlm-claude.git
cd rlm-claude
./install.sh

Restart Claude Code. Done.

Requirements: Python 3.10+, Claude Code CLI

Upgrading from v0.9.0 or earlier

v0.9.1 moved the source code from mcp_server/ to src/mcp_server/ (PyPA best practice). A compatibility symlink is included so existing installations keep working, but we recommend re-running the installer:

cd rlm-claude
git pull
./install.sh          # reconfigures the MCP server path

Your data (~/.claude/rlm/) is untouched. Only the server path is updated.


How It Works

                ┌──────────────────────────┐
                │     Claude Code CLI      │
                └────────────┬─────────────┘
                             │
                ┌────────────▼─────────────┐
                │    RLM MCP Server        │
                │    (14 tools)            │
                └────────────┬─────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
┌────────▼──────────┐ ┌──────▼──────┐ ┌──────────▼──────────┐
│    Insights       │ │   Chunks    │ │    Retention        │
│ (key decisions,   │ │ (full conv  │ │ (auto-archive,      │
│  facts, prefs)    │ │  history)   │ │  restore, purge)    │
└───────────────────┘ └─────────────┘ └─────────────────────┘

Auto-Save Before Context Loss

RLM hooks into Claude Code's /compact event. Before your context is wiped, RLM automatically saves a snapshot. No action needed.
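The shipped hook lives in hooks/pre_compact_chunk.py; as a minimal sketch of the mechanism (the payload field names and snapshot location below are assumptions for illustration, not the exact schema), a PreCompact hook reads the event JSON from stdin and persists a snapshot before the context is cleared:

```python
import json
import sys
import time
from pathlib import Path

# Illustrative location — the real hook stores data under ~/.claude/rlm/
DEFAULT_DIR = Path.home() / ".claude" / "rlm" / "snapshots"

def handle_pre_compact(event: dict, out_dir: Path = DEFAULT_DIR) -> Path:
    """Persist the hook payload before /compact wipes the context.

    `event` is the JSON Claude Code pipes to the hook on stdin;
    the `trigger` field name here is an assumption for the example.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    trigger = event.get("trigger", "manual")  # "manual" or "auto"
    snapshot = out_dir / f"precompact_{stamp}_{trigger}.json"
    snapshot.write_text(json.dumps(event, indent=2))
    return snapshot

if __name__ == "__main__":
    handle_pre_compact(json.load(sys.stdin))
```

The installer registers the real script for both the manual and auto matchers (see Hook Configuration below).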

Two Memory Systems

System   | What it stores                    | How to use
Insights | Key decisions, facts, preferences | rlm_remember() / rlm_recall()
Chunks   | Full conversation segments        | rlm_chunk() / rlm_peek() / rlm_grep()

Features

Memory & Insights

  • rlm_remember - Save decisions, facts, preferences with categories and importance levels
  • rlm_recall - Search insights by keyword, category, or importance
  • rlm_forget - Remove an insight
  • rlm_status - System overview (insight count, chunk stats, access metrics)

Conversation History

  • rlm_chunk - Save conversation segments to persistent storage
  • rlm_peek - Read a chunk (full or partial by line range)
  • rlm_grep - Regex search across all chunks (+ fuzzy matching for typo tolerance)
  • rlm_search - Hybrid search: BM25 + semantic cosine similarity (FR/EN, accent-normalized)
  • rlm_list_chunks - List all chunks with metadata
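Conceptually, the fuzzy mode of rlm_grep can be approximated with the standard library's difflib; this sketch is illustrative only, not the server's actual implementation:

```python
import re
from difflib import SequenceMatcher

def fuzzy_grep(pattern: str, lines: list[str], threshold: float = 0.8) -> list[str]:
    """Return lines matching `pattern`, tolerating typos.

    Try an exact (case-insensitive) regex match first; otherwise
    compare each word against the pattern with difflib's similarity
    ratio, so e.g. 'authentification' still matches 'authentication'.
    """
    hits = []
    for line in lines:
        if re.search(pattern, line, re.IGNORECASE):
            hits.append(line)
            continue
        for word in re.findall(r"\w+", line):
            if SequenceMatcher(None, pattern.lower(), word.lower()).ratio() >= threshold:
                hits.append(line)
                break
    return hits
```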

Multi-Project Organization

  • rlm_sessions - Browse sessions by project or domain
  • rlm_domains - List available domains for categorization
  • Auto-detection of project from git or working directory
  • Cross-project filtering on all search tools
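Project auto-detection can be sketched as asking git for the repository root and falling back to the working directory name (a simplified illustration, not the server's exact logic):

```python
import subprocess
from pathlib import Path

def detect_project(cwd: Path) -> str:
    """Infer a project name for the current session.

    Prefer the basename of the git repository root; if `cwd` is not
    inside a git repo (or git is unavailable), fall back to the
    basename of the working directory itself.
    """
    try:
        top = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=cwd, capture_output=True, text=True, check=True,
        ).stdout.strip()
        return Path(top).name
    except (subprocess.CalledProcessError, FileNotFoundError):
        return cwd.name
```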

Smart Retention

  • rlm_retention_preview - Preview what would be archived (dry-run)
  • rlm_retention_run - Archive old unused chunks, purge ancient ones
  • rlm_restore - Bring back archived chunks
  • 3-zone lifecycle: Active → Archive (.gz) → Purge
  • Immunity system: critical tags, frequent access, and keywords protect chunks
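As an illustration of the lifecycle and immunity rules, a toy classifier might look like this (thresholds and field names are invented for the example; the real rules live in retention.py):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    age_days: int
    access_count: int
    tags: set[str] = field(default_factory=set)

# Illustrative thresholds — not the server's actual defaults.
ARCHIVE_AFTER_DAYS = 30
PURGE_AFTER_DAYS = 180
IMMUNE_TAGS = {"critical"}
FREQUENT_ACCESS = 5

def classify(chunk: Chunk) -> str:
    """Return the retention zone for a chunk: 'active', 'archive', or 'purge'."""
    # Immunity: critical tags or frequent access keep a chunk active.
    if chunk.tags & IMMUNE_TAGS or chunk.access_count >= FREQUENT_ACCESS:
        return "active"
    if chunk.age_days >= PURGE_AFTER_DAYS:
        return "purge"
    if chunk.age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    return "active"
```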

Auto-Chunking (Hooks)

  • PreCompact hook: Automatic snapshot before /compact or auto-compact
  • PostToolUse hook: Stats tracking after chunk operations
  • User-driven philosophy: you decide when to chunk; the system saves automatically before loss

Semantic Search (optional)

  • Hybrid BM25 + cosine - Combines keyword matching with vector similarity for better relevance
  • Auto-embedding - New chunks are automatically embedded at creation time
  • Two providers - Model2Vec (fast, 256d) or FastEmbed (accurate, 384d)
  • Graceful degradation - Falls back to pure BM25 when semantic deps are not installed
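One common fusion scheme, shown here as an illustrative sketch rather than the server's exact formula, is to min-max normalize each score list and take a weighted sum:

```python
def normalize(scores: list[float]) -> list[float]:
    """Min-max normalize scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(bm25: list[float], cosine: list[float], alpha: float = 0.5) -> list[int]:
    """Fuse BM25 and cosine scores for the same documents.

    Returns document indices sorted best-first by the fused score;
    `alpha` weights keyword (BM25) vs semantic (cosine) relevance.
    """
    fused = [
        alpha * b + (1 - alpha) * c
        for b, c in zip(normalize(bm25), normalize(cosine))
    ]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

A document that is mediocre on keywords but strong semantically (or vice versa) can still rank first, which is how the fusion compensates for each provider's weakness.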

Provider comparison (benchmark on 108 chunks)

                 | Model2Vec (default)      | FastEmbed
Model            | potion-multilingual-128M | paraphrase-multilingual-MiniLM-L12-v2
Dimensions       | 256                      | 384
Embed 108 chunks | 0.06 s                   | 1.30 s
Search latency   | 0.1 ms/query             | 1.5 ms/query
Memory           | 0.1 MB                   | 0.3 MB
Disk (model)     | ~35 MB                   | ~230 MB
Semantic quality | Good (keyword-biased)    | Better (true semantic)
Speed            | 21x faster               | Baseline

Top-5 result overlap between providers: ~1.6/5 (different results in 7/8 queries). FastEmbed captures more semantic meaning while Model2Vec leans toward keyword similarity. The hybrid BM25 + cosine fusion compensates for both weaknesses.

Recommendation: Start with Model2Vec (default). Switch to FastEmbed only if you need better semantic accuracy and can afford the slower startup.

# Model2Vec (default) — fast, ~35 MB
pip install mcp-rlm-server[semantic]

# FastEmbed — more accurate, ~230 MB, slower
pip install mcp-rlm-server[semantic-fastembed]
export RLM_EMBEDDING_PROVIDER=fastembed

# Compare both providers on your data
python3 scripts/benchmark_providers.py

# Backfill existing chunks (run once after install)
python3 scripts/backfill_embeddings.py

Sub-Agent Skills

  • /rlm-analyze - Analyze a single chunk with an isolated sub-agent
  • /rlm-parallel - Analyze multiple chunks in parallel (Map-Reduce pattern from MIT RLM paper)

Comparison

Feature                          | Raw Context | Letta/MemGPT     | RLM
Persistent memory                | No          | Yes              | Yes
Works with Claude Code           | N/A         | No (own runtime) | Native MCP
Auto-save before compact         | No          | N/A              | Yes (hooks)
Search (regex + BM25 + semantic) | No          | Basic            | Yes
Fuzzy search (typo-tolerant)     | No          | No               | Yes
Multi-project support            | No          | No               | Yes
Smart retention (archive/purge)  | No          | Basic            | Yes
Sub-agent analysis               | No          | No               | Yes
Zero config install              | N/A         | Complex          | 3 lines
FR/EN support                    | N/A         | EN only          | Both
Cost                             | Free        | Self-hosted      | Free

Usage Examples

Save and recall insights

# Save a key decision
rlm_remember("Backend is the source of truth for all data",
             category="decision", importance="high",
             tags="architecture,backend")

# Find it later
rlm_recall(query="source of truth")
rlm_recall(category="decision")

Manage conversation history

# Save important discussion
rlm_chunk("Discussion about API redesign... [long content]",
          summary="API v2 architecture decisions",
          tags="api,architecture")

# Search across all history
rlm_search("API architecture decisions")      # BM25 ranked
rlm_grep("authentication", fuzzy=True)         # Typo-tolerant

# Read a specific chunk
rlm_peek("2026-01-18_MyProject_001")

Multi-project organization

# Filter by project
rlm_search("deployment issues", project="MyApp")
rlm_grep("database", project="MyApp", domain="infra")

# Browse sessions
rlm_sessions(project="MyApp")

Project Structure

rlm-claude/
├── src/mcp_server/
│   ├── server.py              # MCP server (14 tools)
│   └── tools/
│       ├── memory.py          # Insights (remember/recall/forget)
│       ├── navigation.py      # Chunks (chunk/peek/grep/list)
│       ├── search.py          # BM25 search engine
│       ├── tokenizer_fr.py    # FR/EN tokenization
│       ├── sessions.py        # Multi-session management
│       ├── retention.py       # Archive/restore/purge lifecycle
│       ├── embeddings.py      # Embedding providers (Model2Vec, FastEmbed)
│       ├── vecstore.py        # Vector store (.npz) for semantic search
│       └── fileutil.py        # Safe I/O (atomic writes, path validation, locking)
│
├── hooks/                     # Claude Code hooks
│   ├── pre_compact_chunk.py   # Auto-save before /compact (PreCompact hook)
│   └── reset_chunk_counter.py # Stats reset after chunk (PostToolUse hook)
│
├── templates/
│   ├── hooks_settings.json    # Hook config template
│   ├── CLAUDE_RLM_SNIPPET.md  # CLAUDE.md instructions
│   └── skills/                # Sub-agent skills
│
├── context/                   # Storage (created at install, git-ignored)
│   ├── session_memory.json    # Insights
│   ├── index.json             # Chunk index
│   ├── chunks/                # Conversation history
│   ├── archive/               # Compressed archives (.gz)
│   ├── embeddings.npz         # Semantic vectors (Phase 8)
│   └── sessions.json          # Session index
│
├── install.sh                 # One-command installer
└── README.md

Configuration

Hook Configuration

The installer automatically configures hooks in ~/.claude/settings.json:

{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "manual",
        "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
      },
      {
        "matcher": "auto",
        "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
      }
    ],
    "PostToolUse": [{
      "matcher": "mcp__rlm-server__rlm_chunk",
      "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/reset_chunk_counter.py" }]
    }]
  }
}

Custom Domains

Organize chunks by topic with custom domains:

{
  "domains": {
    "my_project": {
      "description": "Domains for my project",
      "list": ["feature", "bugfix", "infra", "docs"]
    }
  }
}

Edit context/domains.json after installation.


Manual Installation

If you prefer to install manually:

pip install -e ".[all]"
claude mcp add rlm-server -- python3 -m mcp_server
mkdir -p ~/.claude/rlm/hooks
cp hooks/*.py ~/.claude/rlm/hooks/
chmod +x ~/.claude/rlm/hooks/*.py
mkdir -p ~/.claude/skills/rlm-analyze ~/.claude/skills/rlm-parallel
cp templates/skills/rlm-analyze/skill.md ~/.claude/skills/rlm-analyze/
cp templates/skills/rlm-parallel/skill.md ~/.claude/skills/rlm-parallel/

Then configure hooks in ~/.claude/settings.json (see above).

Uninstall

./uninstall.sh              # Interactive (choose to keep or delete data)
./uninstall.sh --keep-data  # Remove RLM config, keep your chunks/insights
./uninstall.sh --all        # Remove everything
./uninstall.sh --dry-run    # Preview what would be removed

Security

RLM includes built-in protections for safe operation:

  • Path traversal prevention - Chunk IDs are validated against a strict allowlist ([a-zA-Z0-9_.-&]), and resolved paths are verified to stay within the storage directory
  • Atomic writes - All JSON and chunk files are written using write-to-temp-then-rename, preventing corruption from interrupted writes or crashes
  • File locking - Concurrent read-modify-write operations on shared indexes use fcntl.flock exclusive locks
  • Content size limits - Chunks are limited to 2 MB, and gzip decompression (archive restore) is capped at 10 MB to prevent resource exhaustion
  • SHA-256 hashing - Content deduplication uses SHA-256 (not MD5)

All I/O safety primitives are centralized in src/mcp_server/tools/fileutil.py.
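The write-to-temp-then-rename pattern can be sketched as follows (the function name is illustrative, not the actual fileutil.py API):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON so readers never observe a half-written file.

    Write to a temp file in the same directory (so the rename stays
    on one filesystem), fsync it, then os.replace() — an atomic
    rename on POSIX: readers see either the old content or the new,
    never a partial write.
    """
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on failure
        raise
```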


Troubleshooting

"MCP server not found"

claude mcp list                    # Check servers
claude mcp remove rlm-server       # Remove if exists
claude mcp add rlm-server -- python3 -m mcp_server

"Hooks not working"

cat ~/.claude/settings.json | grep -A 10 "PreCompact"  # Verify hooks config
ls ~/.claude/rlm/hooks/                                  # Check installed hooks

Roadmap

  • Phase 1: Memory tools (remember/recall/forget/status)
  • Phase 2: Navigation tools (chunk/peek/grep/list)
  • Phase 3: Auto-chunking + sub-agent skills
  • Phase 4: Production (auto-summary, dedup, access tracking)
  • Phase 5: Advanced (BM25 search, fuzzy grep, multi-sessions, retention)
  • Phase 6: Production-ready (tests, CI/CD, PyPI)
  • Phase 7: MAGMA-inspired (temporal filtering, entity extraction)
  • Phase 8: Hybrid semantic search (BM25 + cosine, Model2Vec)

Inspired By

Research Papers

  • RLM Paper (MIT CSAIL) - Zhang et al., Dec 2025 - "Recursive Language Models" — foundational architecture (chunk/peek/grep, sub-agent analysis)
  • MAGMA (arXiv:2601.03236) - Jan 2026 - "Memory-Augmented Generation with Memory Agents" — temporal filtering, entity extraction (Phase 7)

Libraries & Tools

  • Model2Vec - Static word embeddings for fast semantic search (Phase 8)
  • BM25S - Fast BM25 implementation in pure Python (Phase 5)
  • FastEmbed - ONNX-based embeddings, optional provider (Phase 8)
  • Letta/MemGPT - AI agent memory framework — early inspiration


Authors

  • Ahmed MAKNI (@EncrEor)
  • Claude Opus 4.5 (joint R&D)

License

MIT License - see LICENSE
