
Melchizedek


Persistent memory for Claude Code. Automatically indexes every conversation and provides production-grade hybrid search (BM25 + vectors + reranker) via MCP tools. 100% local, zero config, zero API keys, zero invoice.


Why Melchizedek?

Claude Code forgets everything between sessions - and knows nothing about your other projects. Melchizedek fixes both.

It runs silently in the background - indexing your conversations as you work - then gives Claude the ability to search across your entire history, across all projects: past debugging sessions, architectural decisions, error solutions, code patterns.

No cloud. No API keys. No config. Plug and ask.

How it works

~/.claude/projects/**/*.jsonl       (your conversation transcripts - read-only)
        |
        v
  SessionEnd hook                   (auto-triggers after each session)
        |
        v
  +------------------+
  |  Indexer         |    Parse JSONL -> chunk pairs -> SHA-256 dedup
  |  (better-sqlite3)|    FTS5 tokenize -> vector embed (optional)
  +------------------+
        |
        v
  ~/.melchizedek/memory.db           (single SQLite file, WAL mode)
        |
        v
  +------------------+
  |  MCP Server      |    16 search & management tools
  |  (stdio)         |    Hybrid: BM25 + vectors + RRF + reranker
  +------------------+
        |
        v
  Claude Code                       (searches your history via MCP)
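
The dedup step is worth spelling out: each chunk is keyed by its SHA-256, so re-running the indexer over an already-indexed transcript inserts nothing. A minimal TypeScript sketch of the idea (hypothetical schema, not Melchizedek's actual code):

import { createHash } from "node:crypto";
import Database from "better-sqlite3";

// Sketch: insert a chunk only if its SHA-256 is unseen.
// Re-indexing the same transcript then becomes a cheap no-op.
const db = new Database("memory.db");
db.exec(`CREATE TABLE IF NOT EXISTS chunks (
  hash TEXT PRIMARY KEY,
  content TEXT NOT NULL
)`);

function indexChunk(content: string): boolean {
  const hash = createHash("sha256").update(content).digest("hex");
  const res = db
    .prepare("INSERT OR IGNORE INTO chunks (hash, content) VALUES (?, ?)")
    .run(hash, content);
  return res.changes > 0; // false -> duplicate, skipped
}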

Daemon singleton - multi-instance support

When you open multiple Claude Code windows, Melchizedek shares a single daemon process across all of them - 1 database, 1 embedder, 1 reranker loaded once in memory.

The server starts in 3 phases:

  1. Try connecting to an existing daemon (Unix socket on macOS/Linux, named pipe on Windows)
  2. Auto-start the daemon if none is running
  3. Fall back to local standalone mode if the daemon can't start

This is transparent - Claude Code sees a normal stdio MCP server. Set M9K_NO_DAEMON=1 or --no-daemon to disable daemon mode.
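
A hedged TypeScript sketch of those three phases (the socket path, pipe name, and daemon binary are illustrative placeholders, not Melchizedek's actual values):

import net from "node:net";
import { spawn } from "node:child_process";

const SOCKET = process.platform === "win32"
  ? "\\\\.\\pipe\\melchizedek"                              // hypothetical pipe name
  : `${process.env.HOME}/.melchizedek/daemon.sock`;         // hypothetical socket path

function connect(): Promise<net.Socket> {
  return new Promise((resolve, reject) => {
    const sock = net.connect(SOCKET, () => resolve(sock));
    sock.on("error", reject);
  });
}

async function acquireBackend(): Promise<net.Socket | null> {
  try {
    return await connect();                                 // phase 1: existing daemon
  } catch {
    try {
      spawn("melchizedek-daemon", { detached: true, stdio: "ignore" }).unref(); // hypothetical binary
      await new Promise((r) => setTimeout(r, 500));         // crude wait; real code would retry
      return await connect();                               // phase 2: auto-started daemon
    } catch {
      return null;                                          // phase 3: standalone fallback
    }
  }
}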

Search pipeline - 4 levels of graceful degradation

Every layer is optional. The plugin works with BM25 alone and gets better as more components are available.

| Level | Component | What it adds | Dependency |
|---|---|---|---|
| 1 | BM25 (FTS5) | Keyword search with stemming | None (always active) |
| 2 | Dual vectors (sqlite-vec) | Semantic search - text (MiniLM 384d) + code (Jina 768d) | @huggingface/transformers (optional) |
| 3 | RRF fusion | Merges BM25 + text vectors + code vectors via Reciprocal Rank Fusion | Vectors enabled |
| 4 | Reranker | Cross-encoder re-scoring of top results | Transformers.js or node-llama-cpp (optional) |
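
Level 3 is standard Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank) per document, and the summed scores are re-sorted. A minimal sketch (k=60 is the conventional constant; Melchizedek's actual parameters may differ):

// Hedged RRF sketch: merge ranked ID lists from BM25, text vectors, code vectors.
function rrf(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// e.g. rrf([bm25Ids, textVecIds, codeVecIds]).slice(0, 10) -> candidates for Level 4

The fused top N then goes to the reranker, which re-scores query/document pairs directly.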

Performance

Measured with npm run bench - 100 sessions, 1 000 chunks, on a single SQLite file.

| Metric | Result | Target |
|---|---|---|
| Indexing (100 sessions) | ~80 ms | < 10 s |
| BM25 search (mean) | ~0.2 ms | < 50 ms |
| DB size (100 sessions) | ~1.4 MB | < 30 MB |
| Tokens per search | ~125 | < 2 000 |

Installation

Claude Code plugin marketplace

claude plugin install melchizedek

npm (global)

npm install -g melchizedek

Create a file (e.g. /tmp/melchizedek-mcp.json):

{
  "mcpServers": {
    "melchizedek": {
      "command": "melchizedek-server"
    }
  }
}
claude --mcp-config /tmp/melchizedek-mcp.json

npx (no install)

Create a file (e.g. /tmp/melchizedek-mcp.json):

{
  "mcpServers": {
    "melchizedek": {
      "command": "npx",
      "args": ["melchizedek-server"]
    }
  }
}
claude --mcp-config /tmp/melchizedek-mcp.json

From source (contributors)

git clone https://github.com/louis49/melchizedek.git
cd melchizedek
npm install && npm run build

Then launch Claude Code with the generated .mcp.json:

claude --mcp-config .mcp.json

Note: npm run build generates .mcp.json with absolute paths to dist/server.js. The claude mcp add command may not work reliably due to known Claude Code plugin bugs - --mcp-config is the tested method.

Setting up hooks (automatic indexing)

The MCP server provides search tools, but hooks are what trigger automatic indexing. Without hooks, you'd need to manually index sessions.

For marketplace installs, hooks are configured automatically. For npm/npx/source installs, add the following to ~/.claude/settings.json:

{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/dist/hooks/session-end.js"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/dist/hooks/session-end.js"
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/dist/hooks/session-start.js"
          }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/dist/hooks/pre-compact.js"
          }
        ]
      }
    ]
  }
}

Replace /absolute/path/to with the actual path to your Melchizedek installation (e.g. $(npm root -g)/melchizedek for global installs, or your clone directory for source installs).
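
One way to print that path for a global install (a shell one-liner; it assumes the package ships its hooks under dist/hooks/ as in the snippet above):

echo "$(npm root -g)/melchizedek/dist/hooks/session-end.js"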

| Hook | What it does |
|---|---|
| SessionEnd / Stop | Indexes the conversation transcript after each session |
| SessionStart | Injects recent context from past sessions into the new session |
| PreCompact | Indexes conversation chunks not yet indexed before /compact truncates the transcript |

After installation, restart Claude Code. That's it - indexing starts automatically.

Enhanced Search (Optional)

Melchizedek works out of the box with BM25 keyword search. Text embeddings (MiniLM) download automatically on first use for semantic search. The optional backends below add GPU-accelerated code embeddings and reranking for maximum search quality.

Recommended Setup by Platform

macOS (Apple Silicon)

| Component | Backend | Model | GPU | Notes |
|---|---|---|---|---|
| Text embedding | transformers-js (default) | Multilingual-MiniLM-L12-v2 (384d) | CPU | Zero config, ~100 chunks/s |
| Code embedding | ollama | unclemusclez/jina-embeddings-v2-base-code (768d) | Metal | Setup Ollama |
| Reranker | llama-server | BGE Reranker v2 M3 | Metal | Setup llama-server |

ONNX Runtime has no Metal backend for Node.js, so transformers-js runs CPU-only on macOS. MiniLM is small enough that this isn't a bottleneck. For code embeddings, Ollama provides GPU acceleration via Metal.

Linux (NVIDIA)

| Component | Backend | Model | GPU | Notes |
|---|---|---|---|---|
| Text embedding | transformers-js | Multilingual-MiniLM-L12-v2 (384d) | CUDA | Install onnxruntime-node-gpu for GPU |
| Code embedding | ollama | unclemusclez/jina-embeddings-v2-base-code (768d) | CUDA | Setup Ollama |
| Reranker | llama-server | BGE Reranker v2 M3 | CUDA | Setup llama-server |

To enable CUDA for text embeddings: npm install onnxruntime-node-gpu (replaces the CPU-only onnxruntime-node, no code changes needed). Requires NVIDIA drivers + CUDA Toolkit 12.4+.

Windows (NVIDIA)

| Component | Backend | Model | GPU | Notes |
|---|---|---|---|---|
| Text embedding | transformers-js (default) | Multilingual-MiniLM-L12-v2 (384d) | CPU | GPU via onnxruntime-node-gpu or DirectML |
| Code embedding | ollama | unclemusclez/jina-embeddings-v2-base-code (768d) | CUDA | Setup Ollama |
| Reranker | node-llama-cpp | BGE Reranker v2 M3 | CUDA | Setup node-llama-cpp (prebuilt) |

Ollama auto-detects NVIDIA GPUs after installation. For reranking, node-llama-cpp has prebuilt CUDA binaries - no compilation needed. llama-server is also an option but requires Visual Studio Build Tools to compile.

CPU-only (any platform)

| Component | Backend | Model | Speed | Notes |
|---|---|---|---|---|
| Text embedding | transformers-js (default) | Multilingual-MiniLM-L12-v2 (384d) | ~100 chunks/s | Zero config |
| Code embedding | transformers-js (default) | jina-embeddings-v2-base-code (768d) | ~0.5 chunk/s | Slow - consider disabling |
| Reranker | transformers-js (default) | ms-marco-MiniLM-L-6-v2 | ~200 ms/query | Zero config |

Everything works on CPU - BM25 search is unaffected (no GPU needed). Code embedding is slow without GPU; disable it with "embeddingCodeEnabled": false if speed is a concern.
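
If you do disable it, that's a single tool call, using the same syntax as the other m9k_config examples in this README (booleans are passed unquoted):

m9k_config key="embeddingCodeEnabled" value='false'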

Recommended models reference

| Role | Backend | Model ID | Size | Notes |
|---|---|---|---|---|
| Text embedding | transformers-js | Xenova/paraphrase-multilingual-MiniLM-L12-v2 | ~120 MB (int8) | Multilingual, auto-downloaded, zero config |
| Text embedding | ollama | nomic-embed-text | ~275 MB | English-centric - fallback if Transformers.js unavailable |
| Code embedding | transformers-js | jinaai/jina-embeddings-v2-base-code | ~160 MB (int8) | Auto-downloaded, slow on CPU |
| Code embedding | ollama | unclemusclez/jina-embeddings-v2-base-code | ~323 MB | ollama pull, GPU-accelerated, recommended for code |
| Reranker | transformers-js | Xenova/ms-marco-MiniLM-L-6-v2 | ~23 MB (int8) | English-only, CPU ~200 ms, zero-config fallback |
| Reranker | llama-server | bge-reranker-v2-m3-Q4_K_M.gguf | ~440 MB | Multilingual, GPU ~50 ms, recommended |
| Reranker | llama-server | qwen3-reranker-0.6b-q8_0.gguf | ~640 MB | Multilingual, higher quantization (Q8 vs Q4) |
| Reranker | node-llama-cpp | bge-reranker-v2-m3-Q4_K_M.gguf | ~440 MB | Place in ~/.melchizedek/models/ |
| Reranker | node-llama-cpp | qwen3-reranker-0.6b-q8_0.gguf | ~640 MB | Place in ~/.melchizedek/models/ |

All Transformers.js models auto-download from Hugging Face on first use. GGUF models must be downloaded manually.

Language note: The default text embedder (MiniLM) is multilingual - it works well for non-English conversations. The default CPU reranker (ms-marco) is English-only - for other languages, use a GGUF reranker (BGE m3 or Qwen3, both multilingual). BM25 keyword search works for any language via FTS5 Unicode tokenization.

Tested embedding models

You can switch embedding models via m9k_config key="embeddingTextModel" value='"model-key"'. All models below have been tested end-to-end (load, embed, normalize, dimension check). Any ONNX-compatible HuggingFace model not listed here can also be used - Melchizedek will auto-detect dimensions and pooling from the model cache.

Transformers.js (local ONNX, zero config)

| Key | HuggingFace ID | Dims | Pooling | Context | Lang | Notes |
|---|---|---|---|---|---|---|
| minilm-l12-v2 | Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | mean | 512 tok | Multi | Default text. Best balance of speed/quality for conversations |
| minilm-l6-v2 | Xenova/all-MiniLM-L6-v2 | 384 | mean | 256 tok | EN | Fastest, lightest (~1 MB q8) |
| multilingual-e5-small | Xenova/multilingual-e5-small | 384 | mean | 512 tok | Multi | Good multilingual, queryPrefix "query: " |
| bge-small-en-v1.5 | Xenova/bge-small-en-v1.5 | 384 | cls | 512 tok | EN | High MTEB scores for its size |
| bge-base-en-v1.5 | Xenova/bge-base-en-v1.5 | 768 | cls | 512 tok | EN | Strong English baseline |
| bge-m3 | Xenova/bge-m3 | 1024 | cls | 8K tok | Multi | Large context, multilingual powerhouse |
| nomic-embed-text-v1.5 | nomic-ai/nomic-embed-text-v1.5 | 768 | mean | 8K tok | EN | Long context, open-source leader |
| mxbai-embed-xsmall-v1 | mixedbread-ai/mxbai-embed-xsmall-v1 | 384 | cls | 4K tok | EN | Tiny + long context |
| mxbai-embed-large-v1 | mixedbread-ai/mxbai-embed-large-v1 | 1024 | cls | 512 tok | EN | Top MTEB scores |
| snowflake-arctic-embed-m-v2 | Snowflake/snowflake-arctic-embed-m-v2.0 | 768 | cls | 8K tok | Multi | Snowflake's multilingual, queryPrefix "query: " |
| snowflake-arctic-embed-l-v2 | Snowflake/snowflake-arctic-embed-l-v2.0 | 1024 | cls | 8K tok | Multi | Snowflake's large variant |
| gte-small | Xenova/gte-small | 384 | mean | 512 tok | EN | Lightweight alternative |
| gte-multilingual-base | onnx-community/gte-multilingual-base | 768 | cls | 8K tok | Multi | Alibaba's multilingual |
| jina-code-v2 | jinaai/jina-embeddings-v2-base-code | 768 | mean | 8K tok | Code | Default code. Code-specialized |
| jina-v2-small-en | Xenova/jina-embeddings-v2-small-en | 512 | mean | 8K tok | EN | Lighter Jina variant |
| qwen3-embedding-0.6b | onnx-community/Qwen3-Embedding-0.6B-ONNX | 1024 | last_token | 8K tok | Multi | Instruction-tuned, highest quality, slowest (~9 s first embed) |

Custom models: Set embeddingTextModel to any HuggingFace model ID (e.g. "org/my-model"). Melchizedek resolves in order: built-in registry, HF cache metadata (config.json), then dynamic fallback (mean pooling, dimensions probed at runtime).
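
A hedged TypeScript sketch of that resolution order. All names here are illustrative stand-ins, not Melchizedek's actual internals; the two helpers are passed in to keep the sketch self-contained:

type ModelSpec = { id: string; dims: number; pooling: "mean" | "cls" | "last_token" };

const REGISTRY = new Map<string, ModelSpec>([
  ["minilm-l12-v2", { id: "Xenova/paraphrase-multilingual-MiniLM-L12-v2", dims: 384, pooling: "mean" }],
]);

async function resolveModel(
  idOrKey: string,
  readHfCacheConfig: (id: string) => Promise<{ hidden_size: number } | null>, // reads cached config.json
  probeDims: (id: string) => Promise<number>,                                 // embeds once, measures output length
): Promise<ModelSpec> {
  const builtin = REGISTRY.get(idOrKey);                        // 1. built-in registry
  if (builtin) return builtin;
  const cfg = await readHfCacheConfig(idOrKey);                 // 2. HF cache metadata
  if (cfg) return { id: idOrKey, dims: cfg.hidden_size, pooling: "mean" };
  return { id: idOrKey, dims: await probeDims(idOrKey), pooling: "mean" }; // 3. dynamic fallback
}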

Ollama (GPU-accelerated, any model)

Any Ollama embedding model works - no registry needed. Dimensions are auto-detected. Tested models:

| Model | Dims | Type | Discrimination | Pull command | Notes |
|---|---|---|---|---|---|
| nomic-embed-text | 768 | text | 0.31 | ollama pull nomic-embed-text | Most popular, good default |
| unclemusclez/jina-embeddings-v2-base-code | 768 | code | 0.61 | ollama pull unclemusclez/jina-embeddings-v2-base-code | Recommended for code |
| qwen3-embedding:0.6b | 1024 | text | 0.42 | ollama pull qwen3-embedding:0.6b | Best quality, ~9 s cold start |

Other popular choices (untested but expected to work):

| Model | Dims | Pull command | Notes |
|---|---|---|---|
| mxbai-embed-large | 1024 | ollama pull mxbai-embed-large | Top MTEB scores |
| snowflake-arctic-embed | varies | ollama pull snowflake-arctic-embed:xs | xs/s/m/l variants |
| all-minilm | 384 | ollama pull all-minilm | Lightest |
| bge-m3 | 1024 | ollama pull bge-m3 | Multilingual powerhouse |

Browse all: ollama.com/search?c=embedding

Reranker models

| Backend | Model | Size | GPU | Notes |
|---|---|---|---|---|
| transformers-js | Xenova/ms-marco-MiniLM-L-6-v2 | ~23 MB | CPU | Default. English-only, ~200 ms/query, zero config |
| llama-server | bge-reranker-v2-m3-Q4_K_M.gguf | ~440 MB | Metal/CUDA | Recommended. Multilingual, GPU ~50 ms |
| llama-server | qwen3-reranker-0.6b-q8_0.gguf | ~640 MB | Metal/CUDA | Multilingual, higher quality |
| node-llama-cpp | bge-reranker-v2-m3-Q4_K_M.gguf | ~440 MB | Metal/CUDA | Place in ~/.melchizedek/models/, auto-detected |
| node-llama-cpp | qwen3-reranker-0.6b-q8_0.gguf | ~640 MB | Metal/CUDA | Place in ~/.melchizedek/models/, auto-detected |

Setting up Ollama

Ollama provides GPU-accelerated code embeddings on all platforms.

# macOS  - download the .dmg from https://ollama.com/download/mac
# Windows - download installer from https://ollama.com/download
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Then pull the code embedding model
ollama pull unclemusclez/jina-embeddings-v2-base-code

Then tell Melchizedek to use Ollama for code embeddings:

m9k_config key="embeddingCodeBackend" value='"ollama"'
m9k_config key="embeddingCodeModel" value='"unclemusclez/jina-embeddings-v2-base-code"'

Setting up a GPU reranker

The reranker is a cross-encoder that re-scores results after BM25 + vector fusion. It's optional - search works without it - but it improves precision on ambiguous queries. The default (transformers-js, CPU) works out of the box. For GPU acceleration:

Option A - llama-server (recommended)

# 1. Compile llama.cpp
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON    # macOS Metal
# cmake -B build -DGGML_CUDA=ON   # Linux/Windows CUDA
cmake --build build --config Release -j

# 2. Download a GGUF reranker model (pick one)
# BGE Reranker v2 M3 (~440 MB) - recommended
wget https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q4_K_M.gguf
# Or: Qwen3 Reranker 0.6B (~640 MB) - alternative
# wget https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf

# 3. Run the server
./build/bin/llama-server --rerank --pooling rank \
  -m bge-reranker-v2-m3-Q4_K_M.gguf --port 8012

Then configure Melchizedek - either edit ~/.melchizedek/config.json or ask Claude:

m9k_config key="rerankerBackend" value='"llama-server"'
m9k_config key="rerankerUrl" value='"http://localhost:8012"'

Verify: curl http://localhost:8012/health should return {"status":"ok"}. Hot-reload works - no need to restart Melchizedek.
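
Beyond /health, you can exercise the rerank endpoint directly. A hedged example - recent llama-server builds expose a Jina-style /v1/rerank route when started with --rerank, but field names can vary by version:

curl http://localhost:8012/v1/rerank -H "Content-Type: application/json" -d '{
  "query": "sqlite WAL checkpoint error",
  "documents": ["WAL checkpoints stall when a reader holds the lock", "unrelated text"]
}'
# expect a results array with one relevance_score per document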

Option B - node-llama-cpp

npm install -g node-llama-cpp
mkdir -p ~/.melchizedek/models
cp bge-reranker-v2-m3-Q4_K_M.gguf ~/.melchizedek/models/

No config needed - Melchizedek auto-detects GGUF files matching bge-reranker* or qwen*reranker* in ~/.melchizedek/models/.

Backend detection priority

Reranker: llama-server (if URL set + healthy) > node-llama-cpp (if GGUF found) > transformers-js (CPU) > none.

Check active backends: m9k_info shows the current pipeline in its output.

Alternative configurations

| Scenario | Config |
|---|---|
| No Ollama, skip code embedding | "embeddingCodeEnabled": false |
| Ollama for everything | Both backends = "ollama" (text: nomic-embed-text, code: unclemusclez/jina-embeddings-v2-base-code) |
| Offline only (CPU) | Default - transformers-js for both (no network) |
| Disable reranker | "rerankerEnabled": false |
| Disable all embeddings | "embeddingsEnabled": false (BM25 only) |

MCP Tools

Search (start here)

| Tool | Description |
|---|---|
| m9k_search | Search indexed conversations. Returns compact snippets. Current project boosted. Supports since/until date filters and order (score, date_asc, date_desc). |
| m9k_context | Get a chunk with surrounding context (adjacent chunks in the same session). |
| m9k_full | Retrieve full content of chunks by IDs. |

Progressive retrieval pattern - search returns ~50 tokens/result, context ~200-300, full ~500-1000. Start with m9k_search, drill down only when needed. 4x token savings vs loading everything.
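
A typical drill-down, written in the same informal tool-call style as the m9k_config examples elsewhere in this README (argument names are illustrative - check each tool's schema in Claude):

m9k_search query="WAL checkpoint starvation"      # compact snippets, ~50 tokens each
m9k_context id=<chunk-id from a search hit>       # adjacent chunks, ~200-300 tokens
m9k_full ids=[<chunk-id>]                         # full content, ~500-1000 tokens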

Context-aware ranking - results from your current project (×1.5) and current session (×1.2) are automatically promoted. Cross-project results remain visible.

Specialized search

| Tool | Description |
|---|---|
| m9k_file_history | Find past conversations that touched a specific file. |
| m9k_errors | Find past solutions for an error message. |
| m9k_similar_work | Find past approaches to similar tasks. Prioritizes rich metadata. |

Memory management

| Tool | Description |
|---|---|
| m9k_save | Manually save a memory note for future recall. |
| m9k_sessions | List all indexed sessions, optionally filtered by project. |
| m9k_info | Show memory index info: corpus size, search pipeline, embedding worker, usage metrics. |
| m9k_config | View or update plugin configuration. |
| m9k_forget | Permanently remove a chunk from the index. |
| m9k_delete_session | Delete a session from the index. |
| m9k_ignore_project | Exclude a project from indexing. Future sessions won't be indexed, existing ones optionally purged. |
| m9k_unignore_project | Re-enable indexing for a previously ignored project. Purged data is not restored. |
| m9k_restart | Restart the MCP server to load fresh code after npm run build. Supports force: true for stuck processes. |

Usage guide

| Tool | Description |
|---|---|
| __USAGE_GUIDE | Phantom tool. Its description teaches Claude the retrieval pattern and available tools. |

Configuration

Zero config by default. Everything is tunable via m9k_config or environment variables.

| Setting | Default | Env var |
|---|---|---|
| Database path | ~/.melchizedek/memory.db | M9K_DB_PATH |
| JSONL directory | ~/.claude/projects | M9K_JSONL_DIR |
| Daemon mode | enabled | M9K_NO_DAEMON=1 to disable (or --no-daemon) |
| Log level | warn | M9K_LOG_LEVEL |
| Embeddings enabled | true | M9K_EMBEDDINGS=false to disable |
| Text embedding backend | auto (Transformers.js, then Ollama) | M9K_EMBEDDING_TEXT_BACKEND |
| Text embedding model | Multilingual-MiniLM-L12-v2 (384d) | M9K_EMBEDDING_TEXT_MODEL |
| Code embedding backend | auto (Jina Code, then Ollama) | M9K_EMBEDDING_CODE_BACKEND |
| Code embedding model | jina-embeddings-v2-base-code (768d) | M9K_EMBEDDING_CODE_MODEL |
| Code embedding enabled | true | M9K_EMBEDDING_CODE=false to disable |
| Ollama base URL | http://localhost:11434 | M9K_OLLAMA_BASE_URL |
| Reranker enabled | true | M9K_RERANKER=false to disable |
| Reranker backend | auto (llama-server > node-llama-cpp > Transformers.js) | M9K_RERANKER_BACKEND |
| Reranker model | - (auto-detect) | M9K_RERANKER_MODEL |
| Reranker URL | - | M9K_RERANKER_URL |
| Reranker top N | 10 | M9K_RERANKER_TOP_N |
| Models directory | ~/.melchizedek/models | M9K_MODELS_DIR |
| Max chunk tokens | 1000 | - |
| Auto-fuzzy threshold | 3 (retry with wildcards if < 3 results) | - |
| Sync purge | false | M9K_SYNC_PURGE=true |
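
Settings persist in ~/.melchizedek/config.json. A hedged example combining keys that appear in this README (the exact schema may differ from this sketch):

{
  "embeddingCodeEnabled": false,
  "rerankerBackend": "llama-server",
  "rerankerUrl": "http://localhost:8012"
}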

How is this different?

| | Melchizedek | claude-historian-mcp | claude-mem | episodic-memory | mcp-memory-service |
|---|---|---|---|---|---|
| Philosophy | Search engine - indexes everything, you search | Search engine - scans JSONL on demand | Notebook - AI compresses & saves | Search engine | Notebook - AI decides what to store |
| Indexes raw conversations | Yes (JSONL transcripts) | Yes (direct JSONL read, no persistent index) | Compressed summaries | Yes (JSONL) | No (manual store_memory) |
| Retroactive on install | Yes (backfills all history) | Yes (reads existing files) | No | Yes | No (empty at start) |
| Search | BM25 + vectors + RRF + reranker | TF-IDF + fuzzy matching | FTS5 + ChromaDB | Vectors only | BM25 + vectors |
| Progressive retrieval | 3 layers (search/context/full) | No | No | No | No |
| 100% offline | Yes | Yes | No (needs API for compression) | Yes | Yes |
| Single-file storage | SQLite | None (reads raw JSONL) | SQLite + ChromaDB | SQLite | SQLite-vec |
| Zero config | Yes | Yes | Yes | Yes | Yes |
| MCP tools | 16 | 10 | 4 | 2 | 12 |
| License | MIT | MIT | AGPL-3.0 | MIT | Apache-2.0 |
| Dual embedding (text + code) | Yes (MiniLM + Jina Code) | No | No | No | No |
| Configurable models | Yes (Transformers.js or Ollama) | No | No (Chroma internal) | No (hardcoded) | Yes (ONNX, Ollama, OpenAI, Cloudflare) |
| Reranker | Cross-encoder (ONNX, GGUF, or HTTP) | No | No | No | Quality scorer (not search reranker) |
| Privacy | All local, <private> tag redaction | All local | Sends data to Anthropic API | All local | All local |
| Multi-instance | Singleton daemon - N Claude windows share 1 process (Unix socket / Windows named pipe, local fallback) | N separate processes | Shared HTTP worker (:37777) | N separate processes | Shared HTTP server |

Inspirations

This project stands on the shoulders of others. Key ideas borrowed from:

| Project | What we took |
|---|---|
| CASS | RRF hybrid fusion, SHA-256 dedup, auto-fuzzy fallback |
| claude-historian-mcp | Specialized MCP tools (file_history, error_solutions) |
| claude-diary | PreCompact hook (archive before /compact) |

Memory usage

Melchizedek loads ML models for embeddings and reranking. Here's what to expect:

| Component | RSS (real) | When |
|---|---|---|
| Server (BM25 only) | ~70 MB | Always |
| + Text embedder (Multilingual-MiniLM q8) | ~450 MB | At startup |
| + Reranker (ms-marco q8) | ~250 MB | On first search |
| Embed-worker (text) | ~450 MB | During backfill, then exits |
| Embed-worker (code, Jina q8) | ~2.5 GB | During backfill, then exits |

The embed-worker is a child process that runs during initial indexing and exits when done - its memory is fully reclaimed.

About virtual memory (VSZ): macOS Activity Monitor may show very large virtual memory numbers (400+ GB per process). This is normal - ONNX Runtime reserves large virtual address ranges via mmap without actually using physical RAM. The real consumption is the RSS column above. Only RSS reflects actual memory pressure.

To reduce memory usage:

  • "embeddingCodeEnabled": false - skip code embeddings (saves ~2.5 GB during backfill)
  • "embeddingsEnabled": false - BM25 only, ~70 MB total
  • Use Ollama for code embeddings - offloads to a separate process with GPU acceleration

Known issues

  • Session boost inactive - Claude Code currently sends an empty session_id in the SessionStart hook stdin payload, preventing the ×1.2 session boost from working. The ×1.5 project boost is unaffected and provides the primary context-aware ranking. Related upstream issues: #13668 (empty transcript_path), #9188 (stale session_id). Melchizedek's session boost code is tested and ready, and will activate automatically when the upstream fix lands.

Privacy

  • Zero telemetry. No tracking, no analytics, no network calls (except optional lazy model download).
  • Read-only on transcripts. Never writes to ~/.claude/projects/. All data in ~/.melchizedek/.
  • <private> tag support. Content between <private>...</private> is replaced with [REDACTED] before indexing.
  • Local-only. Your conversations never leave your machine.
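
For example, an illustrative input/output pair (the token value is made up):

You say:      my key is <private>sk-abc-123</private>, fix the auth call
Indexed as:   my key is [REDACTED], fix the auth call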

Requirements

  • Node.js >= 20
  • Claude Code >= 2.0
  • macOS, Linux, or Windows

License

MIT


"Without father, without mother, without genealogy, having neither beginning of days nor end of life."

- Hebrews 7:3

Built by @louis49
