
rag-vault

Local RAG MCP server with hybrid search, PDF/DOCX support, and zero-config setup

Updated: Jan 28, 2026

Quick Install

npx -y @robthepcguy/rag-vault

RAG Vault

License: MIT · TypeScript · MCP Registry

Your documents. Your machine. Your control.

RAG Vault gives AI coding assistants instant access to your private documents—API specs, research papers, internal docs—without ever sending data to the cloud. One command, zero configuration, complete privacy.

Why RAG Vault?

| Pain Point | RAG Vault Solution |
|---|---|
| "I don't want my docs on someone else's server" | Everything stays local. No API calls after setup. |
| "Semantic search misses exact code terms" | Hybrid search: meaning + exact matches like useEffect |
| "Setup requires Docker, Python, databases..." | One npx command. Done. |
| "Cloud APIs charge per query" | Free forever. No subscriptions. |

Security

RAG Vault includes security features for production deployment:

  • API Authentication — Optional API key via RAG_API_KEY
  • Rate Limiting — Configurable request throttling
  • CORS Control — Restrict allowed origins
  • Security Headers — Helmet.js protection

See SECURITY.md for complete documentation.

Get Started in 30 Seconds

For Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "local-rag": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "github:RobThePCGuy/rag-vault"],
      "env": {
        "BASE_DIR": "/path/to/your/documents"
      }
    }
  }
}

For Claude Code

Add to .mcp.json in your project directory:

{
  "mcpServers": {
    "local-rag": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "github:RobThePCGuy/rag-vault"],
      "env": {
        "BASE_DIR": "./documents",
        "DB_PATH": "./documents/.rag-db",
        "CACHE_DIR": "./.cache",
        "RAG_HYBRID_WEIGHT": "0.6",
        "RAG_GROUPING": "related"
      }
    }
  }
}

Or add inline via CLI:

claude mcp add local-rag --scope user --env BASE_DIR=/path/to/your/documents -- npx -y github:RobThePCGuy/rag-vault

For Codex

Add to ~/.codex/config.toml:

[mcp_servers.local-rag]
command = "npx"
args = ["-y", "github:RobThePCGuy/rag-vault"]

[mcp_servers.local-rag.env]
BASE_DIR = "/path/to/your/documents"

Install Skills (Optional)

For enhanced AI guidance on query formulation and result interpretation, install the RAG Vault skills:

# Claude Code (project-level - recommended for team projects)
npx github:RobThePCGuy/rag-vault skills install --claude-code

# Claude Code (user-level - available in all projects)
npx github:RobThePCGuy/rag-vault skills install --claude-code --global

# Codex (user-level)
npx github:RobThePCGuy/rag-vault skills install --codex

# Custom location
npx github:RobThePCGuy/rag-vault skills install --path /your/custom/path

Skills teach Claude best practices for:

  • Query formulation and expansion strategies
  • Score interpretation (< 0.3 = good match, > 0.5 = skip)
  • When to use ingest_file vs ingest_data
  • HTML ingestion and URL handling
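The score thresholds above can be sketched as a small helper. This is illustrative only: the function name and the "borderline" middle band are my assumptions, derived from the < 0.3 / > 0.5 rule of thumb the skills teach.

```typescript
// Classify a search result's distance score using the rule of thumb above:
// lower distance = closer match. Names and structure are illustrative.
type Verdict = "good" | "borderline" | "skip";

function classifyScore(distance: number): Verdict {
  if (distance < 0.3) return "good"; // strong match: use it
  if (distance > 0.5) return "skip"; // weak match: ignore it
  return "borderline"; // in between: inspect manually
}

console.log(classifyScore(0.12)); // good
console.log(classifyScore(0.41)); // borderline
console.log(classifyScore(0.77)); // skip
```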

Restart your AI tool, and start talking:

You: "Ingest api-spec.pdf"
AI:  Successfully ingested api-spec.pdf (47 chunks)

You: "How does authentication work?"
AI:  Based on section 3.2, authentication uses OAuth 2.0 with JWT tokens...

That's it. No Docker. No Python. No servers.

Web Interface

RAG Vault includes a full-featured web UI for managing your documents without the command line.

Launch the Web UI

npx github:RobThePCGuy/rag-vault web

Open http://localhost:3000 in your browser.

What You Can Do

  • Upload documents — Drag and drop PDFs, Word docs, Markdown, text files
  • Search instantly — Type queries and see results with relevance scores
  • Preview content — Click any result to see the full chunk in context
  • Manage files — View all indexed documents, delete what you don't need
  • Switch databases — Create and switch between multiple knowledge bases
  • Monitor status — See document counts, database size, system health

REST API

The web server exposes a REST API for programmatic access. Set RAG_API_KEY to require authentication:

# With authentication (when RAG_API_KEY is set)
curl -X POST "http://localhost:3000/api/v1/search" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 5}'

# Search documents (no auth required if RAG_API_KEY is not set)
curl -X POST "http://localhost:3000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 5}'

# List all files
curl "http://localhost:3000/api/v1/files"

# Upload a document
curl -X POST "http://localhost:3000/api/v1/files/upload" \
  -F "file=@spec.pdf"

# Delete a file
curl -X DELETE "http://localhost:3000/api/v1/files" \
  -H "Content-Type: application/json" \
  -d '{"filePath": "/path/to/spec.pdf"}'

# Get system status
curl "http://localhost:3000/api/v1/status"

# Health check (for load balancers)
curl "http://localhost:3000/api/v1/health"
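The same endpoints can be called from Node or the browser. A minimal sketch of the search call: the buildSearchRequest helper is mine, not part of RAG Vault, and it assumes only the URL, header, and body shapes shown in the curl examples above.

```typescript
// Build a fetch() request for the search endpoint shown above.
// If an API key is provided, attach it as a Bearer token; otherwise
// the request is unauthenticated (valid when RAG_API_KEY is not set).
interface SearchRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildSearchRequest(
  query: string,
  limit: number,
  apiKey?: string,
  base = "http://localhost:3000"
): SearchRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${base}/api/v1/search`,
    init: { method: "POST", headers, body: JSON.stringify({ query, limit }) },
  };
}

// Usage (against a running web server):
//   const { url, init } = buildSearchRequest("authentication", 5, process.env.RAG_API_KEY);
//   const results = await fetch(url, init).then((r) => r.json());
const req = buildSearchRequest("authentication", 5, "your-api-key");
console.log(req.url); // http://localhost:3000/api/v1/search
```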

Real-World Examples

Search Your Codebase Documentation

You: "Ingest all the markdown files in /docs"
AI:  Ingested 23 files (847 chunks total)

You: "What's the retry policy for failed API calls?"
AI:  According to error-handling.md, failed requests retry 3 times
     with exponential backoff: 1s, 2s, 4s...

Index Web Documentation

You: "Fetch https://docs.example.com/api and ingest the HTML"
AI:  Ingested "docs.example.com/api" (156 chunks)

You: "What rate limits apply to the /users endpoint?"
AI:  The API limits /users to 100 requests per minute per API key...

Build a Personal Knowledge Base

You: "Ingest my research papers folder"
AI:  Ingested 12 PDFs (2,341 chunks)

You: "What do recent studies say about transformer attention mechanisms?"
AI:  Based on attention-mechanisms-2024.pdf, the key finding is...

Search Exact Technical Terms

RAG Vault's hybrid search catches both meaning and exact matches:

You: "Search for ERR_CONNECTION_REFUSED"
AI:  Found 3 results mentioning ERR_CONNECTION_REFUSED:
     1. troubleshooting.md - "When you see ERR_CONNECTION_REFUSED..."
     2. network-errors.pdf - "Common causes include..."

Pure semantic search would miss this. RAG Vault finds it.

How It Works

Document → Parse → Chunk by meaning → Embed locally → Store in LanceDB
                         ↓
Query → Embed → Vector search → Keyword boost → Quality filter → Results

Smart chunking: Splits by meaning, not character count. Keeps code blocks intact.

Hybrid search: Vector similarity finds related content. Keyword boost ranks exact matches higher.

Quality filtering: Groups results by relevance gaps instead of arbitrary top-K cutoffs.

Local everything: Embeddings via Transformers.js. Storage via LanceDB. No network after model download.
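The hybrid-scoring and gap-based-grouping ideas above can be sketched in a few lines. This is a toy model, not RAG Vault's actual implementation: the function names, the multiplicative boost formula, and the 0.15 gap threshold are all illustrative assumptions.

```typescript
interface Hit { text: string; distance: number } // lower distance = more similar

// Keyword boost: shrink the distance of hits containing the exact query term,
// so they rank higher. `weight` plays the role of RAG_HYBRID_WEIGHT
// (0 = semantic-only; higher = stronger boost for exact matches).
function hybridRank(hits: Hit[], term: string, weight: number): Hit[] {
  return hits
    .map((h) => ({
      ...h,
      distance: h.text.includes(term) ? h.distance * (1 - weight) : h.distance,
    }))
    .sort((a, b) => a.distance - b.distance);
}

// Gap-based grouping: instead of a fixed top-K, cut where the distance
// jumps by more than `gap` between consecutive results.
function topGroup(hits: Hit[], gap = 0.15): Hit[] {
  const out: Hit[] = [];
  for (const h of hits) {
    if (out.length && h.distance - out[out.length - 1].distance > gap) break;
    out.push(h);
  }
  return out;
}
```

With weight 0.6, a chunk mentioning the literal term useEffect can outrank a merely related chunk even if its raw vector distance was worse, which is the behavior the hybrid search aims for.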

Supported Formats

| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Full text extraction, header/footer filtering |
| Word | .docx | Tables, lists, formatting preserved |
| Markdown | .md | Code blocks kept intact |
| Text | .txt | Plain text |
| JSON | .json | Converted to searchable key-value text |
| HTML | via ingest_data | Auto-cleaned with Readability |

Configuration

Environment Variables

| Variable | Default | What it does |
|---|---|---|
| BASE_DIR | Current directory | Only files under this path can be accessed |
| DB_PATH | ./lancedb/ | Where vectors are stored |
| MODEL_NAME | Xenova/all-MiniLM-L6-v2 | HuggingFace embedding model |
| WEB_PORT | 3000 | Port for web interface |

Search Tuning

| Variable | Default | What it does |
|---|---|---|
| RAG_HYBRID_WEIGHT | 0.6 | Keyword boost strength: 0 = semantic-only; higher = stronger boost for exact keyword matches |
| RAG_GROUPING | similar | similar = top group only, related = top 2 groups |
| RAG_MAX_DISTANCE | (unset) | Drop results whose distance exceeds this threshold (lower distance = better match) |

Security (optional)

| Variable | Default | What it does |
|---|---|---|
| RAG_API_KEY | (unset; auth disabled) | API key required for authentication |
| CORS_ORIGINS | localhost | Allowed origins (comma-separated, or *) |
| RATE_LIMIT_WINDOW_MS | 60000 | Rate-limit time window (ms) |
| RATE_LIMIT_MAX_REQUESTS | 100 | Max requests per window |

Advanced

| Variable | Default | What it does |
|---|---|---|
| ALLOWED_SCAN_ROOTS | Home directory | Directories allowed for database scanning |
| JSON_BODY_LIMIT | 5mb | Max request body size |
| REQUEST_TIMEOUT_MS | 30000 | API request timeout (ms) |
| REQUEST_LOGGING | false | Enable request audit logging |

Copy .env.example for a complete configuration template.

For code-heavy content, try:

"env": {
  "RAG_HYBRID_WEIGHT": "0.8",
  "RAG_GROUPING": "similar"
}

Frequently Asked Questions

Is my data really private?

Yes. After the embedding model downloads (~90MB), RAG Vault makes zero network requests. Everything runs on your machine. Verify with network monitoring.

Does it work offline?

Yes, after the first run. The model caches locally.

What about GPU acceleration?

Transformers.js runs on CPU. GPU support is experimental but unnecessary for most use cases—queries return in ~1 second even with 10,000 chunks.

Can I change the embedding model?

Yes. Set MODEL_NAME to any compatible HuggingFace model. But you must delete DB_PATH and re-ingest—different models produce incompatible vectors.

Recommended upgrade: For better quality and multilingual support, use EmbeddingGemma:

"MODEL_NAME": "onnx-community/embeddinggemma-300m-ONNX"

This 300M parameter model scores 68.36 on MTEB benchmarks and supports 100+ languages, making it ideal for mixed-language or high-quality retrieval needs.

Other specialized models:

  • Scientific: sentence-transformers/allenai-specter
  • Code: jinaai/jina-embeddings-v2-base-code

How do I back up my data?

Copy the DB_PATH directory (default: ./lancedb/).

Troubleshooting

| Problem | Solution |
|---|---|
| No results found | Documents must be ingested first. Run "List all ingested files" to check. |
| Model download failed | Check internet connection. Model is ~90MB from HuggingFace. |
| File too large | Default limit is 100MB. Set MAX_FILE_SIZE higher or split the file. |
| Path outside BASE_DIR | All file paths must be under BASE_DIR. Use absolute paths. |
| MCP tools not showing | Verify config syntax, then restart your AI tool completely (Cmd+Q on Mac). |
| 401 Unauthorized | API key required. Set RAG_API_KEY or use the correct header format. |
| 429 Too Many Requests | Rate limited. Wait for the window to reset or increase RATE_LIMIT_MAX_REQUESTS. |
| CORS errors | Add your origin to the CORS_ORIGINS environment variable. |

Development

git clone https://github.com/RobThePCGuy/rag-vault.git
cd rag-vault
pnpm install

# Run tests
pnpm test

# Type check + lint + format
pnpm check:all

# Build
pnpm build

# Run MCP server locally
pnpm dev

# Run web server locally
pnpm web:dev

Project Structure

src/
├── server/      # MCP tool handlers
├── vectordb/    # LanceDB + hybrid search
├── chunker/     # Semantic text splitting
├── embedder/    # Transformers.js wrapper
├── parser/      # PDF, DOCX, HTML parsing
├── web/         # Express server + REST API
└── __tests__/   # Test suites

web-ui/          # React frontend

Documentation

  • SECURITY.md — Security configuration and best practices
  • .env.example — Complete environment variable template

License

MIT — free for personal and commercial use.

Acknowledgments

Built with Model Context Protocol, LanceDB, and Transformers.js.

Started as a fork of mcp-local-rag by Shinsuke Kagawa; now it's its own thing. Huge credit to the upstream contributors for the foundation. I've been iterating hard from there. Local-first dev tools, all the way.
