Semantic Search MCP Server
Semantic code search for Claude Code — powered by hybrid BM25 + vector retrieval.
What It Does
An MCP server that gives Claude Code semantic search over your codebase. Instead of relying on vector search alone, it uses hybrid retrieval: BM25 keyword matching and vector semantic search, merged with Reciprocal Rank Fusion (RRF), so that exact keyword matches and semantically related code both surface in the results.
Quick Start
# Install
cd semantic_search_MCP
pip install -e ".[dev]"
# Test the server with MCP inspector
npx @modelcontextprotocol/inspector python -m semantic_search_mcp
# Connect to Claude Code
claude mcp add semantic-search -- python -m semantic_search_mcp
Tools
| Tool | Description |
|---|---|
| `search(query, repo_path?, top_k?, file_glob?)` | Hybrid semantic + keyword search. Auto-indexes if needed. |
| `index(repo_path?, force_rebuild?)` | Build or rebuild the search index. |
| `status(repo_path?)` | Check if a repo is indexed and whether the index is stale. |
Example Usage (inside Claude Code)
> Search for where JWT validation happens
> Index this repository first, then find auth middleware
> Search for error handling in src/api/**
Architecture
- Chunking: Language-aware regex splitting (10 languages) with context headers
- Embeddings: all-MiniLM-L6-v2 via sentence-transformers (local, no API key)
- Vector store: LanceDB (serverless, file-based)
- BM25: SQLite FTS5 sidecar
- Retrieval: Hybrid BM25 + vector with RRF merge
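
The RRF merge step can be illustrated with a minimal sketch. It is not taken from this project's source: the function name `rrf_merge`, the chunk ID format, and the constant `k=60` (a commonly used default for RRF) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=10):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed default, not a value
    confirmed by this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]


# Hypothetical result lists from the two retrievers.
bm25_hits = ["auth.py:10", "jwt.py:42", "utils.py:7"]
vector_hits = ["jwt.py:42", "middleware.py:3", "auth.py:10"]

fused = rrf_merge([bm25_hits, vector_hits])
```

Because RRF only looks at ranks, the BM25 and vector scores never need to be normalized onto a common scale. A chunk that ranks well in both lists (here `jwt.py:42`) ends up ahead of one that ranks first in only a single list.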
Stack
- Python 3.11+
- FastMCP (MCP server framework)
- LanceDB (vector storage)
- sentence-transformers (embeddings)
- SQLite FTS5 (keyword search)
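
To show what keyword search over SQLite FTS5 looks like in practice, here is a small sketch using Python's built-in `sqlite3` module. The table name `chunks`, its columns, and the `keyword_search` helper are hypothetical and may differ from the schema this server actually uses. It also assumes your Python's SQLite build includes FTS5.

```python
import sqlite3


def keyword_search(conn, query, limit=5):
    """Return (path, score) rows ordered best-first by FTS5's bm25()."""
    # bm25() returns lower (more negative) values for better matches,
    # so an ascending sort puts the best hit first.
    return conn.execute(
        "SELECT path, bm25(chunks) AS score FROM chunks "
        "WHERE chunks MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per code chunk.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany(
    "INSERT INTO chunks (path, body) VALUES (?, ?)",
    [
        ("auth/jwt.py", "def validate_jwt(token): decode and verify the jwt signature"),
        ("api/errors.py", "def handle_error(exc): map exceptions to http responses"),
        ("auth/middleware.py", "class AuthMiddleware: reject requests without a valid token"),
    ],
)

hits = keyword_search(conn, "jwt")
```

Raw user input containing FTS5 query syntax (quotes, `*`, `AND`, and so on) would need quoting or escaping before being passed to `MATCH`; the sketch skips that step.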
Configuration
Set via environment variables:
| Variable | Default | Description |
|---|---|---|
| `SEMANTIC_SEARCH_DATA_DIR` | `~/.semantic-search/data` | Where indexes are stored |
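
As a rough illustration of how the variable and its default could be resolved, the sketch below reads `SEMANTIC_SEARCH_DATA_DIR` and expands `~`. The helper name `resolve_data_dir` is hypothetical, and the server's real lookup logic may differ.

```python
import os
from pathlib import Path

# Default taken from the table above.
DEFAULT_DATA_DIR = "~/.semantic-search/data"


def resolve_data_dir(env=None):
    """Return the index directory, honoring SEMANTIC_SEARCH_DATA_DIR if set."""
    env = os.environ if env is None else env
    raw = env.get("SEMANTIC_SEARCH_DATA_DIR", DEFAULT_DATA_DIR)
    # expanduser() turns a leading "~" into the current user's home directory.
    return Path(raw).expanduser()


data_dir = resolve_data_dir()
```

Passing an explicit mapping, for example `resolve_data_dir({"SEMANTIC_SEARCH_DATA_DIR": "/tmp/idx"})`, makes the override easy to check without touching the real environment.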