
Semantic Search MCP

Enables semantic search over markdown files to find related notes by meaning rather than keywords, and automatically detect duplicate content before creating new notes.

Forks: 1
Updated: Feb 16, 2026

Semantic Search

Semantic search over markdown files. Find related notes by meaning, not just keywords. Detect duplicates before creating new notes.

Supports two server modes:

  • MCP mode — For Claude Code integration
  • REST mode — For OpenClaw, scripts, and HTTP clients

Features

  • Semantic search using sentence-transformers
  • Duplicate/similar note detection
  • Auto-updating index with file watcher
  • Multi-directory support
  • Inline tag extraction (#tag-name)
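As an illustration of the inline tag feature, the sketch below pulls `#tag-name` style tags out of a note body with a regular expression. The pattern, the lowercasing, and the function name `extract_inline_tags` are assumptions made for this example; the project's actual extraction rules may differ (for instance, this sketch does not skip tags that appear inside code blocks).

```python
import re

# Assumed tag syntax: '#' at the start of a line or after whitespace,
# followed by a letter and then letters, digits, '_', '-' or '/'.
# Headings such as "# Title" do not match because a space follows the '#'.
TAG_PATTERN = re.compile(r"(?:^|(?<=\s))#([A-Za-z][\w/-]*)", re.MULTILINE)


def extract_inline_tags(body: str) -> list[str]:
    """Return unique inline tags, lowercased, in order of first appearance."""
    tags = (match.group(1).lower() for match in TAG_PATTERN.finditer(body))
    return list(dict.fromkeys(tags))


if __name__ == "__main__":
    note = "# Title\nNotes on #kubernetes and #tag-name.\nSee #Kubernetes again."
    print(extract_inline_tags(note))
```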

Installation

Permanent install (recommended)

# Install as a tool (creates ~/.local/bin/semantic-search-mcp)
uv tool install git+https://github.com/bborbe/semantic-search

💡 No GPU? Use CPU-only PyTorch

The default install includes CUDA support (~7GB). If you don't have a dedicated GPU, install with CPU-only PyTorch to save ~5GB disk space:

uv tool install --index https://download.pytorch.org/whl/cpu \
  git+https://github.com/bborbe/semantic-search

Performance is identical for typical vault sizes — embedding models run fine on CPU.

One-off usage

# Run directly with uvx (no install needed)
uvx --from git+https://github.com/bborbe/semantic-search semantic-search-mcp serve

From PyPI (when published)

pip install semantic-search-mcp

Server Modes

MCP Mode (for Claude Code)

claude mcp add -s project semantic-search \
  --env CONTENT_PATH=/path/to/vault \
  -- \
  uvx --from git+https://github.com/bborbe/semantic-search semantic-search-mcp serve

Tools available:

  • search_related(query, top_k=5) — Find semantically related notes
  • check_duplicates(file_path) — Detect duplicate/similar notes
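Claude Code issues these tool calls for you, but it can help to see roughly what travels over the wire. MCP uses JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below only builds such a payload; the helper name `build_tool_call` is hypothetical, and the argument values are placeholders.

```python
import json
from itertools import count

# Simple incrementing request ids for this example.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Tool names and parameters are taken from the list above.
    print(build_tool_call("search_related", {"query": "kubernetes deployment", "top_k": 5}))
    print(build_tool_call("check_duplicates", {"file_path": "notes/my-note.md"}))
```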

REST Mode (for OpenClaw/HTTP)

# Start server
CONTENT_PATH=/path/to/vault semantic-search-mcp serve --mode rest --port 8321

# Or with uvx
CONTENT_PATH=/path/to/vault uvx --from git+https://github.com/bborbe/semantic-search \
  semantic-search-mcp serve --mode rest --port 8321

Endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /search?q=...&top_k=5 | GET | Semantic search |
| /duplicates?file=...&threshold=0.85 | GET | Find duplicate notes |
| /health | GET | Health check with index stats |
| /reindex | GET/POST | Force index rebuild |

Example queries:

# Search
curl 'http://localhost:8321/search?q=kubernetes+deployment'

# Find duplicates
curl 'http://localhost:8321/duplicates?file=notes/my-note.md'

# Health check
curl 'http://localhost:8321/health'
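The same queries can be issued from a script. The following is a minimal sketch using only the Python standard library; it assumes the server listens on `localhost:8321` as in the examples above and that responses are JSON, which this README does not state explicitly. The helper names `build_url` and `get_json` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8321"  # port taken from the examples above


def build_url(endpoint: str, base_url: str = BASE_URL, **params) -> str:
    """Build a request URL, URL-encoding any query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url}/{endpoint.lstrip('/')}"
    return f"{url}?{query}" if query else url


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running REST server.
    print(get_json(build_url("search", q="kubernetes deployment", top_k=5)))
    print(get_json(build_url("duplicates", file="notes/my-note.md", threshold=0.85)))
    print(get_json(build_url("health")))
```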

CLI Commands

One-shot commands without running a server:

# Search
CONTENT_PATH=/path/to/vault semantic-search-mcp search "kubernetes deployment"

# Find duplicates
CONTENT_PATH=/path/to/vault semantic-search-mcp duplicates path/to/note.md

Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| CONTENT_PATH | Directory to index (comma-separated for multiple) | ./content |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |

Multiple Directories

Index multiple directories by separating paths with commas:

CONTENT_PATH=/path/to/vault1,/path/to/vault2,/path/to/docs

All directories are indexed together and searched as one unified index.
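To make the comma-separated format concrete, here is one way such a value could be split into directories. This is a sketch, not the project's parser: trimming whitespace, ignoring empty entries, and expanding `~` are assumptions, and `parse_content_paths` is a hypothetical name.

```python
import os
from pathlib import Path


def parse_content_paths(raw=None):
    """Split a CONTENT_PATH value into a list of directories.

    Falls back to the CONTENT_PATH environment variable, then to the
    documented default './content'.
    """
    if raw is None:
        raw = os.environ.get("CONTENT_PATH", "./content")
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [Path(part.strip()).expanduser() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    for directory in parse_content_paths("/path/to/vault1,/path/to/vault2,/path/to/docs"):
        print(directory)
```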

How It Works

On first run, the server downloads a small embedding model (~90MB) and indexes your markdown files (<1s for typical vaults). A filesystem watcher then updates the index automatically whenever files change.
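Once every note has an embedding vector, search and duplicate detection both reduce to comparing vectors. The sketch below uses cosine similarity over a toy in-memory index with hand-written two-dimensional vectors standing in for real embeddings. Cosine similarity is a common choice for sentence embeddings but is an assumption here, as are the function names; the 0.85 threshold mirrors the `/duplicates` example above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=5):
    """Return the k (path, score) pairs most similar to query_vec."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def find_duplicates(path, index, threshold=0.85):
    """Return other notes whose similarity to `path` meets the threshold."""
    ranked = top_k(index[path], index, k=len(index))
    return [(other, score) for other, score in ranked if other != path and score >= threshold]


if __name__ == "__main__":
    # Toy vectors; a real index would hold model-generated embeddings.
    index = {"a.md": [1.0, 0.0], "b.md": [0.9, 0.1], "c.md": [0.0, 1.0]}
    print(top_k([1.0, 0.0], index, k=2))
    print(find_duplicates("a.md", index))
```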

Indexed Content

Each markdown file is indexed with weighted components:

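| Component | Weight | Notes |
|-----------|--------|-------|
| Filename | 3x | |
| Frontmatter title | 3x | |
| Frontmatter tags | 2x | Merged with inline tags |
| Frontmatter aliases | 2x | |
| Inline tags (#tag) | 2x | Extracted from body |
| First H1 heading | 2x | |
| Body content | 1x | First 500 words |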
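This README does not say how the weights are applied. One simple technique, shown here purely as an illustration, is to repeat each component in the text passed to the embedding model as many times as its weight. The function `build_weighted_text` and its parameters are hypothetical, and the real implementation may weight components differently.

```python
def build_weighted_text(filename, title="", tags=(), aliases=(), inline_tags=(),
                        h1="", body="", max_body_words=500):
    """Assemble embedding input by repeating each component `weight` times."""
    # Frontmatter and inline tags are merged and deduplicated, keeping order.
    merged_tags = list(dict.fromkeys([*tags, *inline_tags]))
    parts = [
        (filename, 3),
        (title, 3),
        (" ".join(merged_tags), 2),
        (" ".join(aliases), 2),
        (h1, 2),
        # Only the first `max_body_words` words of the body are used.
        (" ".join(body.split()[:max_body_words]), 1),
    ]
    # Empty components are skipped entirely.
    return "\n".join(text for text, weight in parts for _ in range(weight) if text)


if __name__ == "__main__":
    print(build_weighted_text(
        "k8s-deploy",
        title="Kubernetes Deployment",
        tags=["k8s"],
        inline_tags=["k8s", "ops"],
        body="Rolling out a new release with zero downtime.",
    ))
```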

Development

# Clone
git clone https://github.com/bborbe/semantic-search
cd semantic-search

# Install dev dependencies
make install

# Run checks
make check

# Run tests
make test

License

BSD 2-Clause License — see LICENSE.
