🚀 Tea Rags MCP

This is a fork of mcp-server-qdrant.

A high-performance Model Context Protocol (MCP) server for semantic search using Qdrant vector database. Optimized for fast codebase indexing and incremental re-indexing.

🙏 Acknowledgments

Huge thanks to the qdrant/mcp-server-qdrant team and all contributors to the original project!

Special appreciation for:

💎 Clean and extensible architecture
📚 Excellent documentation and examples
🧪 Solid test coverage
🤝 Open-source spirit and MIT license

This fork is built on the solid foundation of your work. Thank you for your contribution to the community! 💜

⚡ Fork Highlights

Why tea-rags-mcp?

🚀 Optimized embedding pipeline — indexing and re-indexing takes minutes, not hours
🔥 1000x faster deletions — payload indexes make filter-based deletes instant
⚡ Parallel processing — sharded snapshots, concurrent workers, batched operations
🎯 Smart batching — automatic batch formation with backpressure control
🛠️ Production-ready — auto-migration, checkpointing, resume from interruption
💎 Ruby AST-aware - AST-aware chunking for Ruby code

🍴 Why Fork?

Why a fork instead of PRs to the original?

I love to experiment. A lot. And fast. 🧪

Coordinating changes with maintainers is the right thing to do, but it takes time: discussions, reviews, compromises, waiting. Sometimes an idea lives for a day, sometimes it turns into something useful.

A fork gives me freedom to try crazy ideas without fear of breaking someone else's project or wasting anyone's time reviewing something that might not even work.

For maintainers & contributors: If you find something useful here — feel free to cherry-pick it into upstream. No need to ask, MIT license covers it. Questions? Reach me at: artk0re@icloud.com 📬

TL;DR: This is an experimental playground. Use at your own risk. For production, I recommend the original project.

✨ What's New in This Fork

Feature	Original	This Fork
Snapshot storage	Single JSON file	🔀 Sharded storage (v3)
Change detection	Sequential	⚡ Parallel (N workers)
Hash distribution	—	🎯 Consistent hashing
Merkle tree	Single level	🌳 Two-level (shard + meta)
Concurrency control	Fixed	🎛️ `EMBEDDING_CONCURRENCY` env
Delete operations	Filter scan	⚡ Payload index (1000x faster)
Batch pipeline	Sequential	🔄 Parallel with backpressure

🔀 Sharded Snapshots (v3 format)

File hashes are stored across multiple shards instead of a single file:

Parallel read/write across shards
Atomic updates via directory swap
Checksum validation per shard

⚡ Parallel Change Detection

Change detection runs in parallel across all shards:


# Control parallelism (default: 4)

export EMBEDDING_CONCURRENCY=8
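
As a rough illustration of what a concurrency limit means in practice, the sketch below runs an async worker over a list with a fixed number of tasks in flight. The helper name `mapWithConcurrency` is hypothetical and not taken from this project; the limit is passed in as a plain number rather than read from `EMBEDDING_CONCURRENCY`.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A caller could pass shard indices as `items` and a per-shard comparison as `worker`, so that raising the limit trades memory and I/O pressure for wall-clock time.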

🎯 Consistent Hashing

When the number of workers changes, as few files as possible are redistributed:

4 → 8 workers: ~50% of files stay in place (vs ~25% with modulo)
Virtual nodes ensure even distribution
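
A minimal sketch of a hash ring with virtual nodes, to make the idea concrete. Everything here is an assumption for illustration: the SHA-256 based `ringPosition`, the `shard-N#V` point labels, and the default of 64 virtual nodes are not taken from this project's implementation.

```typescript
import { createHash } from "node:crypto";

interface RingPoint {
  pos: number;   // position on the 32-bit ring
  shard: number; // shard that owns keys up to this position
}

// Map any string to a 32-bit ring position (first 4 bytes of its SHA-256).
function ringPosition(key: string): number {
  return createHash("sha256").update(key).digest().readUInt32BE(0);
}

// Place `virtualNodes` points per shard on the ring, sorted by position.
function buildRing(shardCount: number, virtualNodes = 64): RingPoint[] {
  const ring: RingPoint[] = [];
  for (let shard = 0; shard < shardCount; shard++) {
    for (let v = 0; v < virtualNodes; v++) {
      ring.push({ pos: ringPosition(`shard-${shard}#${v}`), shard });
    }
  }
  return ring.sort((a, b) => a.pos - b.pos);
}

// A key belongs to the first point at or after its position, wrapping at the end.
function shardFor(ring: RingPoint[], filePath: string): number {
  const pos = ringPosition(filePath);
  let lo = 0;
  let hi = ring.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (ring[mid].pos < pos) lo = mid + 1;
    else hi = mid;
  }
  return ring[lo % ring.length].shard;
}

// Count how many sample files keep their shard when growing from 4 to 8 shards.
const ring4 = buildRing(4);
const ring8 = buildRing(8);
const files = Array.from({ length: 2000 }, (_, i) => `src/module-${i}.ts`);
const stayed = files.filter((f) => shardFor(ring4, f) === shardFor(ring8, f)).length;
console.log(`${stayed} of ${files.length} files stayed in place`);
```

Because the 8-shard ring contains every point of the 4-shard ring, a file either keeps its shard or moves to one of the newly added shards; it never moves between two pre-existing shards.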

🌳 Two-Level Merkle Tree

Fast "any changes?" check:

Compare meta root hash (single read)
If changed — read only affected shards
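
The two-step check can be sketched as follows. This is an illustration under assumptions, not the project's on-disk format: the entry encoding, the use of SHA-256, and the function names `shardHash`, `metaRoot`, and `changedShards` are all hypothetical.

```typescript
import { createHash } from "node:crypto";

// One shard: file path -> content hash.
type ShardEntries = Record<string, string>;

const sha256 = (text: string): string => createHash("sha256").update(text).digest("hex");

// Level 1: one hash per shard, computed over its entries in sorted path order
// so that insertion order does not affect the result.
function shardHash(entries: ShardEntries): string {
  const lines = Object.keys(entries).sort().map((path) => `${path}=${entries[path]}`);
  return sha256(lines.join("\n"));
}

// Level 2: the meta root is a hash over the ordered list of shard hashes.
function metaRoot(shardHashes: string[]): string {
  return sha256(shardHashes.join("\n"));
}

// Compare roots first; only when they differ, report which shards changed.
function changedShards(oldHashes: string[], newHashes: string[]): number[] {
  if (metaRoot(oldHashes) === metaRoot(newHashes)) return [];
  const changed: number[] = [];
  const count = Math.max(oldHashes.length, newHashes.length);
  for (let i = 0; i < count; i++) {
    if (oldHashes[i] !== newHashes[i]) changed.push(i);
  }
  return changed;
}
```

In the unchanged case only the root comparison runs, which is what makes the "any changes?" check cheap.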

📝 Future Improvements

Auto-detection of optimal concurrency based on CPU/IO
Compression for large shards
File locking for concurrent access

Features

Zero Setup: Works out of the box with Ollama - no API keys required
Privacy-First: Local embeddings and vector storage - data never leaves your machine
Code Vectorization: Intelligent codebase indexing with AST-aware chunking and semantic code search
Multiple Providers: Ollama (default), OpenAI, Cohere, and Voyage AI
Hybrid Search: Combine semantic and keyword search for better results
Semantic Search: Natural language search with metadata filtering
Incremental Indexing: Efficient updates - only re-index changed files
Git Blame Metadata: Enrich code with authorship, dates, churn metrics, and task IDs from commit history
Flexible Performance Tuning: Configurable batch sizes, concurrency, and pipeline parameters for maximum resource utilization
Smart Caching: Two-level cache (memory + disk) with content-hash invalidation for git blame and file snapshots
Configurable Prompts: Create custom prompts for guided workflows without code changes
Rate Limiting: Intelligent throttling with exponential backoff
Full CRUD: Create, search, and manage collections and documents
Flexible Deployment: Run locally (stdio) or as a remote HTTP server
API Key Authentication: Connect to secured Qdrant instances (Qdrant Cloud, self-hosted with API keys)

🔍 Feature Details

Git Blame Metadata

Each code chunk is enriched with aggregated signals from git blame:

Dominant author — who wrote most lines in the chunk (for ownership questions)
All authors — everyone who contributed to this code
Timestamps — first created and last modified dates
Age in days — how stale is the code
Commit count — churn indicator (high = frequently changed = potentially problematic)
Task IDs — automatically extracted from commit messages (JIRA, GitHub, Azure DevOps patterns)

This enables powerful filters: find code by author, find legacy code, find high-churn areas, trace code to tickets.

Flexible Performance Tuning

Every bottleneck is configurable via environment variables:

Layer	Variables	Purpose
Embedding	`EMBEDDING_BATCH_SIZE`, `EMBEDDING_CONCURRENCY`	GPU utilization, parallel requests
Pipeline	`CODE_BATCH_SIZE`, `BATCH_FORMATION_TIMEOUT_MS`	Batch accumulation strategy
Qdrant	`DELETE_BATCH_SIZE`, `DELETE_CONCURRENCY`	Bulk operations throughput
I/O	`MAX_IO_CONCURRENCY`, `FILE_PROCESSING_CONCURRENCY`	Parallel file reads

The pipeline uses backpressure control — if Qdrant or embeddings slow down, file processing automatically pauses to prevent memory overflow.

Smart Caching

Two-level caching minimizes redundant work:

Cache	Storage	Invalidation	Purpose
Git blame	Memory (L1) + Disk (L2)	Content hash	Avoid re-running `git blame` for unchanged files
File snapshots	Sharded JSON	Merkle tree	Fast "any changes?" check for incremental indexing
Collection info	Memory	TTL	Reduce Qdrant API calls

Content-hash invalidation: Cache keys include file content hash, so changing a file automatically invalidates its cached blame data — no stale data, no manual cache clearing.
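
A small sketch of how a content-hash key makes invalidation implicit. The `BlameStore` interface and `TwoLevelCache` class are hypothetical names; the L2 store here is any object with `get`/`set` (a `Map` in the usage below), whereas a real L2 level would persist to disk.

```typescript
import { createHash } from "node:crypto";

// Hypothetical L2 store; a real one would read and write files on disk.
interface BlameStore<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

class TwoLevelCache<V> {
  private readonly l1 = new Map<string, V>();
  private readonly l2: BlameStore<V>;

  constructor(l2: BlameStore<V>) {
    this.l2 = l2;
  }

  // The key includes the content hash, so an edited file misses both levels
  // and its stale entry is simply never looked up again.
  getOrCompute(filePath: string, content: string, compute: () => V): V {
    const contentHash = createHash("sha256").update(content).digest("hex");
    const key = `${filePath}:${contentHash}`;

    const hot = this.l1.get(key);
    if (hot !== undefined) return hot;

    const cold = this.l2.get(key);
    if (cold !== undefined) {
      this.l1.set(key, cold); // promote to memory for the next lookup
      return cold;
    }

    const fresh = compute();
    this.l1.set(key, fresh);
    this.l2.set(key, fresh);
    return fresh;
  }
}

// Usage: the expensive computation runs once per distinct file content.
const disk = new Map<string, string>();
const blameCache = new TwoLevelCache<string>(disk);
let blameRuns = 0;
const runBlame = (): string => `blame-result-${++blameRuns}`;
blameCache.getOrCompute("src/a.ts", "v1", runBlame);
blameCache.getOrCompute("src/a.ts", "v1", runBlame); // served from L1
blameCache.getOrCompute("src/a.ts", "v2", runBlame); // content changed: recomputed
```

This sketch never evicts old keys; a real cache would also need a size or age bound.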

Quick Start

Prerequisites

Node.js 22+
Podman or Docker with Compose support

Installation


# Clone and install

git clone https://github.com/mhalder/qdrant-mcp-server.git
cd qdrant-mcp-server
npm install

# Start services (choose one)

podman compose up -d   # Using Podman
docker compose up -d   # Using Docker

# Pull the embedding model

podman exec ollama ollama pull nomic-embed-text  # Podman
docker exec ollama ollama pull nomic-embed-text  # Docker

# Build

npm run build

Configuration

Add to Claude Code (recommended)

# Local setup (Qdrant + Ollama on localhost)
claude mcp add tea-rags -s user \
  -e QDRANT_URL=http://localhost:6333 \
  -e EMBEDDING_BASE_URL=http://localhost:11434 \
  -- node /path/to/tea-rags-mcp/build/index.js

# Remote server setup (Qdrant + Ollama on separate host)
claude mcp add tea-rags -s user \
  -e QDRANT_URL=http://192.168.1.100:6333 \
  -e EMBEDDING_BASE_URL=http://192.168.1.100:11434 \
  -- node /path/to/tea-rags-mcp/build/index.js

# Qdrant Cloud with API key
claude mcp add tea-rags -s user \
  -e QDRANT_URL=https://your-cluster.qdrant.io:6333 \
  -e QDRANT_API_KEY=your-api-key-here \
  -e EMBEDDING_BASE_URL=http://localhost:11434 \
  -- node /path/to/tea-rags-mcp/build/index.js

Remote Setup (HTTP transport)

⚠️ Security Warning: When deploying the HTTP transport in production:

Always run behind a reverse proxy (nginx, Caddy) with HTTPS

Implement authentication/authorization at the proxy level

Use firewalls to restrict access to trusted networks

Never expose directly to the public internet without protection

Consider implementing rate limiting at the proxy level

Monitor server logs for suspicious activity

Start the server:

TRANSPORT_MODE=http HTTP_PORT=3000 node build/index.js

Configure client:

{
  "mcpServers": {
    "qdrant": {
      "url": "http://your-server:3000/mcp"
    }
  }
}

Using a different provider:

"env": {
  "EMBEDDING_PROVIDER": "openai",  // or "cohere", "voyage"
  "OPENAI_API_KEY": "sk-...",      // provider-specific API key
  "QDRANT_URL": "http://localhost:6333"
}

Restart after making changes.

See Advanced Configuration section below for all options.

Tools

Collection Management

Tool	Description
`create_collection`	Create collection with specified distance metric (Cosine/Euclid/Dot)
`list_collections`	List all collections
`get_collection_info`	Get collection details and statistics
`delete_collection`	Delete collection and all documents

Document Operations

Tool	Description
`add_documents`	Add documents with automatic embedding (supports string/number IDs, metadata)
`semantic_search`	Natural language search with optional metadata filtering
`hybrid_search`	Hybrid search combining semantic and keyword (BM25) search with RRF
`delete_documents`	Delete specific documents by ID
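
For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the scoring rule on its own: each ranked list contributes 1 / (k + rank) to a document's score. It is an illustration of the formula, not this server's code path; the function name is hypothetical and k = 60 is a commonly used constant assumed here.

```typescript
// Fuse several ranked ID lists. Rank starts at 1 for the first item of each list.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Example: one semantic ranking and one keyword ranking over the same corpus.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // semantic order
  ["b", "c", "d"], // keyword order
]);
console.log(fused.map((entry) => entry.id)); // [ 'b', 'c', 'a', 'd' ]
```

Documents that appear in both lists ("b" and "c") outrank documents that appear in only one, even when the single-list document was ranked first there.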

Code Vectorization

Tool	Description
`index_codebase`	Index a codebase for semantic code search with AST-aware chunking
`search_code`	Search indexed codebase using natural language queries
`reindex_changes`	Incrementally re-index only changed files (detects added/modified/deleted)
`get_index_status`	Get indexing status and statistics for a codebase
`clear_index`	Delete all indexed data for a codebase

Resources

qdrant://collections - List all collections
qdrant://collection/{name} - Collection details

Configurable Prompts

Create custom prompts tailored to your specific use cases without modifying code. Prompts provide guided workflows for common tasks.

Note: By default, the server looks for prompts.json in the project root directory. If the file exists, prompts are automatically loaded. You can specify a custom path using the PROMPTS_CONFIG_FILE environment variable.

Setup

Create a prompts configuration file (e.g., prompts.json in the project root):

See prompts.example.json for example configurations you can copy and customize.
Configure the server (optional - only needed for custom path):

If you place prompts.json in the project root, no additional configuration is needed. To use a custom path:

{
  "mcpServers": {
    "qdrant": {
      "command": "node",
      "args": ["/path/to/qdrant-mcp-server/build/index.js"],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "PROMPTS_CONFIG_FILE": "/custom/path/to/prompts.json"
      }
    }
  }
}

Use prompts in your AI assistant:

Claude Code:

/mcp__qdrant__find_similar_docs papers "neural networks" 10

VSCode:

/mcp.qdrant.find_similar_docs papers "neural networks" 10

Example Prompts

See prompts.example.json for ready-to-use prompts including:

find_similar_docs - Semantic search with result explanation
setup_rag_collection - Create RAG-optimized collections
analyze_collection - Collection insights and recommendations
bulk_add_documents - Guided bulk document insertion
search_with_filter - Metadata filtering assistance
compare_search_methods - Semantic vs hybrid search comparison
collection_maintenance - Maintenance and cleanup workflows
migrate_to_hybrid - Collection migration guide

Template Syntax

Templates use {{variable}} placeholders:

Required arguments must be provided
Optional arguments use defaults if not specified
Unknown variables are left as-is in the output
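
The three rules above can be sketched in a few lines. The `PromptArgument` shape (`name`, `required`, `default`) and the `renderTemplate` function are assumptions made for this illustration; check prompts.example.json for the actual schema.

```typescript
// Hypothetical argument descriptor; the real prompts.json schema may differ.
interface PromptArgument {
  name: string;
  required?: boolean;
  default?: string;
}

function renderTemplate(
  template: string,
  argsSpec: PromptArgument[],
  values: Record<string, string | undefined>,
): string {
  const resolved: Record<string, string> = {};
  for (const arg of argsSpec) {
    const value = values[arg.name] ?? arg.default;
    if (value === undefined) {
      if (arg.required) throw new Error(`Missing required argument: ${arg.name}`);
      continue; // optional with no default: its placeholder is left untouched
    }
    resolved[arg.name] = value;
  }
  // Replace known {{variable}} placeholders; unknown ones are left as-is.
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (placeholder: string, variable: string) =>
      Object.prototype.hasOwnProperty.call(resolved, variable)
        ? resolved[variable]
        : placeholder,
  );
}

const rendered = renderTemplate(
  "Find documents similar to {{query}} in {{collection}} (top {{limit}}), sorted by {{unknown}}",
  [
    { name: "collection", required: true },
    { name: "query", required: true },
    { name: "limit", default: "5" },
  ],
  { collection: "papers", query: "neural networks" },
);
console.log(rendered);
// Find documents similar to neural networks in papers (top 5), sorted by {{unknown}}
```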

Code Vectorization

Intelligently index and search your codebase using semantic code search. Perfect for AI-assisted development, code exploration, and understanding large codebases.

Features

AST-Aware Chunking: Intelligent code splitting at function/class boundaries using tree-sitter
Multi-Language Support: 35+ file types including TypeScript, Python, Java, Go, Rust, C++, and more
Incremental Updates: Only re-index changed files for fast updates
Smart Ignore Patterns: Respects .gitignore, .dockerignore, and custom .contextignore files
Semantic Search: Natural language queries to find relevant code
Metadata Filtering: Filter by file type, path patterns, or language
Local-First: All processing happens locally - your code never leaves your machine

Quick Start

1. Index your codebase:


# Via Claude Code MCP tool

/mcp__qdrant__index_codebase /path/to/your/project

2. Search your code:


# Natural language search

/mcp__qdrant__search_code /path/to/your/project "authentication middleware"

# Filter by file type

/mcp__qdrant__search_code /path/to/your/project "database schema" --fileTypes .ts,.js

# Filter by path pattern

/mcp__qdrant__search_code /path/to/your/project "API endpoints" --pathPattern src/api/**

3. Update after changes:


# Incrementally re-index only changed files

/mcp__qdrant__reindex_changes /path/to/your/project

Usage Examples

Index a TypeScript Project

// The MCP tool automatically:
// 1. Scans all .ts, .tsx, .js, .jsx files
// 2. Respects .gitignore patterns (skips node_modules, dist, etc.)
// 3. Chunks code at function/class boundaries
// 4. Generates embeddings using your configured provider
// 5. Stores in Qdrant with metadata (file path, line numbers, language)

index_codebase({
  path: "/workspace/my-app",
  forceReindex: false, // Set to true to re-index from scratch
});

// Output:
// ✓ Indexed 247 files (1,823 chunks) in 45.2s

Search for Authentication Code

search_code({
  path: "/workspace/my-app",
  query: "how does user authentication work?",
  limit: 5,
});

// Results include file path, line numbers, and code snippets:
// [
//   {
//     filePath: "src/auth/middleware.ts",
//     startLine: 15,
//     endLine: 42,
//     content: "export async function authenticateUser(req: Request) { ... }",
//     score: 0.89,
//     language: "typescript"
//   },
//   ...
// ]

Search with Filters

// Only search TypeScript files
search_code({
  path: "/workspace/my-app",
  query: "error handling patterns",
  fileTypes: [".ts", ".tsx"],
  limit: 10,
});

// Only search in specific directories
search_code({
  path: "/workspace/my-app",
  query: "API route handlers",
  pathPattern: "src/api/**",
  limit: 10,
});

Incremental Re-indexing

// After making changes to your codebase
reindex_changes({
  path: "/workspace/my-app",
});

// Output:
// ✓ Updated: +3 files added, ~5 files modified, -1 files deleted
// ✓ Chunks: +47 added, -23 deleted in 8.3s

Check Indexing Status

get_index_status({
  path: "/workspace/my-app",
});

// Output:
// {
//   status: "indexed",      // "not_indexed" | "indexing" | "indexed"
//   isIndexed: true,        // deprecated: use status instead
//   collectionName: "code_a3f8d2e1",
//   chunksCount: 1823,
//   filesCount: 247,
//   lastUpdated: "2025-01-30T10:15:00Z",
//   languages: ["typescript", "javascript", "json"]
// }

Supported Languages

Programming Languages (35+ file types):

Web: TypeScript, JavaScript, Vue, Svelte
Backend: Python, Java, Go, Rust, Ruby, PHP
Systems: C, C++, C#
Mobile: Swift, Kotlin, Dart
Functional: Scala, Clojure, Haskell, OCaml
Scripting: Bash, Shell, Fish
Data: SQL, GraphQL, Protocol Buffers
Config: JSON, YAML, TOML, XML, Markdown

See configuration for full list and customization options.

Custom Ignore Patterns

Create a .contextignore file in your project root to specify additional patterns to ignore:


# .contextignore

**/test/**
**/*.test.ts
**/*.spec.ts
**/fixtures/**
**/mocks/**
**/__tests__/**

Best Practices

Index Once, Update Incrementally: Use index_codebase for initial indexing, then reindex_changes for updates
Use Filters: Narrow search scope with fileTypes and pathPattern for better results
Meaningful Queries: Use natural language that describes what you're looking for (e.g., "database connection pooling" instead of "db")
Check Status First: Use get_index_status to verify a codebase is indexed before searching
Local Embedding: Use Ollama (default) to keep everything local and private

Git Metadata Enrichment

Enrich code search with git history information. When enabled, each code chunk is annotated with authorship, modification dates, and task IDs from commit messages.

Enable git metadata:

export CODE_ENABLE_GIT_METADATA=true

What's captured (per chunk):

Signal	Description	Use Case
`dominantAuthor`	Author with most lines in chunk	"Find code written by John"
`authors[]`	All authors who touched the chunk	Team attribution
`lastModifiedAt`	Unix timestamp of latest change	"Code changed after 2024-01-01"
`firstCreatedAt`	Unix timestamp of oldest change	Code origin tracking
`ageDays`	Days since last modification	"Old code (>365 days)"
`commitCount`	Number of unique commits	Churn indicator (high = frequently changed)
`taskIds[]`	Extracted from commit messages	"Find code for TD-1234"
`lastCommitHash`	Most recent commit SHA	Audit trail

Search with git filters:

// Find code by author
search_code({
  path: "/workspace/my-app",
  query: "authentication logic",
  author: "John Doe",
});

// Find recently changed code
search_code({
  path: "/workspace/my-app",
  query: "payment processing",
  maxAgeDays: 30, // Changed in last 30 days
});

// Find old code (potential tech debt)
search_code({
  path: "/workspace/my-app",
  query: "database connection",
  minAgeDays: 365, // Not changed in a year
});

// Find code by task ID
search_code({
  path: "/workspace/my-app",
  query: "user registration",
  taskId: "TD-1234",
});

// Find high-churn code
search_code({
  path: "/workspace/my-app",
  query: "API handlers",
  minCommitCount: 10, // Changed 10+ times
});

// Combine filters
search_code({
  path: "/workspace/my-app",
  query: "error handling",
  author: "Jane",
  modifiedAfter: "2024-01-01",
  modifiedBefore: "2024-06-30",
});

Task ID extraction:

Task IDs are automatically extracted from commit summary lines:

Pattern	Example	Extracted
JIRA/Linear	`feat: implement TD-1234 feature`	`TD-1234`
GitHub	`fix: resolve issue #123`	`#123`
Azure DevOps	`feat: add AB#456 functionality`	`AB#456`
GitLab MR	`merge !789 changes`	`!789`
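
One way to extract the four patterns in the table is a single regular expression applied to the summary line. This is a sketch that reproduces the table rows, not the project's actual pattern; `TASK_ID_PATTERN` and `extractTaskIds` are hypothetical names.

```typescript
// Alternatives are tried left to right at each position, so "AB#456" is
// consumed as one match before the bare "#456" form can match inside it.
const TASK_ID_PATTERN = /\bAB#\d+|\b[A-Z][A-Z0-9]+-\d+\b|#\d+|!\d+/g;

function extractTaskIds(commitMessage: string): string[] {
  // Only the summary (first) line is inspected.
  const summary = commitMessage.split("\n")[0];
  const matches: string[] = summary.match(TASK_ID_PATTERN) ?? [];
  // De-duplicate while keeping first-seen order.
  return Array.from(new Set(matches));
}

console.log(extractTaskIds("feat: implement TD-1234 feature")); // [ 'TD-1234' ]
console.log(extractTaskIds("fix: resolve issue #123"));         // [ '#123' ]
console.log(extractTaskIds("feat: add AB#456 functionality"));  // [ 'AB#456' ]
console.log(extractTaskIds("merge !789 changes"));              // [ '!789' ]
```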

Algorithm details:

One git blame call per file (cached by content hash)
Aggregated signals only — no per-line storage overhead
Commit messages are NOT stored (only extracted task IDs)
Cache invalidates automatically when file content changes
L1 (memory) + L2 (disk) caching for performance
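
A sketch of the aggregation step, turning per-line blame data for one chunk into the signals listed in the table above. The output field names follow that table; the `BlameLine` input shape and the `aggregateBlame` function are assumptions for illustration.

```typescript
// Hypothetical per-line blame record; timestamps are Unix seconds.
interface BlameLine {
  author: string;
  commit: string;
  timestamp: number;
  taskIds: string[];
}

interface ChunkSignals {
  dominantAuthor: string;
  authors: string[];
  firstCreatedAt: number;
  lastModifiedAt: number;
  ageDays: number;
  commitCount: number;
  taskIds: string[];
  lastCommitHash: string;
}

function aggregateBlame(lines: BlameLine[], nowSeconds: number): ChunkSignals {
  if (lines.length === 0) throw new Error("chunk has no blame lines");

  const linesPerAuthor = new Map<string, number>();
  const commits = new Set<string>();
  const taskIds = new Set<string>();
  let oldest = lines[0];
  let newest = lines[0];

  for (const line of lines) {
    linesPerAuthor.set(line.author, (linesPerAuthor.get(line.author) ?? 0) + 1);
    commits.add(line.commit);
    line.taskIds.forEach((id) => taskIds.add(id));
    if (line.timestamp < oldest.timestamp) oldest = line;
    if (line.timestamp > newest.timestamp) newest = line;
  }

  // Authors ordered by number of lines, most lines first.
  const ranked = Array.from(linesPerAuthor).sort((a, b) => b[1] - a[1]);

  return {
    dominantAuthor: ranked[0][0],
    authors: ranked.map(([author]) => author),
    firstCreatedAt: oldest.timestamp,
    lastModifiedAt: newest.timestamp,
    ageDays: Math.floor((nowSeconds - newest.timestamp) / 86400),
    commitCount: commits.size,
    taskIds: Array.from(taskIds),
    lastCommitHash: newest.commit,
  };
}
```

Only the aggregate is kept per chunk, which matches the note above that no per-line data is stored.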

📋 Example Questions for Code Analysis with Git Metadata

By Author

Question	Filters
What code did John write?	`author="John"`
Who is the expert on the auth module?	`query="authorization"` → check `dominantAuthor`
Who can help me understand this code?	Search → find author with most contributions
Whose code needs review from last week?	`author="John"`, `maxAgeDays=7`
Whose code changes most frequently?	`author="X"`, `minCommitCount=5`

By Code Age

Question	Filters
What code hasn't been touched in a while?	`minAgeDays=90`
What changed in the last week?	`maxAgeDays=7`
What legacy code needs documentation?	`minAgeDays=60`, `query="service"`
What was done in this sprint?	`maxAgeDays=14`
What old code is still being used?	`minAgeDays=180`, `query="import"`
Which components haven't been updated in a year?	`minAgeDays=365`

By Change Frequency (Churn)

Question	Filters
What code is frequently rewritten? (problematic)	`minCommitCount=10`
Where are there many hotfixes?	`minCommitCount=5`, `query="fix"`
Which modules are most unstable?	`minCommitCount=8`
What needs refactoring?	`minCommitCount=6`, `minAgeDays=30`
Where do bugs appear most often?	`minCommitCount=7`, `query="error handling"`

By Task/Ticket ID

Question	Filters
What code relates to JIRA-1234?	`taskId="JIRA-1234"`
What was done for GitHub issue #567?	`taskId="#567"`
What code is linked to this requirement?	`taskId="REQ-100"`
Show everything related to feature X	`taskId="FEAT-X"`
Which files were affected by this task?	`taskId="TD-5678"`

By Date Range

Question	Filters
What changed after release 1.0?	`modifiedAfter="2024-03-01"`
What code existed before the refactoring?	`modifiedBefore="2024-01-01"`
What changed between releases?	`modifiedAfter="2024-01-01"`, `modifiedBefore="2024-06-01"`
What was done in Q1 2024?	`modifiedAfter="2024-01-01"`, `modifiedBefore="2024-04-01"`

Combined Queries

Question	Filters
Complex code that hasn't changed and needs docs	`query="complex"`, `minAgeDays=60`
John's recent code in the payment module	`author="John"`, `maxAgeDays=14`, `query="payment"`
Old high-churn code (risk!)	`minAgeDays=90`, `minCommitCount=5`
Code for a task that was frequently reworked	`taskId="X"`, `minCommitCount=3`
What a specific author did for a task	`author="John"`, `taskId="TD-123"`
Legacy code in critical modules	`minAgeDays=180`, `pathPattern="src/core/**"`
Recent changes in authentication	`maxAgeDays=7`, `query="authentication"`
Problematic areas in the last month	`maxAgeDays=30`, `minCommitCount=4`

Analytical Questions

Question	Approach
Where has technical debt accumulated?	`minAgeDays=90` + `minCommitCount=1` (changed but not refactored)
What code needs test coverage?	`minCommitCount=5` (frequently breaks)
Who owns which module?	Group by `pathPattern` + analyze `dominantAuthor`
What code lacks documentation?	`minAgeDays=60` + `documentationOnly=false`
What needs code review?	`maxAgeDays=7` + `minCommitCount=2`

Performance

Typical performance with GPU-accelerated embeddings (Ollama + CUDA/Metal):

Codebase Size	Files	Indexing Time	Search Latency
Small (10k LOC)	~30	~5s	<100ms
Medium (50k LOC)	~150	~15s	<100ms
Large (100k LOC)	~300	~30s	<200ms
Very Large (500k LOC)	~1,500	~2min	<300ms
Enterprise (3.5M LOC)	~10k	~10min	<500ms

Note: Benchmarked with Ollama nomic-embed-text on RTX 4090 / Apple M-series. CPU-only embedding is 5-10x slower.

Examples

See examples/ directory for detailed guides:

Basic Usage - Create collections, add documents, search
Hybrid Search - Combine semantic and keyword search
Knowledge Base - Structured documentation with metadata
Advanced Filtering - Complex boolean filters
Rate Limiting - Batch processing with cloud providers
Code Search - Index codebases and semantic code search

Advanced Configuration

Environment Variables

Core Configuration

Variable	Description	Default
`TRANSPORT_MODE`	"stdio" or "http"	stdio
`HTTP_PORT`	Port for HTTP transport	3000
`HTTP_REQUEST_TIMEOUT_MS`	Request timeout for HTTP transport (ms)	300000
`EMBEDDING_PROVIDER`	"ollama", "openai", "cohere", "voyage"	ollama
`QDRANT_URL`	Qdrant server URL	http://localhost:6333
`QDRANT_API_KEY`	API key for Qdrant authentication	-
`PROMPTS_CONFIG_FILE`	Path to prompts configuration JSON	prompts.json

Embedding Configuration

Variable	Description	Default
`EMBEDDING_MODEL`	Model name	Provider-specific
`EMBEDDING_BASE_URL`	Custom API URL	Provider-specific
`EMBEDDING_DIMENSION`	Vector dimensions (auto-detected from model)	Auto
`EMBEDDING_BATCH_SIZE`	Texts per embedding request (Ollama native batch)	64
`EMBEDDING_CONCURRENCY`	Parallel embedding requests (for multiple GPUs)	1
`EMBEDDING_MAX_REQUESTS_PER_MINUTE`	Rate limit	Provider-specific
`EMBEDDING_RETRY_ATTEMPTS`	Retry count	3
`EMBEDDING_RETRY_DELAY`	Initial retry delay (ms)	1000
`OPENAI_API_KEY`	OpenAI API key	-
`COHERE_API_KEY`	Cohere API key	-
`VOYAGE_API_KEY`	Voyage AI API key	-

Code Vectorization Configuration

Variable	Description	Default
`CODE_CHUNK_SIZE`	Maximum chunk size in characters	2500
`CODE_CHUNK_OVERLAP`	Overlap between chunks in characters	300
`CODE_ENABLE_AST`	Enable AST-aware chunking (tree-sitter)	true
`CODE_BATCH_SIZE`	Number of chunks to embed in one batch	100
`CODE_CUSTOM_EXTENSIONS`	Additional file extensions (comma-separated)	-
`CODE_CUSTOM_IGNORE`	Additional ignore patterns (comma-separated)	-
`CODE_DEFAULT_LIMIT`	Default search result limit	5
`CODE_ENABLE_GIT_METADATA`	Enrich chunks with git blame (author, dates, tasks)	false

Qdrant Batch Pipeline Configuration

Variable	Description	Default
`QDRANT_FLUSH_INTERVAL_MS`	Auto-flush buffer interval (0 to disable timer)	500
`QDRANT_BATCH_ORDERING`	Ordering mode: "weak", "medium", or "strong"	weak
`DELETE_BATCH_SIZE`	Paths per delete batch (with payload index, larger is efficient)	500
`DELETE_CONCURRENCY`	Parallel delete requests (Qdrant-bound, not embedding-bound)	8

Note: CODE_BATCH_SIZE controls both embedding batch size and Qdrant upsert buffer size for simplified configuration.

Delete Optimization (v4 schema): Collections created with schema v4+ have a relativePath payload index for fast filter-based deletes. Existing collections are auto-migrated on first reindex_changes call.
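
To show what a payload index and a filter-based delete look like at the Qdrant REST level, the sketch below only builds the request bodies. The endpoints named in the comments are Qdrant's REST API; the helper names and the choice of a `keyword` index with a `match.any` condition are assumptions about how such deletes could be expressed, not a description of this server's internals.

```typescript
interface PayloadIndexBody {
  field_name: string;
  field_schema: "keyword";
}

interface DeleteByFilterBody {
  filter: { must: Array<{ key: string; match: { any: string[] } }> };
}

// Body for PUT /collections/{collection}/index: a keyword index on the path field.
function payloadIndexRequest(field = "relativePath"): PayloadIndexBody {
  return { field_name: field, field_schema: "keyword" };
}

// Bodies for POST /collections/{collection}/points/delete, one per batch of paths.
// `batchSize` plays the role of DELETE_BATCH_SIZE.
function deleteByPathRequests(paths: string[], batchSize: number): DeleteByFilterBody[] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const bodies: DeleteByFilterBody[] = [];
  for (let i = 0; i < paths.length; i += batchSize) {
    bodies.push({
      filter: {
        must: [{ key: "relativePath", match: { any: paths.slice(i, i + batchSize) } }],
      },
    });
  }
  return bodies;
}

const deleteBodies = deleteByPathRequests(
  ["src/a.ts", "src/b.ts", "src/c.ts", "src/d.ts", "src/e.ts"],
  2,
);
console.log(JSON.stringify(deleteBodies[0]));
```

With the index in place, each batch resolves its `relativePath` condition through the index instead of scanning every point's payload.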

Performance & Debug Configuration

Variable	Description	Default
`MAX_IO_CONCURRENCY`	Max parallel file I/O operations during cache sync	50
`DEBUG`	Enable debug timing logs (`true` or `1` to enable)	false

Performance Tuning Notes:

MAX_IO_CONCURRENCY: Controls parallel file reads during reindex_changes. On a MacBook with an NVMe SSD, 50-100 is optimal. Values that are too high (500+) can saturate the kernel I/O scheduler.
DEBUG: When enabled, logs detailed timing for cache initialization, shard processing, and pipeline stages.

Data Directories

The server stores data in ~/.qdrant-mcp/:

Directory	Purpose
`snapshots/`	Sharded file hash snapshots for incremental indexing
`logs/`	Debug logs when `DEBUG=1` is enabled

Snapshot Structure (v3):

~/.qdrant-mcp/snapshots/
└── code_<hash>/           # Collection-specific directory
    └── v3/                # Format version
        ├── meta.json      # Merkle root + metadata
        ├── shard-0.json   # File hashes for shard 0
        ├── shard-1.json   # File hashes for shard 1
        └── ...            # More shards based on EMBEDDING_CONCURRENCY

Debug Logs: When DEBUG=1, pipeline operations are logged to ~/.qdrant-mcp/logs/pipeline-<timestamp>.log:

Batch formation and processing times
Queue depth and backpressure events
Embedding and Qdrant call durations
Fallback triggers and error details

Provider Comparison

Provider	Models	Dimensions	Rate Limit	Notes
Ollama	`nomic-embed-text` (default), `jina-embeddings-v2-base-code`, `mxbai-embed-large`	768, 768, 1024	None	Local, no API key
OpenAI	`text-embedding-3-small` (default), `text-embedding-3-large`	1536, 3072	3500/min	Cloud API
Cohere	`embed-english-v3.0` (default), `embed-multilingual-v3.0`	1024	100/min	Multilingual support
Voyage	`voyage-2` (default), `voyage-large-2`, `voyage-code-2`	1024, 1536	300/min	Code-specialized

Recommended: Jina Code Embeddings

For code search, we recommend jina-embeddings-v2-base-code over the default nomic-embed-text:

# Pull the model
ollama pull jina-embeddings-v2-base-code

# Configure
export EMBEDDING_MODEL="jina-embeddings-v2-base-code"

Why Jina Code Embeddings?

Aspect	Benefit
Code-optimized	Trained specifically on source code, understands syntax and semantics
Multilingual	30+ programming languages with consistent quality
Enterprise-proven	Battle-tested on 3.5M+ LOC codebases with excellent search relevance
Same dimensions	768 dimensions — drop-in replacement for `nomic-embed-text`

Note: Ollama models require pulling before use:

Podman: podman exec ollama ollama pull <model-name>
Docker: docker exec ollama ollama pull <model-name>

Troubleshooting

Issue	Solution
Qdrant not running	`podman compose up -d` or `docker compose up -d`
Collection missing	Create collection first before adding documents
Ollama not running	Verify with `curl http://localhost:11434`, start with `podman compose up -d`
Model missing	`podman exec ollama ollama pull nomic-embed-text` or `docker exec ollama ollama pull ...`
Rate limit errors	Adjust `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to match your provider tier
API key errors	Verify correct API key in environment configuration
Qdrant unauthorized	Set `QDRANT_API_KEY` environment variable for secured instances
Filter errors	Ensure filters use Qdrant's filter format and that field names match the stored metadata
Codebase not indexed	Run `index_codebase` before `search_code`
Slow indexing	Use Ollama (local) for faster indexing, or increase `CODE_BATCH_SIZE`
Files not found	Check `.gitignore` and `.contextignore` patterns
Search returns no results	Try broader queries, check if codebase is indexed with `get_index_status`
Out of memory during index	Reduce `CODE_CHUNK_SIZE` or `CODE_BATCH_SIZE`

Performance Tuning

Recommended Configurations

Optimal parameters depend on your hardware and deployment setup:

Remote Server (Qdrant + Ollama on separate host)

Best for: Dedicated GPU server, shared team infrastructure

# Network-optimized: larger batches, moderate concurrency
export EMBEDDING_BATCH_SIZE=512
export CODE_BATCH_SIZE=768
export EMBEDDING_CONCURRENCY=4
export DELETE_BATCH_SIZE=500
export DELETE_CONCURRENCY=8

MacBook M1 (8-core, 8GB+ RAM)

Best for: Light development, small-to-medium codebases (<50k files)

# Memory-conscious: smaller batches, low concurrency
export EMBEDDING_BATCH_SIZE=128
export CODE_BATCH_SIZE=256
export EMBEDDING_CONCURRENCY=2
export DELETE_BATCH_SIZE=200
export DELETE_CONCURRENCY=4
export MAX_IO_CONCURRENCY=30

MacBook M3 Pro (12-core, 18GB+ RAM)

Best for: Professional development, medium codebases (<100k files)

# Balanced: moderate batches, good concurrency
export EMBEDDING_BATCH_SIZE=256
export CODE_BATCH_SIZE=512
export EMBEDDING_CONCURRENCY=4
export DELETE_BATCH_SIZE=500
export DELETE_CONCURRENCY=8
export MAX_IO_CONCURRENCY=50

MacBook M4 Max (16-core, 48GB+ RAM)

Best for: Large codebases, maximum local performance

# Performance-optimized: large batches, high concurrency
export EMBEDDING_BATCH_SIZE=512
export CODE_BATCH_SIZE=768
export EMBEDDING_CONCURRENCY=8
export DELETE_BATCH_SIZE=1000
export DELETE_CONCURRENCY=16
export MAX_IO_CONCURRENCY=100

Quick Diagnostic

Run the diagnostic benchmark to automatically find optimal parameters for your setup:


# Set your endpoints

export QDRANT_URL="http://localhost:6333"
export EMBEDDING_BASE_URL="http://localhost:11434"
export EMBEDDING_MODEL="nomic-embed-text"

# Run diagnostic (takes ~30 seconds)

node benchmarks/diagnose.mjs

The diagnostic will test and recommend optimal values for:

EMBEDDING_BATCH_SIZE - texts per embedding API request
CODE_BATCH_SIZE - chunks per Qdrant upsert
EMBEDDING_CONCURRENCY - parallel embedding requests

Understanding Results

Phase 1: Embedding Batch Size
  Testing EMBEDDING_BATCH_SIZE=64   ████████████████████ 124 emb/s
  Testing EMBEDDING_BATCH_SIZE=256  ████████████████████ 158 emb/s
  Testing EMBEDDING_BATCH_SIZE=512  ████████████████████ 174 emb/s  ← Best
  Testing EMBEDDING_BATCH_SIZE=1024 ███████████████░░░░░ 148 emb/s
  ↳ Stopping: performance degradation detected

  ✓ Optimal: EMBEDDING_BATCH_SIZE=512

Green bar (████): Performance close to best
Yellow bar: Slight degradation
Degradation detected: Batch size too large for GPU memory
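
The stop-on-degradation behaviour in the sample output can be sketched as a simple probe loop. This is an illustration only: `pickBatchSize`, the 10% tolerance, and the replayed throughput table are assumptions, and the real `benchmarks/diagnose.mjs` may decide differently.

```typescript
interface ProbeResult {
  best: number;     // candidate with the highest measured throughput
  tested: number[]; // candidates that were actually measured
}

// Try candidates in increasing order and stop once throughput falls more than
// `tolerance` below the best rate seen so far.
function pickBatchSize(
  candidates: number[],
  measure: (batchSize: number) => number,
  tolerance = 0.1,
): ProbeResult {
  let best = candidates[0];
  let bestRate = -Infinity;
  const tested: number[] = [];
  for (const size of candidates) {
    const rate = measure(size);
    tested.push(size);
    if (rate > bestRate) {
      best = size;
      bestRate = rate;
    } else if (rate < bestRate * (1 - tolerance)) {
      break; // degradation detected: larger sizes are not worth measuring
    }
  }
  return { best, tested };
}

// Replay the throughput figures from the sample run above instead of measuring.
const sampleRates: Record<number, number> = { 64: 124, 256: 158, 512: 174, 1024: 148 };
const probe = pickBatchSize([64, 256, 512, 1024, 2048], (size) => {
  const rate = sampleRates[size];
  if (rate === undefined) throw new Error(`size ${size} was not part of the sample run`);
  return rate;
});
console.log(probe); // { best: 512, tested: [ 64, 256, 512, 1024 ] }
```

The 2048 candidate is never measured because 148 emb/s is already more than 10% below the best rate of 174 emb/s.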

Benchmark Files

File	Purpose
`benchmarks/diagnose.mjs`	Quick auto-tuning (~30s)
`benchmarks/embedding-batch.mjs`	Detailed EMBEDDING_BATCH_SIZE analysis
`benchmarks/code-batch.mjs`	Detailed CODE_BATCH_SIZE analysis
`benchmarks/concurrency.mjs`	Concurrency + batch size matrix
`benchmarks/pipelining.mjs`	Sequential vs pipelined comparison
`benchmarks/qdrant-optimized.mjs`	Qdrant wait/ordering options
`benchmarks/accumulator-buffer.mjs`	Buffer size + auto-flush optimization

Batch Pipeline Optimization

The server uses an accumulator pattern for efficient Qdrant upserts:

Embeddings ──► Buffer (accumulator) ──► Qdrant upsert
                 │                           │
                 └─ flush on size ───────────┘
                 └─ flush on timer (500ms) ──┘
                 └─ flush explicit ──────────┘

How it works:

Points are accumulated in a buffer until CODE_BATCH_SIZE threshold
Intermediate batches use wait=false (fire-and-forget) for speed
Final flush uses wait=true for consistency
Auto-flush timer prevents data from being stuck in buffer
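
A stripped-down sketch of the accumulator idea, covering the size-triggered and explicit flushes. `PointsBuffer` is a hypothetical class for illustration and not the project's implementation; the timer-based auto-flush is left out to keep the example synchronous, and the `sink` callback stands in for the Qdrant upsert call.

```typescript
class PointsBuffer<T> {
  private buffer: T[] = [];
  private readonly batchSize: number;
  private readonly sink: (batch: T[], wait: boolean) => void;

  constructor(batchSize: number, sink: (batch: T[], wait: boolean) => void) {
    this.batchSize = batchSize;
    this.sink = sink;
  }

  // Accumulate one point; a full buffer is sent as an intermediate batch (wait=false).
  add(point: T): void {
    this.buffer.push(point);
    if (this.buffer.length >= this.batchSize) this.emit(false);
  }

  // Explicit final flush: send whatever is left with wait=true.
  flush(): void {
    this.emit(true);
  }

  private emit(wait: boolean): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch, wait);
  }
}

// Usage: five points with a batch size of two produce three upserts.
const upserts: Array<[number, boolean]> = [];
const pointsBuffer = new PointsBuffer<number>(2, (batch, wait) => {
  upserts.push([batch.length, wait]);
});
[1, 2, 3, 4, 5].forEach((point) => pointsBuffer.add(point));
pointsBuffer.flush();
console.log(upserts); // [ [ 2, false ], [ 2, false ], [ 1, true ] ]
```

One simplification to be aware of: when the last point happens to fill the buffer exactly, this sketch has nothing left to send with wait=true, so a real implementation needs another way to confirm the final writes.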

Run the accumulator benchmark to find optimal settings:

QDRANT_URL=http://localhost:6333 \
EMBEDDING_BASE_URL=http://localhost:11434 \
node benchmarks/accumulator-buffer.mjs

Typical Optimal Values

Hardware	EMBEDDING_BATCH_SIZE	CODE_BATCH_SIZE
CPU only	32-64	128-256
GPU 4GB	128-256	256-384
GPU 8GB+	512-1024	512-768
GPU 12GB+	1024-2048	768+

Development

npm run dev          # Development with auto-reload
npm run build        # Production build
npm run type-check   # TypeScript validation
npm test             # Run unit test suite (mocked, fast)
npm run test:coverage # Coverage report
npm run test-integration # Run real integration tests (requires Qdrant + Ollama)

Testing

Unit Tests (Mocked)

864 tests across test files with 97%+ coverage:

Unit Tests: QdrantManager (56), Ollama (41), OpenAI (25), Cohere (29), Voyage (31), Factory (43), Prompts (50), Transport (15), MCP Server (19)
Integration Tests (Mocked): Code indexer (56), scanner (15), chunker (24), synchronizer (42), snapshot (26), merkle tree (28)

CI/CD: GitHub Actions runs build, type-check, and tests on Node.js 22 LTS for every push/PR.

Real Integration Tests

233 tests across 18 modular test suites testing against real Qdrant and Ollama:

# Run all integration tests
npm run test-integration

# Run a specific suite (1-18)
TEST_SUITE=1 npm run test-integration  # Embeddings
TEST_SUITE=18 npm run test-integration # Git Metadata

# Skip cleanup for debugging
SKIP_CLEANUP=1 npm run test-integration

Test Suites:

Embeddings (single, batch, parallel)
Qdrant Operations (CRUD, filters, batch delete)
PointsAccumulator (batch pipeline)
File Indexing Lifecycle
Hash & Snapshot Consistency
Ignore Patterns
Chunk Boundaries & Line Numbers
Multi-Language Support
Ruby AST Chunking (Rails patterns)
Search Accuracy
Edge Cases
Batch Pipeline in CodeIndexer
Concurrent Operations
Parallel File Sync & Sharded Snapshots
Pipeline & WorkerPool
Schema Migration & Delete Optimization
ForceReindex & Parallel Indexing
Git Metadata Integration

Requirements: Running Qdrant (default: http://localhost:6333) and Ollama (default: http://localhost:11434).

Contributing

Contributions welcome! See CONTRIBUTING.md for:

Development workflow
Conventional commit format (feat:, fix:, BREAKING CHANGE:)
Testing requirements (run npm test, npm run type-check, npm run build)

Automated releases: Semantic versioning via conventional commits - feat: → minor, fix: → patch, BREAKING CHANGE: → major.
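
The mapping from commit messages to a version bump can be sketched as below. This only mirrors the three rules stated above; `bumpFor` and `releaseBump` are hypothetical helpers, and the actual release tooling may recognise more forms.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Classify a single commit message.
function bumpFor(message: string): Bump {
  // A "BREAKING CHANGE:" footer anywhere in the message forces a major bump.
  if (/^BREAKING CHANGE:/m.test(message)) return "major";
  const summary = message.split("\n")[0];
  if (/^feat(\([^)]*\))?:/.test(summary)) return "minor";
  if (/^fix(\([^)]*\))?:/.test(summary)) return "patch";
  return "none";
}

// The release bump is the highest bump among all commits since the last release.
function releaseBump(messages: string[]): Bump {
  const order: Bump[] = ["none", "patch", "minor", "major"];
  return messages
    .map(bumpFor)
    .reduce<Bump>(
      (highest, bump) => (order.indexOf(bump) > order.indexOf(highest) ? bump : highest),
      "none",
    );
}

console.log(releaseBump(["fix: handle empty shard", "feat: add git metadata"])); // minor
```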

License

MIT - see LICENSE file.