Paparats MCP
Paparats-kvetka — a magical flower from Slavic folklore that blooms on Kupala Night and grants power to whoever finds it. Likewise, paparats-mcp helps you find the right code across a sea of repositories.
Semantic code search for AI coding assistants. Give Claude Code, Cursor, Windsurf, Codex, and the rest a deep understanding of your entire codebase — single repo or multi-project workspaces. Search by meaning, not keywords. Keep your index fresh with real-time file watching. Return only relevant chunks instead of full files to save tokens.
Everything runs locally. No cloud. No API keys. Your code never leaves your machine.
Why Paparats?
AI coding assistants are smart, but they can only see files you open. They don't know your codebase structure, where the authentication logic lives, or how services connect. Paparats fixes that.
What you get
- 🔍 Semantic code search — ask "where is the rate limiting logic?" and get exact code ranked by meaning, not grep matches
- ⚡️ Real-time sync — edit a file, and 2 seconds later it's re-indexed. No manual re-runs
- 🧠 LSP intelligence — go-to-definition, find-references, rename symbols via CCLSP integration
- 💾 Token savings — return only relevant chunks instead of full files to reduce context size
- 🏢 Multi-project workspaces — search across backend, frontend, infra repos in one query
- 🔒 100% local & private — Qdrant vector database + Ollama embeddings. Nothing leaves your laptop
- 🎯 Language-aware chunking — code split by functions/classes, not arbitrary character counts (Ruby, TypeScript, Python, Go, Rust, Java, C/C++, C#, Terraform)
Who benefits
| Use Case | How Paparats Helps |
|---|---|
| Solo developers | Quickly navigate unfamiliar codebases, find examples of patterns, reduce context-switching |
| Multi-repo teams | Cross-project search (backend + frontend + infra), consistent patterns, faster onboarding |
| AI agents | Foundation for product support bots, QA automation, dev assistants — any agent that needs code context |
| Legacy modernization | Find all usages of deprecated APIs, identify migration patterns, discover hidden dependencies |
| Contractors/consultants | Accelerate ramp-up on client codebases, reduce "where is X?" questions |
Quick Start
```bash
# 1. Install CLI
npm install -g @paparats/cli

# 2. One-time setup (downloads ~1.6 GB GGUF model, starts Docker containers)
paparats install

# 3. In your project
cd your-project
paparats init    # creates .paparats.yml
paparats index   # index the codebase

# 4. Keep index fresh with file watching
paparats watch   # run in background or separate terminal

# 5. Connect your IDE (Cursor, Claude Code) — see "Connecting MCP" below
```
Prerequisites
Install these before running `paparats install`:
| Tool | Purpose | Install |
|---|---|---|
| Docker | Runs Qdrant vector DB + MCP server | docker.com |
| Docker Compose | Orchestrates containers (v2) | Included with Docker Desktop; Linux: apt install docker-compose-plugin |
| Ollama | Local embedding model (on host) | ollama.com |
The CLI checks that `docker`, `ollama`, and `docker compose` are available. If any are missing, it exits with installation links.
How It Works
```
Your projects               Paparats                    AI assistant
                                                   (Claude Code / Cursor)
backend/             ┌──────────────────────┐
  .paparats.yml ────►│ Indexer              │
frontend/            │  - chunks code       │       ┌──────────────┐
  .paparats.yml ────►│  - embeds via Ollama │──────►│ MCP search   │
infra/               │  - stores in Qdrant  │       │  tool call   │
  .paparats.yml ────►│  - watches changes   │       └──────────────┘
                     └──────────────────────┘
```
- Indexing: Code is chunked at function/class boundaries, embedded via Jina Code Embeddings 1.5B, stored in Qdrant
- Searching: AI assistant queries via MCP → server expands query (handles abbreviations, plurals, case variants) → Qdrant returns top matches → only relevant chunks sent back
- Token savings: Return only relevant chunks instead of loading full files
- Watching: File changes trigger re-indexing of affected files only (unchanged code never re-embedded thanks to content-hash cache)
Key Features
🎯 Better Search Quality
Task-specific embeddings — Jina Code Embeddings supports 3 query types (nl2code, code2code, techqa) with different prefixes for better relevance:
- `"find authentication middleware"` → `nl2code` prefix (natural language → code)
- `"function validateUser(req, res)"` → `code2code` prefix (code → similar code)
- `"how does OAuth work in this app?"` → `techqa` prefix (technical questions)
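A rough sketch of how a query could be routed to a task prefix. The heuristics here are assumptions for illustration only; the real rules live in `task-prefixes.ts`:

```typescript
type TaskPrefix = "nl2code" | "code2code" | "techqa";

// Hypothetical routing rules: code-ish punctuation -> code2code,
// question openers -> techqa, everything else -> nl2code.
function detectTaskPrefix(query: string): TaskPrefix {
  const looksLikeCode = /[(){};=]|\bfunction\b|\bdef\b/.test(query);
  const looksLikeQuestion = /^(how|why|what)\b/i.test(query.trim());
  if (looksLikeCode) return "code2code";   // code snippet -> similar code
  if (looksLikeQuestion) return "techqa";  // technical question
  return "nl2code";                        // natural-language description
}
```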
Query expansion — every search generates 2-3 variations server-side:
- Abbreviations: `auth` ↔ `authentication`, `db` ↔ `database`
- Case variants: `userAuth` → `user_auth` → `UserAuth`
- Plurals: `users` → `user`, `dependencies` → `dependency`
- Filler removal: `"how does auth work"` → `"auth"`
All variants searched in parallel, results merged by max score.
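The expand-then-merge flow can be sketched as below. The two-entry abbreviation table is an illustrative subset, and the function names are hypothetical; the full logic lives in `query-expansion.ts`:

```typescript
// Tiny illustrative abbreviation table plus its reverse mapping.
const ABBREVIATIONS: Record<string, string> = { auth: "authentication", db: "database" };
const EXPANSIONS: Record<string, string> = Object.fromEntries(
  Object.entries(ABBREVIATIONS).map(([abbr, full]) => [full, abbr])
);

// Generate variants by swapping known abbreviations word-by-word.
function expandQuery(query: string): string[] {
  const variants = new Set<string>([query]);
  const words = query.split(/\s+/);
  words.forEach((word, i) => {
    const swap = ABBREVIATIONS[word] ?? EXPANSIONS[word];
    if (swap) {
      const v = [...words];
      v[i] = swap;
      variants.add(v.join(" "));
    }
  });
  return [...variants];
}

// Merge hits from all variants, keeping the best score per chunk id.
function mergeByMaxScore(hits: { id: string; score: number }[]): Map<string, number> {
  const best = new Map<string, number>();
  for (const { id, score } of hits) {
    best.set(id, Math.max(best.get(id) ?? -Infinity, score));
  }
  return best;
}
```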
Confidence tiers — results labeled High (≥60%), Partial (40–60%), Low (<40%) to guide AI next steps.
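The tier thresholds above map directly onto a small labeling function, for example:

```typescript
// Label a similarity score with the confidence tiers described above.
function confidenceTier(score: number): "High" | "Partial" | "Low" {
  if (score >= 0.6) return "High";     // ≥ 60%
  if (score >= 0.4) return "Partial";  // 40–60%
  return "Low";                        // < 40%
}
```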
⚡️ Performance
Embedding cache — SQLite cache with content-hash keys + Float32 vectors. Unchanged code never re-embedded. LRU cleanup at 100k entries.
Language-aware chunking — four strategies, chosen per language (block-based for Ruby/Python, brace-based for JS/TS/Go/Rust, indent-based, and a fixed-size fallback). Supports 11 languages.
Real-time watching — paparats watch monitors file changes with debouncing (1s default). Edit → save → re-index in ~2 seconds.
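The debouncing behaves roughly like the classic pattern below. This is a sketch under stated assumptions; Paparats actually consumes chokidar's events with its own scheduling:

```typescript
// Collapse a burst of calls into one, firing only after `waitMs` of quiet.
// With a 1 s debounce, rapid saves of the same file trigger one re-index.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```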
🔗 Integrations
CCLSP (Claude Code LSP) — during `paparats init`, optionally sets up:
- An LSP server for your language (TypeScript, Python, Go, Ruby, etc.)
- MCP config for go-to-definition, find-references, and rename
- A typical AI workflow: `search_code` (semantic) → `find_definition` (precise navigation) → `find_references` (impact analysis)

Skip with `--skip-cclsp` if not needed.
Comparison with Alternatives
Feature Matrix
| Feature | Paparats | Vexify | SeaGOAT | Augment Context | Sourcegraph | Greptile | Bloop |
|---|---|---|---|---|---|---|---|
| Deployment | |||||||
| Open source | ✅ MIT | ✅ MIT | ✅ MIT | ❌ Proprietary | ⚠️ Partial | ❌ Proprietary | ⚠️ Archived¹ |
| Fully local | ✅ | ✅ | ✅ | ❌ Cloud² | ❌ Cloud | ❌ SaaS | ✅ |
| Search Quality | |||||||
| Code embeddings | ✅ Jina 1.5B³ | ⚠️ Limited⁴ | ❌ MiniLM⁵ | ⚠️ Proprietary | ⚠️ Proprietary | ⚠️ Proprietary | ✅ |
| Vector database | Qdrant | SQLite | ChromaDB | Proprietary | Proprietary | pgvector | Qdrant |
| AST-aware chunking | ✅ 4 strategies | ❌ | ❌ | ⚠️ Unknown | ⚠️ Partial | ⚠️ Unknown | ✅ |
| Query expansion | ✅ 4 types⁶ | ❌ | ❌ | ⚠️ Unknown | ⚠️ Partial | ⚠️ Unknown | ❌ |
| Developer Experience | |||||||
| Real-time file watching | ✅ Auto | ❌ Manual | ❌ Manual | ✅ CI/CD | ✅ | ⚠️ Unknown | ⚠️ |
| Embedding cache | ✅ SQLite | ⚠️ Implicit | ❌ | ⚠️ Unknown | ⚠️ Unknown | ⚠️ Unknown | ❌ |
| Multi-project search | ✅ Groups | ✅ | ❌ Single | ✅ | ✅ | ✅ | ✅ |
| One-command install | ✅ | ⚠️ Manual | pip install | Account + CI | Account | SaaS signup | Build source |
| AI Integration | |||||||
| MCP native | ✅ | ✅ | ❌ | ✅ | ❌ | ⚠️ API | ❌ |
| LSP integration | ✅ CCLSP | ❌ | ❌ | ❌ | ⚠️ Partial | ❌ | ❌ |
| Token savings metrics | ✅ Per-query | ❌ | ❌ | ⚠️ Unknown | ❌ | ❌ | ❌ |
| Pricing | |||||||
| Cost | Free | Free | Free | Paid | Paid | Paid | Archived |
Notes:
1. Bloop archived January 2, 2025
2. Augment Context Engine indexes locally but stores vectors in the cloud
3. Jina Code Embeddings 1.5B (1536 dims) with task-specific prefixes (nl2code, code2code, techqa)
4. Vexify supports Ollama models but is limited to specific embeddings (jina-embeddings-2-base-code, nomic-embed-text)
5. SeaGOAT is locked to all-MiniLM-L6-v2 (384 dims, general-purpose)
6. Abbreviations, case variants, plurals, filler-word removal
Why Paparats?
🔒 Privacy-first — Everything runs locally. Augment and Greptile store your code vectors in the cloud, Sourcegraph requires cloud deployment.
🧠 Better embeddings — Jina Code Embeddings 1.5B (1536 dims) trained specifically for code with task-specific prefixes. Vexify uses smaller jina-embeddings-2-base-code; SeaGOAT uses general-purpose MiniLM (384 dims).
⚡️ Production-grade stack — Qdrant handles millions of vectors with sub-100ms latency. SQLite with extensions (Vexify) doesn't scale beyond small projects. ChromaDB (SeaGOAT) is designed for prototyping, not production.
🎯 Smarter search — Query expansion (4 strategies) + task prefix detection (nl2code/code2code/techqa) automatically improve relevance. Competitors don't expose these features.
🔄 True real-time — paparats watch keeps index fresh automatically with 1s debounce. Vexify and SeaGOAT require manual reindex commands. Augment requires CI/CD hooks.
🔗 LSP included — CCLSP integration gives your AI go-to-definition, find-references, rename. No other tool bundles this.
💰 Free forever — No usage limits, credits, or per-seat fees.
📊 Transparent metrics — Every search shows tokens returned vs full-file tokens, savings %, confidence tier. Helps AI decide next steps.
Configuration
.paparats.yml in your project root:
```yaml
group: 'my-project-group'        # required — Qdrant collection name
language: ruby                   # required — or an array: [ruby, typescript]

indexing:
  paths: ['app/', 'lib/']        # directories to index (default: ["./"])
  exclude: ['vendor/**']         # additional excludes (merged with language defaults)
  extensions: ['.rb']            # override auto-detected extensions
  chunkSize: 1024                # max chars per chunk (default: 1024)
  concurrency: 2                 # parallel file processing (default: 2)
  batchSize: 50                  # Qdrant upsert batch size (default: 50)

watcher:
  enabled: true                  # auto-reindex on file changes (default: true)
  debounce: 1000                 # ms debounce (default: 1000)

embeddings:
  provider: 'ollama'             # embedding provider (default: "ollama")
  model: 'jina-code-embeddings'  # Ollama alias (see below)
  dimensions: 1536               # vector dimensions (default: 1536)
```
Groups
Projects with the same group name share a search scope. All indexed together in one Qdrant collection.
```yaml
# backend/.paparats.yml
group: 'my-fullstack'
language: ruby
indexing:
  paths: ['app/', 'lib/']
```

```yaml
# frontend/.paparats.yml
group: 'my-fullstack'
language: typescript
indexing:
  paths: ['src/']
```
Now searching "authentication flow" finds code in both backend and frontend.
Connecting MCP
After `paparats install` and `paparats index`, connect your IDE:
Cursor
Create or edit ~/.cursor/mcp.json (global) or .cursor/mcp.json (project):
```json
{
  "mcpServers": {
    "paparats": {
      "type": "http",
      "url": "http://localhost:9876/mcp"
    }
  }
}
```
Restart Cursor after changing config.
Claude Code
```bash
claude mcp add --transport http paparats http://localhost:9876/mcp
```
Or add to .mcp.json in project root:
```json
{
  "mcpServers": {
    "paparats": {
      "type": "http",
      "url": "http://localhost:9876/mcp"
    }
  }
}
```
Verify
- `paparats status` — check the server is running
- In your IDE, look for the MCP tools `search_code` and `health_check`
- Ask the AI: "Search for authentication logic in the codebase"
Embedding Model Setup
Default: jinaai/jina-code-embeddings-1.5b-GGUF — code-optimized, 1.5B params, 1536 dims, 32k context. It's not in the Ollama registry, so Paparats creates a local alias.
Recommended: `paparats install` automates this:
1. Downloads the GGUF (~1.65 GB) to `~/.paparats/models/`
2. Creates a Modelfile and runs `ollama create jina-code-embeddings`
3. Starts Ollama with `ollama serve` if it isn't running
Manual setup:
```bash
# 1. Download GGUF
curl -L -o jina-code-embeddings-1.5b-Q8_0.gguf \
  "https://huggingface.co/jinaai/jina-code-embeddings-1.5b-GGUF/resolve/main/jina-code-embeddings-1.5b-Q8_0.gguf"

# 2. Create Modelfile
cat > Modelfile <<'EOF'
FROM ./jina-code-embeddings-1.5b-Q8_0.gguf
PARAMETER num_ctx 8192
EOF

# 3. Register in Ollama
ollama create jina-code-embeddings -f Modelfile

# 4. Verify
ollama list | grep jina
```
| Spec | Value |
|---|---|
| Parameters | 1.5B |
| Dimensions | 1536 |
| Context | 32,768 tokens (recommended ≤ 8,192) |
| Quantization | Q8_0 (~1.6 GB) |
| Languages | 15+ programming languages |
Task-specific prefixes (nl2code, code2code, techqa) applied automatically.
CLI Commands
| Command | Description |
|---|---|
| `paparats init` | Create .paparats.yml (interactive or `--non-interactive`) |
| `paparats install` | Set up Docker + Ollama model (~1.6 GB download) |
| `paparats update` | Update CLI from npm + pull latest Docker image |
| `paparats index` | Index the current project |
| `paparats search <query>` | Semantic search across indexed projects |
| `paparats watch` | Watch files and auto-reindex on changes |
| `paparats status` | System status (Docker, Ollama, config, server health, groups) |
| `paparats doctor` | Run diagnostic checks |
| `paparats groups` | List all indexed groups and projects |

Most commands support `--server <url>` (default: `http://localhost:9876`) and `--json` for machine-readable output.
Common Options
`paparats init`
- `--force` — Overwrite existing config
- `--group <name>` — Set group (skip prompt)
- `--language <lang>` — Set language (skip prompt)
- `--non-interactive` — Use defaults without prompts
- `--skip-cclsp` — Skip CCLSP language server setup
`paparats install`
- `--skip-docker` — Skip Docker setup (only set up Ollama)
- `--skip-ollama` — Skip Ollama model (only start Docker)
- `-v, --verbose` — Show detailed output
`paparats index`
- `-f, --force` — Force reindex (clear existing chunks)
- `--dry-run` — Show what would be indexed
- `--timeout <ms>` — Request timeout (default: 300000)
- `-v, --verbose` — Show skipped files and errors
- `--json` — Output as JSON
`paparats search <query>`
- `-n, --limit <n>` — Max results (default: 5)
- `-p, --project <name>` — Filter by project
- `-g, --group <name>` — Override group from config
- `--timeout <ms>` — Request timeout (default: 30000)
- `-v, --verbose` — Show token savings
- `--json` — Output as JSON
`paparats watch`
- `--dry-run` — Show what would be watched
- `-v, --verbose` — Show file events
- `--json` — Output events as JSON lines
- `--polling` — Use polling instead of native watchers (fewer file descriptors; use if EMFILE occurs)
Use Cases Beyond Coding
Paparats is a foundation for building AI agents that need code context:
🎯 Product Support Bots
- Index product codebase → support bot answers "how do I configure X?" with exact code examples
- Reduces ticket volume, improves response accuracy
🧪 QA Automation
- Index test suites → AI generates new test cases based on existing patterns
- Finds untested code paths by searching for functions without corresponding tests
👨‍💻 Developer Onboarding
- New hire asks "where is the payment processing logic?" → instant answers
- Reduces ramp-up time from weeks to days
📊 Code Analytics
- Search for anti-patterns: "SQL injection vulnerabilities", "deprecated API usage"
- Find migration candidates: "uses old auth library"
🤖 AI Agent Memory
- Persistent code knowledge for agents that span multiple sessions
- Agent learns codebase structure over time
Architecture
```
paparats-mcp/
├── packages/
│   ├── server/                  # MCP server (Docker image)
│   │   ├── src/
│   │   │   ├── index.ts             # HTTP server + MCP handler
│   │   │   ├── indexer.ts           # Group-aware indexing
│   │   │   ├── searcher.ts          # Search with query expansion + metrics
│   │   │   ├── query-expansion.ts   # Abbreviation, case, plural expansion
│   │   │   ├── task-prefixes.ts     # Jina task prefix detection
│   │   │   ├── chunker.ts           # Language-aware code chunking
│   │   │   ├── embeddings.ts        # Ollama provider + SQLite cache
│   │   │   ├── config.ts            # .paparats.yml reader
│   │   │   ├── mcp-handler.ts       # MCP protocol (SSE + HTTP)
│   │   │   ├── watcher.ts           # File watcher (chokidar)
│   │   │   └── types.ts             # Shared types
│   │   └── Dockerfile
│   ├── cli/                     # CLI tool (npm package)
│   │   └── src/
│   │       ├── index.ts         # Commander entry
│   │       └── commands/        # init, install, update, index, etc.
│   └── shared/                  # Shared utilities
│       └── src/
│           ├── path-validator.ts    # Path validation
│           ├── gitignore-filter.ts  # Gitignore parsing
│           └── exclude-patterns.ts  # Language-specific excludes
└── examples/
    └── paparats.yml.*           # Config examples per language
```
Stack
- Qdrant — vector database (1 collection per group, cosine similarity, payload filtering)
- Ollama — local embeddings via Jina Code Embeddings 1.5B with task-specific prefixes
- MCP — Model Context Protocol (SSE for Cursor, Streamable HTTP for Claude Code)
- TypeScript monorepo with Yarn workspaces
Docker and Ollama
- Qdrant and the MCP server run in Docker containers
- Ollama runs on the host (not in Docker). The server connects via `host.docker.internal:11434` (Mac/Windows). On Linux, set `OLLAMA_URL=http://172.17.0.1:11434` in `~/.paparats/docker-compose.yml`
- The embedding cache (SQLite) persists in the `paparats_cache` Docker volume, so re-indexing unchanged code is instant across restarts
Token Savings Metrics
What we measure (and what we don't)
Paparats provides estimated token savings to help you understand the order of magnitude of context reduction. These are heuristics, not precise measurements.
Per-search response
```jsonc
{
  "metrics": {
    "tokensReturned": 150,            // actual chunk content length ÷ 4
    "estimatedFullFileTokens": 5000,  // heuristic: maxEndLine × 50 ÷ 4
    "tokensSaved": 4850,              // difference between estimates
    "savingsPercent": 97              // (tokensSaved ÷ estimated) × 100
  }
}
```
| Field | Calculation | Reality Check |
|---|---|---|
| `tokensReturned` | Σ ceil(content.length / 4) | ✅ Based on actual returned content; ÷4 is a rough approximation |
| `estimatedFullFileTokens` | Σ ceil(endLine × 50 / 4) | ⚠️ Heuristic: assumes 50 chars/line, never loads actual files |
| `tokensSaved` | estimated − returned | ⚠️ Derived: difference between two estimates |
| `savingsPercent` | (saved / estimated) × 100 | ⚠️ Relative: percentage of a heuristic estimate |
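These formulas can be recomputed directly from chunk data. The sketch below reproduces the sample numbers from the response above, assuming one 600-character chunk whose file ends at line 400 (the function name is illustrative):

```typescript
// Recompute the per-search metrics from the table: chars ÷ 4 ≈ tokens,
// endLine × 50 ≈ file size in chars (50 chars/line heuristic).
function searchMetrics(chunks: { content: string; endLine: number }[]) {
  const tokensReturned = chunks.reduce(
    (sum, c) => sum + Math.ceil(c.content.length / 4), 0);
  const estimatedFullFileTokens = chunks.reduce(
    (sum, c) => sum + Math.ceil((c.endLine * 50) / 4), 0);
  const tokensSaved = estimatedFullFileTokens - tokensReturned;
  const savingsPercent = Math.round((tokensSaved / estimatedFullFileTokens) * 100);
  return { tokensReturned, estimatedFullFileTokens, tokensSaved, savingsPercent };
}

// A 600-char chunk from a ~400-line file:
// → { tokensReturned: 150, estimatedFullFileTokens: 5000, tokensSaved: 4850, savingsPercent: 97 }
```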
Cumulative stats
```bash
curl -s http://localhost:9876/api/stats | jq '.usage'
```

```jsonc
{
  "searchCount": 47,
  "totalTokensSaved": 152340,      // sum of all tokensSaved estimates
  "avgTokensSavedPerSearch": 3241  // totalTokensSaved ÷ searchCount
}
```
These are sums of estimates, not measured token counts from a real tokenizer.
Why heuristics?
We don't:
- Load full files to compare (defeats the purpose of chunking)
- Run a tokenizer on file content (slow, model-dependent)
- Know the exact file size (only chunk line ranges)
We estimate:
- 50 chars/line — typical for code (comments, whitespace, logic)
- 4 chars/token — rough average for code tokens (OpenAI GPT-3.5/4, Claude)
- File size from line count — `endLine × 50` assumes uniform density
These constants work reasonably well across languages, but individual files vary:
- Minified JS: 200+ chars/line → underestimate savings
- Ruby with comments: 30 chars/line → overestimate savings
- Dense C++: 60 chars/line → close to estimate
What the metrics tell you
✅ Order of magnitude — are you returning 100 tokens or 10,000?
✅ Relative benefit — is semantic search better than loading full files? (Yes, typically 50–90% reduction)
✅ Trend over time — is `avgTokensSavedPerSearch` increasing as your codebase grows?
❌ Exact token count — don't use this for billing or precise LLM context budgeting
❌ Model-specific accuracy — different tokenizers (GPT-4 vs Claude vs Llama) produce different counts
❌ File-level precision — individual file estimates can be off by 20–40%
Real-world validation
To verify actual savings, compare:
Without Paparats:
```
User: "Find authentication logic"
AI: *loads 5 full files*
Context: 25,000 tokens (measured by your LLM API)
```

With Paparats:
```
User: "Find authentication logic"
AI: *uses search_code, gets 5 chunks*
Context: 1,200 tokens (measured by your LLM API)
Savings: ~95% (real)
```
The metrics are directionally correct but use ÷4 as a proxy, not your LLM's actual tokenizer.
Why we still show them
Even as estimates, token savings metrics are useful:
- AI decision-making — if `savingsPercent < 40%`, the AI might decide to use grep or file reading instead
- Performance monitoring — track `avgTokensSavedPerSearch` over time to see if chunking strategies need tuning
- User feedback — "search saved ~10k tokens" gives intuition about the benefit
If you need exact counts, instrument your LLM API calls and compare before/after adding Paparats.
Honest comparison
Most code search tools don't provide any metrics. When they do:
- Sourcegraph — no token metrics, only "results found"
- Greptile — API response sizes, not token estimates
- Vexify — no metrics
- SeaGOAT — no metrics
Paparats shows rough estimates to give you visibility into context reduction, even if imperfect. Use them as indicators, not ground truth.
License
MIT
Releasing (maintainers)
1. Commit all changes, then bump and commit the version: `yarn release patch` (or `minor`/`major`). This only syncs the version and commits — no tag, no push.
2. Publish to npm: `npm login` (if needed), then `yarn publish:npm`. The MCP registry requires the package to exist on npm before it accepts the publish.
3. Tag and push: `yarn release:push`. This creates the tag and pushes; docker-publish.yml and publish-mcp.yml run and will succeed because npm already has the version.
Contributing
Contributions welcome! Areas of interest:
- Additional language support (PHP, Elixir, Scala, Kotlin, Swift)
- Alternative embedding providers (OpenAI, Cohere, local GGUF via llama.cpp)
- Performance optimizations (chunking strategies, cache eviction)
- Agent use cases (support bots, QA automation, code analytics)
See CONTRIBUTING.md for guidelines.
Links
- Jina Code Embeddings — embedding model
- CCLSP — LSP integration for MCP
- Qdrant — vector database
- Ollama — local LLM runtime
- MCP — Model Context Protocol
Star the repo if Paparats helps you code faster! ⭐️