
Paparats

Validation Failed

Semantic code search for AI coding assistants. Local Qdrant, multi-repo, no API keys.

Stars: 5 · Updated: Feb 10, 2026 · Validated: Feb 12, 2026

Validation Error:

Process exited with code 1. stderr:

Usage: paparats [options] [command]

Semantic code search for your workspace

Options:
  -V, --version      output the version number
  -h, --help         display help for command

Commands:
  init [options]     Create .paparats.yml in the current directory
  install [options]  Set up Docker containers (Qdrant + MCP server) and Ollama embedding model
  update [options]   Update CLI from npm and pull/restart latest server

Quick Install

npx -y @paparats/cli

Paparats MCP

Paparats-kvetka (fern flower)

Paparats-kvetka — a magical flower from Slavic folklore that blooms on Kupala Night and grants power to whoever finds it. Likewise, paparats-mcp helps you find the right code across a sea of repositories.

Semantic code search for AI coding assistants. Give Claude Code, Cursor, Windsurf, Codex, and other assistants a deep understanding of your entire codebase — single repo or multi-project workspaces. Search by meaning, not keywords. Keep your index fresh with real-time file watching. Return only relevant chunks instead of full files to save tokens.

Everything runs locally. No cloud. No API keys. Your code never leaves your machine.


Why Paparats?

AI coding assistants are smart, but they can only see files you open. They don't know your codebase structure, where the authentication logic lives, or how services connect. Paparats fixes that.

What you get

  • 🔍 Semantic code search — ask "where is the rate limiting logic?" and get exact code ranked by meaning, not grep matches
  • ⚡️ Real-time sync — edit a file, and 2 seconds later it's re-indexed. No manual re-runs
  • 🧠 LSP intelligence — go-to-definition, find-references, rename symbols via CCLSP integration
  • 💾 Token savings — return only relevant chunks instead of full files to reduce context size
  • 🏢 Multi-project workspaces — search across backend, frontend, infra repos in one query
  • 🔒 100% local & private — Qdrant vector database + Ollama embeddings. Nothing leaves your laptop
  • 🎯 Language-aware chunking — code split by functions/classes, not arbitrary character counts (Ruby, TypeScript, Python, Go, Rust, Java, C/C++, C#, Terraform)

Who benefits

| Use Case | How Paparats Helps |
| --- | --- |
| Solo developers | Quickly navigate unfamiliar codebases, find examples of patterns, reduce context-switching |
| Multi-repo teams | Cross-project search (backend + frontend + infra), consistent patterns, faster onboarding |
| AI agents | Foundation for product support bots, QA automation, dev assistants — any agent that needs code context |
| Legacy modernization | Find all usages of deprecated APIs, identify migration patterns, discover hidden dependencies |
| Contractors/consultants | Accelerate ramp-up on client codebases, reduce "where is X?" questions |

Quick Start

# 1. Install CLI
npm install -g @paparats/cli

# 2. One-time setup (downloads ~1.6 GB GGUF model, starts Docker containers)
paparats install

# 3. In your project
cd your-project
paparats init   # creates .paparats.yml
paparats index  # index the codebase

# 4. Keep index fresh with file watching
paparats watch  # run in background or separate terminal

# 5. Connect your IDE (Cursor, Claude Code) — see "Connecting MCP" below

Prerequisites

Install these before running paparats install:

| Tool | Purpose | Install |
| --- | --- | --- |
| Docker | Runs Qdrant vector DB + MCP server | docker.com |
| Docker Compose | Orchestrates containers (v2) | Included with Docker Desktop; Linux: apt install docker-compose-plugin |
| Ollama | Local embedding model (on host) | ollama.com |

The CLI checks that docker, ollama, and docker compose are available. If missing, it exits with installation links.
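A minimal sketch of how such a check can work (the probe commands and helper names here are assumptions for illustration; the CLI's actual detection logic may differ):

```typescript
import { spawnSync } from "node:child_process";

// Probe a CLI tool by running it with a harmless flag; a missing binary
// surfaces as a spawn error, a broken install as a non-zero exit status.
export function checkTool(cmd: string, args: string[] = ["--version"]): boolean {
  const result = spawnSync(cmd, args, { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Collect whichever prerequisites are missing, so the caller can
// print installation links and exit.
export function missingPrerequisites(): string[] {
  const probes: Array<[string, string[]]> = [
    ["docker", ["--version"]],
    ["docker", ["compose", "version"]], // Compose v2 plugin
    ["ollama", ["--version"]],
  ];
  return probes
    .filter(([cmd, args]) => !checkTool(cmd, args))
    .map(([cmd, args]) => `${cmd} ${args.join(" ")}`);
}
```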


How It Works

Your projects                   Paparats                       AI assistant
                                                               (Claude Code / Cursor)
  backend/                 ┌──────────────────────┐
    .paparats.yml ────────►│  Indexer              │
  frontend/                │   - chunks code       │          ┌──────────────┐
    .paparats.yml ────────►│   - embeds via Ollama │─────────►│ MCP search   │
  infra/                   │   - stores in Qdrant  │          │ tool call    │
    .paparats.yml ────────►│   - watches changes   │          └──────────────┘
                           └──────────────────────┘
  1. Indexing: Code is chunked at function/class boundaries, embedded via Jina Code Embeddings 1.5B, stored in Qdrant
  2. Searching: AI assistant queries via MCP → server expands query (handles abbreviations, plurals, case variants) → Qdrant returns top matches → only relevant chunks sent back
  3. Token savings: Return only relevant chunks instead of loading full files
  4. Watching: File changes trigger re-indexing of affected files only (unchanged code never re-embedded thanks to content-hash cache)
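The content-hash cache in step 4 can be sketched as follows (class and method names here are illustrative, not Paparats' actual API): an unchanged chunk hashes to the same key, so its stored vector is reused instead of being re-embedded.

```typescript
import { createHash } from "node:crypto";

type Vector = number[];

// Hypothetical sketch of a content-hash embedding cache.
export class EmbeddingCache {
  private store = new Map<string, Vector>();

  private key(content: string): string {
    return createHash("sha256").update(content).digest("hex");
  }

  // Return the cached vector if this exact content was seen before,
  // otherwise compute it with `embed` and remember it.
  getOrEmbed(
    content: string,
    embed: (c: string) => Vector
  ): { vector: Vector; cached: boolean } {
    const k = this.key(content);
    const hit = this.store.get(k);
    if (hit) return { vector: hit, cached: true };
    const vector = embed(content);
    this.store.set(k, vector);
    return { vector, cached: false };
  }
}
```

The real cache persists to SQLite rather than an in-memory map, but the key idea is the same: identity by content hash, not by file path or timestamp.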

Key Features

🎯 Better Search Quality

Task-specific embeddings — Jina Code Embeddings supports 3 query types (nl2code, code2code, techqa) with different prefixes for better relevance:

  • "find authentication middleware" → nl2code prefix (natural language → code)
  • "function validateUser(req, res)" → code2code prefix (code → similar code)
  • "how does OAuth work in this app?" → techqa prefix (technical questions)
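Routing queries like the three above could be done with a small heuristic over the query's shape. The rules below are assumptions for illustration, not Paparats' actual classifier:

```typescript
type Task = "nl2code" | "code2code" | "techqa";

// Illustrative guess at a task-prefix router: code-like input goes to
// code2code, question-shaped natural language to techqa, everything
// else (descriptions of code to find) to nl2code.
export function detectTask(query: string): Task {
  const q = query.trim();
  // Code syntax markers: parens/braces/semicolons, arrows, keywords
  if (/[(){};]|=>|\bfunction\b|\bdef\b/.test(q)) return "code2code";
  // Question-shaped natural language
  if (/^(how|why|what|when|where|does|is|can)\b/i.test(q) || q.endsWith("?")) return "techqa";
  return "nl2code";
}
```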

Query expansion — every search generates 2-3 variations server-side:

  • Abbreviations: auth → authentication, db → database
  • Case variants: userAuth → user_auth → UserAuth
  • Plurals: users → user, dependencies → dependency
  • Filler removal: "how does auth work" → "auth"

All variants searched in parallel, results merged by max score.
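A minimal sketch of expansion and max-score merging (the abbreviation map is a tiny sample and the function names are assumed, not the server's actual code):

```typescript
// Sample abbreviation table; the real server's expansion rules differ.
const ABBREVIATIONS: Record<string, string> = {
  auth: "authentication",
  db: "database",
};

// Generate query variants: the original plus abbreviation expansions.
export function expandQuery(query: string): string[] {
  const variants = new Set<string>([query]);
  for (const [abbr, full] of Object.entries(ABBREVIATIONS)) {
    const re = new RegExp(`\\b${abbr}\\b`, "i");
    if (re.test(query)) variants.add(query.replace(re, full));
  }
  return [...variants];
}

type Hit = { id: string; score: number };

// Each variant is searched independently; a chunk hit by several
// variants keeps its highest score, and the merged list is re-sorted.
export function mergeByMaxScore(resultSets: Hit[][]): Hit[] {
  const best = new Map<string, Hit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const prev = best.get(hit.id);
      if (!prev || hit.score > prev.score) best.set(hit.id, hit);
    }
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```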

Confidence tiers — results labeled High (≥60%), Partial (40–60%), Low (<40%) to guide AI next steps.
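The documented cutoffs, written out as a small helper (scores are similarity values in [0, 1]):

```typescript
// Tier labels from the docs: High ≥ 60%, Partial 40–60%, Low < 40%.
export function confidenceTier(score: number): "High" | "Partial" | "Low" {
  if (score >= 0.6) return "High";
  if (score >= 0.4) return "Partial";
  return "Low";
}
```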

⚡️ Performance

Embedding cache — SQLite cache with content-hash keys + Float32 vectors. Unchanged code never re-embedded. LRU cleanup at 100k entries.

Language-aware chunking — 4 strategies per language (block-based for Ruby/Python, brace-based for JS/TS/Go/Rust, indent-based, fixed-size fallback). Supports 11 languages.
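A stripped-down sketch of the brace-based strategy with the fixed-size fallback. This simplification ignores braces inside strings and comments, which a real language-aware chunker must handle:

```typescript
// Split source at top-level closing braces; oversized chunks fall back
// to fixed-size slices of `maxChars` characters.
export function chunkByBraces(source: string, maxChars = 1024): string[] {
  const chunks: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        // A top-level block (function, class, etc.) just closed.
        chunks.push(source.slice(start, i + 1).trim());
        start = i + 1;
      }
    }
  }
  const tail = source.slice(start).trim();
  if (tail) chunks.push(tail);
  // Fixed-size fallback for chunks exceeding the limit.
  return chunks.flatMap((c) =>
    c.length <= maxChars
      ? [c]
      : Array.from({ length: Math.ceil(c.length / maxChars) }, (_, j) =>
          c.slice(j * maxChars, (j + 1) * maxChars)
        )
  );
}
```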

Real-time watching — paparats watch monitors file changes with debouncing (1s default). Edit → save → re-index in ~2 seconds.

🔗 Integrations

CCLSP (Claude Code LSP) — during paparats init, optionally sets up:

  • LSP server for your language (TypeScript, Python, Go, Ruby, etc.)
  • MCP config for go-to-definition, find-references, rename
  • Typical AI workflow: search_code (semantic) → find_definition (precise navigation) → find_references (impact analysis)

Skip with --skip-cclsp if not needed.


Comparison with Alternatives

Feature Matrix

| Feature | Paparats | Vexify | SeaGOAT | Augment Context | Sourcegraph | Greptile | Bloop |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Deployment** | | | | | | | |
| Open source | ✅ MIT | ✅ MIT | ✅ MIT | ❌ Proprietary | ⚠️ Partial | ❌ Proprietary | ⚠️ Archived¹ |
| Fully local | ✅ | ✅ | ✅ | ❌ Cloud² | ❌ Cloud | ❌ SaaS | — |
| **Search Quality** | | | | | | | |
| Code embeddings | ✅ Jina 1.5B³ | ⚠️ Limited⁴ | ❌ MiniLM⁵ | ⚠️ Proprietary | ⚠️ Proprietary | ⚠️ Proprietary | — |
| Vector database | Qdrant | SQLite | ChromaDB | Proprietary | Proprietary | pgvector | Qdrant |
| AST-aware chunking | ✅ 4 strategies | ⚠️ Unknown | ⚠️ Partial | ⚠️ Unknown | — | — | — |
| Query expansion | ✅ 4 types⁶ | ⚠️ Unknown | ⚠️ Partial | ⚠️ Unknown | — | — | — |
| **Developer Experience** | | | | | | | |
| Real-time file watching | ✅ Auto | ❌ Manual | ❌ Manual | ✅ CI/CD | ⚠️ Unknown | ⚠️ | — |
| Embedding cache | ✅ SQLite | ⚠️ Implicit | ⚠️ Unknown | ⚠️ Unknown | ⚠️ Unknown | — | — |
| Multi-project search | ✅ Groups | ❌ Single | — | — | — | — | — |
| One-command install | ✅ | ⚠️ Manual | pip install | Account + CI | Account | SaaS signup | Build source |
| **AI Integration** | | | | | | | |
| MCP native | ✅ | — | — | — | ⚠️ API | — | — |
| LSP integration | ✅ CCLSP | — | — | — | ⚠️ Partial | — | — |
| Token savings metrics | ✅ Per-query | — | — | — | — | ⚠️ Unknown | — |
| **Pricing** | | | | | | | |
| Cost | Free | Free | Free | Paid | Paid | Paid | Archived |

Notes:

  1. Bloop archived January 2, 2025
  2. Augment Context Engine indexes locally but stores vectors in cloud
  3. Jina Code Embeddings 1.5B (1536 dims) with task-specific prefixes (nl2code, code2code, techqa)
  4. Vexify supports Ollama models but limited to specific embeddings (jina-embeddings-2-base-code, nomic-embed-text)
  5. SeaGOAT locked to all-MiniLM-L6-v2 (384 dims, general-purpose)
  6. Abbreviations, case variants, plurals, filler word removal

Why Paparats?

🔒 Privacy-first — Everything runs locally. Augment and Greptile store your code vectors in the cloud, Sourcegraph requires cloud deployment.

🧠 Better embeddings — Jina Code Embeddings 1.5B (1536 dims), trained specifically for code with task-specific prefixes. Vexify uses the smaller jina-embeddings-2-base-code; SeaGOAT uses general-purpose MiniLM (384 dims).

⚡️ Production-grade stack — Qdrant handles millions of vectors with sub-100ms latency. SQLite with extensions (Vexify) doesn't scale beyond small projects. ChromaDB (SeaGOAT) is designed for prototyping, not production.

🎯 Smarter search — Query expansion (4 strategies) + task prefix detection (nl2code/code2code/techqa) automatically improve relevance. Competitors don't expose these features.

🔄 True real-time — paparats watch keeps the index fresh automatically with a 1s debounce. Vexify and SeaGOAT require manual reindex commands. Augment requires CI/CD hooks.

🔗 LSP included — CCLSP integration gives your AI go-to-definition, find-references, rename. No other tool bundles this.

💰 Free forever — No usage limits, credits, or per-seat fees.

📊 Transparent metrics — Every search shows tokens returned vs full-file tokens, savings %, confidence tier. Helps AI decide next steps.



Configuration

.paparats.yml in your project root:

group: 'my-project-group' # required — Qdrant collection name
language: ruby # required — or array: [ruby, typescript]

indexing:
  paths: ['app/', 'lib/'] # directories to index (default: ["./"])
  exclude: ['vendor/**'] # additional excludes (merged with language defaults)
  extensions: ['.rb'] # override auto-detected extensions
  chunkSize: 1024 # max chars per chunk (default: 1024)
  concurrency: 2 # parallel file processing (default: 2)
  batchSize: 50 # Qdrant upsert batch size (default: 50)

watcher:
  enabled: true # auto-reindex on file changes (default: true)
  debounce: 1000 # ms debounce (default: 1000)

embeddings:
  provider: 'ollama' # embedding provider (default: "ollama")
  model: 'jina-code-embeddings' # Ollama alias (see below)
  dimensions: 1536 # vector dimensions (default: 1536)

Groups

Projects with the same group name share a search scope: they are all indexed into the same Qdrant collection.

# backend/.paparats.yml
group: 'my-fullstack'
language: ruby
indexing:
  paths: ['app/', 'lib/']
# frontend/.paparats.yml
group: 'my-fullstack'
language: typescript
indexing:
  paths: ['src/']

Now searching "authentication flow" finds code in both backend and frontend.


Connecting MCP

After paparats install and paparats index, connect your IDE:

Cursor

Create or edit ~/.cursor/mcp.json (global) or .cursor/mcp.json (project):

{
  "mcpServers": {
    "paparats": {
      "type": "http",
      "url": "http://localhost:9876/mcp"
    }
  }
}

Restart Cursor after changing config.

Claude Code

claude mcp add --transport http paparats http://localhost:9876/mcp

Or add to .mcp.json in project root:

{
  "mcpServers": {
    "paparats": {
      "type": "http",
      "url": "http://localhost:9876/mcp"
    }
  }
}

Verify

  • paparats status — check server is running
  • In your IDE, look for MCP tools: search_code and health_check
  • Ask the AI: "Search for authentication logic in the codebase"

Embedding Model Setup

Default: jinaai/jina-code-embeddings-1.5b-GGUF — code-optimized, 1.5B params, 1536 dims, 32k context. Not in Ollama registry, so we create a local alias.

Recommended: paparats install automates this:

  • Downloads GGUF (~1.65 GB) to ~/.paparats/models/
  • Creates Modelfile and runs ollama create jina-code-embeddings
  • Starts Ollama with ollama serve if not running

Manual setup:

# 1. Download GGUF
curl -L -o jina-code-embeddings-1.5b-Q8_0.gguf \
  "https://huggingface.co/jinaai/jina-code-embeddings-1.5b-GGUF/resolve/main/jina-code-embeddings-1.5b-Q8_0.gguf"

# 2. Create Modelfile
cat > Modelfile <<'EOF'
FROM ./jina-code-embeddings-1.5b-Q8_0.gguf
PARAMETER num_ctx 8192
EOF

# 3. Register in Ollama
ollama create jina-code-embeddings -f Modelfile

# 4. Verify
ollama list | grep jina
| Spec | Value |
| --- | --- |
| Parameters | 1.5B |
| Dimensions | 1536 |
| Context | 32,768 tokens (recommended ≤ 8,192) |
| Quantization | Q8_0 (~1.6 GB) |
| Languages | 15+ programming languages |

Task-specific prefixes (nl2code, code2code, techqa) applied automatically.


CLI Commands

| Command | Description |
| --- | --- |
| paparats init | Create .paparats.yml (interactive or --non-interactive) |
| paparats install | Set up Docker + Ollama model (~1.6 GB download) |
| paparats update | Update CLI from npm + pull latest Docker image |
| paparats index | Index the current project |
| paparats search <query> | Semantic search across indexed projects |
| paparats watch | Watch files and auto-reindex on changes |
| paparats status | System status (Docker, Ollama, config, server health, groups) |
| paparats doctor | Run diagnostic checks |
| paparats groups | List all indexed groups and projects |

Most commands support --server <url> (default: http://localhost:9876) and --json for machine-readable output.

Common Options

paparats init

  • --force — Overwrite existing config
  • --group <name> — Set group (skip prompt)
  • --language <lang> — Set language (skip prompt)
  • --non-interactive — Use defaults without prompts
  • --skip-cclsp — Skip CCLSP language server setup

paparats install

  • --skip-docker — Skip Docker setup (only set up Ollama)
  • --skip-ollama — Skip Ollama model (only start Docker)
  • -v, --verbose — Show detailed output

paparats index

  • -f, --force — Force reindex (clear existing chunks)
  • --dry-run — Show what would be indexed
  • --timeout <ms> — Request timeout (default: 300000)
  • -v, --verbose — Show skipped files and errors
  • --json — Output as JSON

paparats search <query>

  • -n, --limit <n> — Max results (default: 5)
  • -p, --project <name> — Filter by project
  • -g, --group <name> — Override group from config
  • --timeout <ms> — Request timeout (default: 30000)
  • -v, --verbose — Show token savings
  • --json — Output as JSON

paparats watch

  • --dry-run — Show what would be watched
  • -v, --verbose — Show file events
  • --json — Output events as JSON lines
  • --polling — Use polling instead of native watchers (fewer file descriptors; use if EMFILE occurs)

Use Cases Beyond Coding

Paparats is a foundation for building AI agents that need code context:

🎯 Product Support Bots

  • Index product codebase → support bot answers "how do I configure X?" with exact code examples
  • Reduces ticket volume, improves response accuracy

🧪 QA Automation

  • Index test suites → AI generates new test cases based on existing patterns
  • Finds untested code paths by searching for functions without corresponding tests

👨‍💻 Developer Onboarding

  • New hire asks "where is the payment processing logic?" → instant answers
  • Reduces ramp-up time from weeks to days

📊 Code Analytics

  • Search for anti-patterns: "SQL injection vulnerabilities", "deprecated API usage"
  • Find migration candidates: "uses old auth library"

🤖 AI Agent Memory

  • Persistent code knowledge for agents that span multiple sessions
  • Agent learns codebase structure over time

Architecture

paparats-mcp/
├── packages/
│   ├── server/          # MCP server (Docker image)
│   │   ├── src/
│   │   │   ├── index.ts           # HTTP server + MCP handler
│   │   │   ├── indexer.ts         # Group-aware indexing
│   │   │   ├── searcher.ts        # Search with query expansion + metrics
│   │   │   ├── query-expansion.ts # Abbreviation, case, plural expansion
│   │   │   ├── task-prefixes.ts   # Jina task prefix detection
│   │   │   ├── chunker.ts         # Language-aware code chunking
│   │   │   ├── embeddings.ts      # Ollama provider + SQLite cache
│   │   │   ├── config.ts          # .paparats.yml reader
│   │   │   ├── mcp-handler.ts     # MCP protocol (SSE + HTTP)
│   │   │   ├── watcher.ts         # File watcher (chokidar)
│   │   │   └── types.ts           # Shared types
│   │   └── Dockerfile
│   ├── cli/             # CLI tool (npm package)
│   │   └── src/
│   │       ├── index.ts        # Commander entry
│   │       └── commands/       # init, install, update, index, etc.
│   └── shared/          # Shared utilities
│       └── src/
│           ├── path-validator.ts   # Path validation
│           ├── gitignore-filter.ts # Gitignore parsing
│           └── exclude-patterns.ts # Language-specific excludes
└── examples/
    └── paparats.yml.*   # Config examples per language

Stack

  • Qdrant — vector database (1 collection per group, cosine similarity, payload filtering)
  • Ollama — local embeddings via Jina Code Embeddings 1.5B with task-specific prefixes
  • MCP — Model Context Protocol (SSE for Cursor, Streamable HTTP for Claude Code)
  • TypeScript monorepo with Yarn workspaces

Docker and Ollama

  • Qdrant and MCP server run in Docker containers
  • Ollama runs on the host (not Docker). Server connects via host.docker.internal:11434 (Mac/Windows). On Linux, set OLLAMA_URL=http://172.17.0.1:11434 in ~/.paparats/docker-compose.yml
  • Embedding cache (SQLite) persists in paparats_cache Docker volume. Re-indexing unchanged code is instant across restarts

Token Savings Metrics

What we measure (and what we don't)

Paparats provides estimated token savings to help you understand the order of magnitude of context reduction. These are heuristics, not precise measurements.

Per-search response

{
  "metrics": {
    "tokensReturned": 150, // Actual chunk content length ÷ 4
    "estimatedFullFileTokens": 5000, // Heuristic: maxEndLine × 50 ÷ 4
    "tokensSaved": 4850, // Difference between estimates
    "savingsPercent": 97 // (tokensSaved ÷ estimated) × 100
  }
}
| Field | Calculation | Reality Check |
| --- | --- | --- |
| tokensReturned | Σ ceil(content.length / 4) | ✅ Based on actual returned content; ÷4 is a rough approximation |
| estimatedFullFileTokens | Σ ceil(endLine × 50 / 4) | ⚠️ Heuristic: assumes 50 chars/line, never loads actual files |
| tokensSaved | estimated − returned | ⚠️ Derived: difference between two estimates |
| savingsPercent | (saved / estimated) × 100 | ⚠️ Relative: percentage of a heuristic estimate |
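These formulas can be written out directly. The ÷4 chars-per-token and ×50 chars-per-line constants are the documented heuristics; the function name and chunk shape below are illustrative:

```typescript
type Chunk = { content: string; endLine: number };

// Per-search metrics from the documented heuristics:
// ~4 chars/token, ~50 chars/line.
export function searchMetrics(chunks: Chunk[]) {
  const tokensReturned = chunks.reduce(
    (sum, c) => sum + Math.ceil(c.content.length / 4), 0);
  const estimatedFullFileTokens = chunks.reduce(
    (sum, c) => sum + Math.ceil((c.endLine * 50) / 4), 0);
  const tokensSaved = estimatedFullFileTokens - tokensReturned;
  const savingsPercent = estimatedFullFileTokens === 0
    ? 0
    : Math.round((tokensSaved / estimatedFullFileTokens) * 100);
  return { tokensReturned, estimatedFullFileTokens, tokensSaved, savingsPercent };
}
```

For example, a single 400-character chunk from a 100-line file estimates 100 tokens returned against 1,250 full-file tokens, i.e. 92% savings.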

Cumulative stats

curl -s http://localhost:9876/api/stats | jq '.usage'
{
  "searchCount": 47,
  "totalTokensSaved": 152340, // Sum of all tokensSaved estimates
  "avgTokensSavedPerSearch": 3241 // totalTokensSaved ÷ searchCount
}

These are sums of estimates, not measured token counts from a real tokenizer.
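The cumulative counters are plain sums and averages of those per-search estimates, e.g.:

```typescript
// Aggregate per-search tokensSaved estimates into the /api/stats shape.
export function usageStats(perSearchSaved: number[]) {
  const searchCount = perSearchSaved.length;
  const totalTokensSaved = perSearchSaved.reduce((a, b) => a + b, 0);
  const avgTokensSavedPerSearch =
    searchCount === 0 ? 0 : Math.round(totalTokensSaved / searchCount);
  return { searchCount, totalTokensSaved, avgTokensSavedPerSearch };
}
```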


Why heuristics?

We don't:

  • Load full files to compare (defeats the purpose of chunking)
  • Run a tokenizer on file content (slow, model-dependent)
  • Know the exact file size (only chunk line ranges)

We estimate:

  • 50 chars/line — typical for code (comments, whitespace, logic)
  • 4 chars/token — rough average for code tokens (OpenAI GPT-3.5/4, Claude)
  • File size from line count — endLine × 50 assumes uniform density

These constants work reasonably well across languages, but individual files vary:

  • Minified JS: 200+ chars/line → underestimate savings
  • Ruby with comments: 30 chars/line → overestimate savings
  • Dense C++: 60 chars/line → close to estimate

What the metrics tell you

Use them for:

  • Order of magnitude — are you returning 100 tokens or 10,000?
  • Relative benefit — is semantic search better than loading full files? (Yes, typically 50–90% reduction)
  • Trend over time — is avgTokensSavedPerSearch increasing as your codebase grows?

Don't use them for:

  • Exact token counts — not suitable for billing or precise LLM context budgeting
  • Model-specific accuracy — different tokenizers (GPT-4 vs Claude vs Llama) produce different counts
  • File-level precision — individual file estimates can be off by 20–40%


Real-world validation

To verify actual savings, compare:

Without Paparats:

User: "Find authentication logic"
AI: *loads 5 full files*
Context: 25,000 tokens (measured by your LLM API)

With Paparats:

User: "Find authentication logic"
AI: *uses search_code, gets 5 chunks*
Context: 1,200 tokens (measured by your LLM API)
Savings: ~95% (real)

The metrics are directionally correct but use ÷4 as a proxy, not your LLM's actual tokenizer.


Why we still show them

Even as estimates, token savings metrics are useful:

  1. AI decision-making — if savingsPercent < 40%, the AI might decide to use grep or file reading instead
  2. Performance monitoring — track avgTokensSavedPerSearch over time to see if chunking strategies need tuning
  3. User feedback — "search saved ~10k tokens" gives intuition about the benefit

If you need exact counts, instrument your LLM API calls and compare before/after adding Paparats.


Honest comparison

Most code search tools don't provide any metrics. When they do:

  • Sourcegraph — no token metrics, only "results found"
  • Greptile — API response sizes, not token estimates
  • Vexify — no metrics
  • SeaGOAT — no metrics

Paparats shows rough estimates to give you visibility into context reduction, even if imperfect. Use them as indicators, not ground truth.


License

MIT


Releasing (maintainers)

  1. Commit all changes, then bump and commit version: yarn release patch (or minor/major). This only syncs version and commits — no tag, no push.
  2. Publish to npm: npm login (if needed), then yarn publish:npm. The MCP registry requires the package to exist on npm before it accepts the publish.
  3. Tag and push: yarn release:push. This creates the tag and pushes; docker-publish.yml and publish-mcp.yml run and will succeed because npm already has the version.

Contributing

Contributions welcome! Areas of interest:

  • Additional language support (PHP, Elixir, Scala, Kotlin, Swift)
  • Alternative embedding providers (OpenAI, Cohere, local GGUF via llama.cpp)
  • Performance optimizations (chunking strategies, cache eviction)
  • Agent use cases (support bots, QA automation, code analytics)

See CONTRIBUTING.md for guidelines.


Links


Star the repo if Paparats helps you code faster! ⭐️
