
Shebe

A high-performance BM25-based code search engine for MCP that provides roughly 2ms query latency and efficient indexing for large codebases without requiring embeddings or GPUs.

Stars: 6 | Forks: 1 | Tools: 10 | Updated: Jan 1, 2026 | Validated: Jan 11, 2026

Simple RAG Service for Code Search

Fast BM25 full-text search for code repositories with MCP integration for Claude Code.

Quick Start

See INSTALLATION.md.


What is Shebe?

Shebe provides content search for code - find functions, APIs and patterns across large codebases using keyword search.

Key Features:

  • 2ms query latency
  • 2k-12k files/sec indexing (6k files in 0.5s)
  • 200-700 tokens/query
  • BM25 only - no embeddings or GPU
  • Full UTF-8 support (emoji, CJK, special characters)
  • 14 MCP tools for Claude Code (reference)

Positioning: Complements structural tools (Serena MCP) with content search.
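
For readers unfamiliar with BM25, the ranking idea can be sketched in a few lines of Python. This is a minimal illustration of the standard Okapi BM25 formula, not Shebe's implementation: the tokenizer, the k1 and b values, and the sample chunks are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased runs of word characters. This is an assumption for the
    # sketch, not Shebe's tokenizer. Python's \w is Unicode-aware.
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is length-normalized by b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical chunks standing in for indexed code.
chunks = [
    "fn authenticate(user: &User) -> Result<Token>",
    "rate limiting middleware for the HTTP gateway",
    "call authenticate before checking the rate limit",
]
ranking = bm25_rank("authenticate", chunks)
```

Both chunks that mention `authenticate` score above zero, and the shorter one ranks first because BM25 normalizes by document length. No embeddings or model inference are involved, which is why this style of search needs no GPU.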


Why Shebe?

When using AI coding assistants to refactor symbols across large codebases (6k+ files), developers face a binary choice: precision (LSP-based tools) or efficiency (grep/ripgrep). Shebe attempts to eliminate this trade-off.

Benchmark: Refactoring AuthorizationPolicy across Istio (~6k files)

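| Approach              | Searches | Time   | Tokens  |
|-----------------------|----------|--------|---------|
| Shebe find_references | 1        | 2-3s   | ~4,500  |
| Claude + Grep         | 13       | 15-20s | ~12,000 |
| Claude + Serena MCP   | 8        | 25-30s | ~18,000 |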

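Going by the table above, Shebe finishes roughly 5-15x faster end to end (2-3s versus 15-30s) and uses about 2.7-4x fewer tokens (~4,500 versus ~12,000-18,000), because it returns confidence-scored, pattern-classified results in a single call.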

See WHY_SHEBE.md for detailed benchmarks and tool comparisons.

Quick Comparison

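| Capability                     | Shebe   | grep/ripgrep | Serena MCP  |
|--------------------------------|---------|--------------|-------------|
| Ranked results (BM25)          | Yes     | No           | No          |
| Confidence scoring             | Yes     | No           | No          |
| Non-code files (YAML, md)      | Yes     | Yes          | No          |
| Token efficiency (tokens/query) | 200-700 | 2,000-8,000  | 1,000-3,000 |
| Speed (5k+ files)              | 2-32ms  | 100-1000ms   | 500-5000ms  |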

Common Tasks

Quick links to accomplish specific goals:

| Task                     | Tool                           | Guide       |
|--------------------------|--------------------------------|-------------|
| Rename a symbol safely   | find_references                | Reference   |
| Search polyglot codebase | search_code                    | Reference   |
| Explore unfamiliar repo  | index_repository + search_code | Quick Start |
| Find files by pattern    | find_file                      | Reference   |
| View file with context   | read_file or preview_chunk     | Reference   |
| Update stale index       | reindex_session                | Reference   |

Tool Selection Guide

Content Search (Use Shebe)

Best for finding code by keywords, patterns and text content:

  • "Find all usages of authenticate"
  • "Where is rate limiting implemented?"
  • "Show me error handling patterns"
  • "Find configuration for database connections"

Structural Navigation (Use Serena/LSP)

Best for precise symbol operations and type information:

  • "Go to definition of UserService"
  • "Find all implementations of Handler trait"
  • "Rename oldFunc to newFunc across codebase"
  • "Show type hierarchy for this class"

Simple Pattern Matching (Use grep/ripgrep)

Best for exact string matches in small codebases:

  • "Find exact string TODO:"
  • "Count occurrences of deprecated"
  • "Quick one-off search in <1,000 files"

External Information (Use Web Search)

Best for documentation and community knowledge:

  • "Latest React 19 migration guide"
  • "Community solutions for specific errors"
  • "Blog posts about architectural patterns"

Shebe + Serena Together

For complete codebase exploration without token waste:

1. Shebe: "Find usages of authenticate" -> discover files (2ms, 300 tokens)
2. Serena: "Go to definition" -> navigate to implementation (precise)
3. Shebe: "Find similar patterns" -> discover related code (2ms, 300 tokens)

Configuration

Quick Reference

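| Variable         | Default              | Description                     |
|------------------|----------------------|---------------------------------|
| SHEBE_INDEX_DIR  | ~/.local/state/shebe | Session storage location        |
| SHEBE_CHUNK_SIZE | 512                  | Characters per chunk (100-2000) |
| SHEBE_OVERLAP    | 64                   | Overlap between chunks          |
| SHEBE_DEFAULT_K  | 10                   | Default search results count    |
| SHEBE_MAX_K      | 100                  | Maximum search results allowed  |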
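
To make the chunk size and overlap settings concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It assumes chunks are measured in characters (as the table states) and that each chunk starts `chunk_size - overlap` characters after the previous one. The function name `chunk_text` is hypothetical, and Shebe's actual chunker may handle boundaries differently.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into (start_offset, chunk) pairs with overlapping windows."""
    if not 100 <= chunk_size <= 2000:
        raise ValueError("chunk_size must be between 100 and 2000")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# 1,000 characters of sample text with the default settings.
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
pieces = chunk_text(sample)
```

With the defaults, consecutive chunks start 448 characters apart, so the last 64 characters of one chunk reappear at the start of the next. That overlap keeps a match that straddles a chunk boundary searchable in at least one chunk.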

Configuration File

Create shebe.toml in your working directory or ~/.config/shebe/shebe.toml:

[indexing]
chunk_size = 512
overlap = 64
max_file_size = 10485760  # 10MB

[search]
default_k = 10
max_k = 100

See CONFIGURATION.md for complete reference.


Performance

Validated on Istio (5,605 files, Go-heavy) and OpenEMR (6,364 files, PHP polyglot):

| Metric             | Result                                   |
|--------------------|------------------------------------------|
| Query latency      | 2ms (consistent across all query types)  |
| Indexing (Istio)   | 11,210 files/sec (0.5s for 5,605 files)  |
| Indexing (OpenEMR) | 1,928 files/sec (3.3s for 6,364 files)   |
| Token usage        | 210-650 tokens/query                     |
| Polyglot coverage  | 11 file types in single query            |

See docs/Performance.md for detailed benchmarks.
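
The indexing rates in the table follow directly from the file counts and elapsed times. A small worked check in Python, using only the numbers reported above (the helper name is hypothetical):

```python
def files_per_second(files, seconds):
    """Indexing throughput as files divided by elapsed seconds."""
    return files / seconds


istio_rate = files_per_second(5605, 0.5)    # Istio: 5,605 files in 0.5s
openemr_rate = files_per_second(6364, 3.3)  # OpenEMR: 6,364 files in 3.3s
```

The Istio figure comes out at exactly 11,210 files/sec and the OpenEMR figure at roughly 1,928 files/sec, matching the table. The elapsed times are rounded to one decimal place, so the rates are approximate.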


Architecture

MCP-Only Design

Shebe is accessed exclusively over MCP (stdio transport) and is designed for Claude Code integration. No HTTP server is required.
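
Claude Code normally handles the protocol for you, but it can help to see what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. The sketch below builds such a request in Python. The tool name `search_code` and the `k` parameter appear in this README, while the argument names `session` and `query` and their values are assumptions for illustration. A real client must also complete the MCP `initialize` handshake before calling tools.

```python
import json


def tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names "session" and "query" are hypothetical; check the tool
# reference for the real schema.
request = tool_call(
    1,
    "search_code",
    {"session": "istio", "query": "AuthorizationPolicy", "k": 10},
)

# The stdio transport sends one JSON message per line, so the serialized
# request must not contain embedded newlines.
line = json.dumps(request)
```

Writing `line` followed by a newline to the server's stdin is how a stdio client would deliver the request. The response comes back on stdout as another single-line JSON message.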

System Design

                    +------------------+
                    |   Claude Code    |
                    +--------+---------+
                             | MCP (stdio)
                    +--------v---------+
                    |   shebe-mcp      |
                    |   (14 tools)     |
                    +--------+---------+
                             |
                    +--------v---------+
                    |  Shared Storage  |
                    | ~/.local/state/  |
                    |  shebe/sessions/ |
                    +------------------+

See ARCHITECTURE.md for developer guide.


Troubleshooting

| Issue                         | Cause                            | Solution                                         |
|-------------------------------|----------------------------------|--------------------------------------------------|
| "Session not found"           | Session doesn't exist or typo    | Run list_sessions to see available sessions      |
| "Schema version mismatch"     | Session from older Shebe version | Run upgrade_session to migrate                   |
| Slow indexing                 | Disk I/O or large files          | Exclude node_modules/, target/, check disk       |
| No search results             | Empty session or wrong query     | Verify with get_session_info, check query syntax |
| "File not found" in read_file | File deleted since indexing      | Run reindex_session to update                    |
| High token usage              | Too many results                 | Reduce k parameter (default: 10)                 |

For detailed troubleshooting, see docs/guides/mcp-setup-guide.md.


Project Status

  • Version: 0.6.0
  • Status: Production Ready - MCP-Only Architecture (14 Tools)
  • Testing: 397 tests (86.76% coverage) + 30 performance scenarios (100% pass rate)
  • Next: Stage 3 (CI/CD Pipeline)

See CHANGELOG.md for version history.


License

See LICENSE.


Contributing

We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.

Quick checklist:

  1. Read ARCHITECTURE.md for codebase guide
  2. All 397 tests must pass (make test)
  3. Zero clippy warnings (make clippy)
  4. Max 120 char line length
  5. Maintain >85% test coverage (currently 86.76%)
  6. Single commit per feature branch

See CODE_OF_CONDUCT.md for community guidelines.
