Codemesh
Intelligent code knowledge graph for AI coding agents
71% cheaper, 72% faster, 82% fewer tool calls vs baseline Grep+Read
on 6 real-world repos (Sonnet 4.6) — from a single codemesh index.
Benchmarks · Quick Start · Integrations · Write-Back · How It Works · API Reference · Full Results
The Problem
AI coding agents waste 40-80% of their tokens on discovery — grepping through files, reading irrelevant code, and rebuilding context they've already seen in previous sessions.
On a 600-file codebase, a typical exploration task involves 10+ file reads before the agent even knows what's relevant.
Before: Agent → Grep → 50 matches → Read 10 files → Understand → Work
After: Agent → codemesh_explore → 3 relevant files → codemesh_trace → full path → Work
Codemesh is an MCP server that gives agents a persistent, queryable knowledge graph. The graph gets smarter over time: agents write back what they learn, so the next session starts informed.
Benchmarks
Benchmarked on 6 real-world codebases (Alamofire, Excalidraw, VS Code, Swift Compiler, pydantic-validators, pydantic-basemodel) with Claude Sonnet 4.6, compared against the Grep+Read baseline, with graph-based approaches included for context.
Full methodology, per-repo breakdowns, and pairwise comparisons: docs/benchmark-results.md | Early pydantic evals
Cost
| Mode | Alamofire | Excalidraw | VS Code | Swift Compiler¹ | pydantic-validators | pydantic-basemodel | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | $0.54 | $0.89 | $0.21 | $0.83 | $1.32 | $0.78 | $0.76 |
| Codemesh MCP | $0.25 | $0.21 | $0.16 | $0.23 | $0.33 | $0.13 | $0.22 |
| Codemesh CLI | $0.67 | $0.51 | $0.16 | $0.83 | $1.00 | $0.18 | $0.56 |
| Codegraph | $0.37 | $0.56 | $0.57 | $0.74 | $0.29 | $0.19 | $0.45 |
Time
| Mode | Alamofire | Excalidraw | VS Code | Swift¹ | pydantic-v | pydantic-b | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | 180s | 191s | 87s | 199s | 352s | 232s | 207s |
| Codemesh MCP | 78s | 45s | 35s | 87s | 72s | 32s | 58s |
| Codemesh CLI | 226s | 177s | 62s | 227s | 235s | 51s | 163s |
| Codegraph | 134s | 180s | 192s | 199s | 75s | 60s | 140s |
Tool calls (agent turns)
| Mode | Alamofire | Excalidraw | VS Code | Swift¹ | pydantic-v | pydantic-b | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | 31 | 48 | 12 | 29 | 84 | 65 | 45 |
| Codemesh MCP | 9 | 5 | 3 | 14 | 14 | 3 | 8 |
| Codemesh CLI | 30 | 32 | 12 | 56 | 64 | 9 | 34 |
| Codegraph | 31 | 35 | 44 | 44 | 20 | 12 | 31 |
Quality (1–10, LLM-as-judge)
| Mode | Alamofire² | Excalidraw | VS Code | Swift Compiler | pydantic-validators | pydantic-basemodel | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | n/a | 9 | 8 | 7 | 2 | 9 | 7.0 |
| Codemesh MCP | 9 | 9 | 7 | 8 | 7 | 7.8 | 7.9 |
| Codemesh CLI | 9 | 7 | 7 | 9 | 1 | 8.4 | 6.9 |
| Codegraph | 8 | 9 | 8.7 | 8 | 8 | 9 | 8.4 |
Cost savings: Codemesh MCP vs Baseline
| Repo | Baseline | Codemesh MCP | Cost saved | Time saved |
|---|---|---|---|---|
| Alamofire | $0.54 | $0.25 | −54% | −57% (180s → 78s) |
| Excalidraw | $0.89 | $0.21 | −76% | −76% (191s → 45s) |
| VS Code | $0.21 | $0.16 | −24% | −60% (87s → 35s) |
| Swift Compiler¹ | $0.83 | $0.23 | −72% | −56% (199s → 87s) |
| pydantic-validators | $1.32 | $0.33 | −75% | −79% (352s → 72s) |
| pydantic-basemodel | $0.78 | $0.13 | −83% | −86% (232s → 32s) |
| Average | $0.76 | $0.22 | −71% | −72% |
> [!NOTE]
> Codemesh MCP achieves the lowest cost and fastest time of any mode tested — 71% cheaper and 72% faster than baseline on average across 6 repos, using 82% fewer tool calls (8 vs 45). Quality is comparable to baseline (7.9 vs 7.0); Codegraph edges Codemesh on quality (8.4) but at roughly double the cost ($0.45 vs $0.22). Every repo shows cost and time savings — including the comprehension-heavy queries (Excalidraw, pydantic-basemodel) that regressed in prior builds of codemesh.
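The headline savings figure falls straight out of the per-repo cost rows; a quick TypeScript sanity check:

```typescript
// Re-derive the headline averages from the per-repo cost rows above
// (USD; order: Alamofire, Excalidraw, VS Code, Swift, pyd-validators, pyd-basemodel).
const baseline = [0.54, 0.89, 0.21, 0.83, 1.32, 0.78];
const codemeshMcp = [0.25, 0.21, 0.16, 0.23, 0.33, 0.13];

const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

const baselineAvg = avg(baseline); // ≈ 0.76
const mcpAvg = avg(codemeshMcp);   // ≈ 0.22
const saved = Math.round((1 - mcpAvg / baselineAvg) * 100);
console.log(`${saved}% cheaper on average`); // → "71% cheaper on average"
```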
Quick Start
1. Install
npm install -g @pyalwin/codemesh
Or run directly without installing:
npx -y @pyalwin/codemesh --help
Build from source
git clone https://github.com/pyalwin/codemesh.git
cd codemesh
npm install && npm run build
npm link
Verify the install:
`codemesh --version` should print the package version.
2. Index your project
cd /your/project
codemesh index --with-embeddings
Indexed 656 files
Symbols found: 16733
Edges created: 33266
Duration: 10009ms
PageRank: 13843 nodes scored
Embeddings: 13187 symbols embedded
3. Choose your mode
Codemesh offers two ways to integrate with AI agents:
Option A: MCP Server (structured tool calls)
Add to your Claude Code MCP config (~/.claude/mcp-servers.json or project .mcp.json):
{
"mcpServers": {
"codemesh": {
"command": "npx",
"args": ["-y", "@pyalwin/codemesh"],
"env": {
"CODEMESH_PROJECT_ROOT": "/path/to/your/project"
}
}
}
}
The agent gets native MCP tools:
- `codemesh_answer` — one-call question answering (PRIMARY)
- `codemesh_explore` — search, context (multi-target), impact
- `codemesh_trace` — follow call chains
- `codemesh_enrich` / `codemesh_workflow` — write back
- `codemesh_status` — health check
Best for: Opus, structured workflows, enrichment/write-back
Option B: CLI Mode (via Bash — zero MCP overhead)
No MCP config needed. The agent calls codemesh directly via Bash:
export CODEMESH_PROJECT_ROOT=/path/to/your/project
# Primary — one-call question answering:
codemesh explore answer "How does request handling work?"
# Follow-up commands:
codemesh explore search "request flow"
codemesh explore context Source/Core/Session.swift Source/Core/Request.swift
codemesh explore trace Session.request --depth 5
codemesh explore semantic "network request handling" # requires --with-embeddings
All commands return JSON to stdout. No MCP server process, no protocol overhead.
Best for: Sonnet/Haiku, speed-sensitive workflows, simpler setup
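Because the CLI prints JSON to stdout, it can also be driven from plain scripts. A minimal TypeScript sketch; note that the result shape shown here (`path`, `score`) is an assumption for illustration, not codemesh's documented schema — inspect real output on your own project first:

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical result shape; the actual JSON schema may differ.
interface SearchHit { path: string; score: number }

// Parsing kept separate so it can be exercised without the CLI installed.
function parseHits(stdout: string): SearchHit[] {
  return JSON.parse(stdout) as SearchHit[];
}

// Run `codemesh explore search <query>` against a project and parse stdout.
function search(query: string, projectRoot: string): SearchHit[] {
  const stdout = execFileSync("codemesh", ["explore", "search", query], {
    env: { ...process.env, CODEMESH_PROJECT_ROOT: projectRoot },
    encoding: "utf8",
  });
  return parseHits(stdout);
}

// Usage: search("request flow", "/path/to/your/project")
```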
Which mode should I use?
| | MCP Server | CLI Mode |
|---|---|---|
| Setup | MCP config file | Just export CODEMESH_PROJECT_ROOT |
| Overhead | MCP protocol per call | Zero — direct subprocess |
| Enrichment | Native codemesh_enrich tool | Via Bash("codemesh enrich ...") |
| Best model | Opus (follows MCP well) | Sonnet (55% cheaper, 61% faster than baseline) |
| Recommended | Complex codebases | Default choice |
4. Use it
The agent now has 6 new tools. Query the graph before reading code:
You: "Find how pydantic handles validation"
Agent calls: codemesh_answer({ question: "How does pydantic handle validation?" })
gets: 9 relevant files ranked by PageRank, call chains,
git hotspots, co-change relationships, 5 suggested reads
Agent calls: Read("pydantic/functional_validators.py", lines 1-50)
reads: only the specific lines suggested by the answer tool
Agent calls: codemesh_enrich({ path: "pydantic/functional_validators.py",
summary: "Primary V2 validator API..." })
saves: summary for next session
Client Integrations
Codemesh speaks the Model Context Protocol, so any MCP-compatible client can use it. Paste one of the snippets below, restart the client, and the six codemesh_* tools show up in the agent's toolbox.
Claude Code (CLI)
Add to ~/.claude/mcp-servers.json (user-wide) or .mcp.json (project-local):
{
"mcpServers": {
"codemesh": {
"command": "npx",
"args": ["-y", "@pyalwin/codemesh"],
"env": {
"CODEMESH_PROJECT_ROOT": "/absolute/path/to/your/project"
}
}
}
}
Claude Desktop (macOS / Windows app)
Edit claude_desktop_config.json:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"codemesh": {
"command": "npx",
"args": ["-y", "@pyalwin/codemesh"],
"env": {
"CODEMESH_PROJECT_ROOT": "/absolute/path/to/your/project"
}
}
}
}
Restart Claude Desktop. Codemesh's tools will appear in the tool picker (hammer icon).
Cursor — stop the agent from wandering your codebase
Cursor reads .cursor/mcp.json per project (or ~/.cursor/mcp.json for all projects):
{
"mcpServers": {
"codemesh": {
"command": "npx",
"args": ["-y", "@pyalwin/codemesh"],
"env": {
"CODEMESH_PROJECT_ROOT": "${workspaceFolder}"
}
}
}
}
Open Settings → MCP, confirm codemesh is green, then mention it in a prompt (@codemesh how does auth work?) to nudge the agent toward graph queries instead of recursive Grep.
Windsurf / VS Code (Continue)
Add to ~/.continue/config.json under experimental.modelContextProtocolServers:
{
"experimental": {
"modelContextProtocolServers": [
{
"transport": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@pyalwin/codemesh"],
"env": {
"CODEMESH_PROJECT_ROOT": "/absolute/path/to/your/project"
}
}
}
]
}
}
Agent Write-Back: the graph that gets smarter
Every other code-intelligence tool indexes your repo once and hands the agent a read-only view. Codemesh lets the agent teach the graph as it works — summaries, workflows, and cross-concept links persist across sessions and survive re-indexing.
// Session 1 — agent reads unfamiliar code, then writes back what it learned.
codemesh_enrich({
path: "pydantic/functional_validators.py",
summary: "Primary V2 validator API. `@field_validator` wraps "
+ "`_decorators.FieldValidatorDecoratorInfo`; `mode='before'|'after'` "
+ "toggles pre/post-coercion execution. Extends BaseValidator.",
concepts: ["validation", "decorators", "v2-api"]
})
// Session 1 — agent traces a multi-file flow, records the path.
codemesh_workflow({
name: "pydantic field validation",
description: "Request → BaseModel.__init__ → SchemaValidator → field_validator",
files: [
"pydantic/main.py",
"pydantic/_internal/_model_construction.py",
"pydantic/functional_validators.py"
]
})
// Session 2 (days later) — same question, different agent instance.
codemesh_answer({ question: "How does pydantic validate fields?" })
// → returns the enriched summary AND the 3-file workflow from Session 1
// before the agent reads a single line. Zero rediscovery cost.
The graph now knows things no static analyzer could infer: why a file matters, which files move together, what a maintainer called a concept. Re-indexing rebuilds the structural layer (files, symbols, imports, calls) but preserves every enrichment — entries only go stale when their referenced files change.
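That staleness rule can be pictured as a content-hash comparison. A toy TypeScript sketch: the `fileHash` field and SHA-256 scheme are assumptions for illustration, not codemesh's actual invalidation mechanism:

```typescript
import { createHash } from "node:crypto";

// Hypothetical enrichment record that remembers the content hash of
// the file it was written against.
interface Enrichment { path: string; summary: string; fileHash: string }

const sha256 = (content: string) =>
  createHash("sha256").update(content).digest("hex");

// An enrichment goes stale once the file it describes no longer matches
// the content it described.
function isStale(e: Enrichment, currentContent: string): boolean {
  return sha256(currentContent) !== e.fileHash;
}

const original = "def field_validator(): ...";
const e: Enrichment = {
  path: "pydantic/functional_validators.py",
  summary: "Primary V2 validator API...",
  fileHash: sha256(original),
};

console.log(isStale(e, original));                      // → false (file unchanged)
console.log(isStale(e, "def field_validator(v): ...")); // → true (summary invalidated)
```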
See codemesh_enrich and codemesh_workflow under MCP Tools.
How It Works
┌───────────────────────────────────────┐
│            Knowledge Graph            │
│                                       │
│  ┌────────────┐  ┌─────────────────┐  │
│  │ Structural │  │    Semantic     │  │
│  │   (auto)   │  │    (agents)     │  │
│  │            │  │                 │  │
│  │ files      │  │ summaries       │  │
│  │ symbols    │  │ workflows       │  │
│  │ imports    │  │ concepts        │  │
│  │ calls      │  │ enrichments     │  │
│  └────────────┘  └─────────────────┘  │
│                                       │
│  ┌────────────┐  ┌─────────────────┐  │
│  │    Git     │  │     Search      │  │
│  │   Intel    │  │                 │  │
│  │            │  │ FTS5 (exact)    │  │
│  │ hotspots   │  │ Trigram (fuzzy) │  │
│  │ co-change  │  │ LanceDB (sem)   │  │
│  │ churn      │  │ PageRank        │  │
│  └────────────┘  └─────────────────┘  │
│                                       │
│           SQLite + LanceDB            │
└──────────────────┬────────────────────┘
                   │
┌──────────────────┴────────────────────┐
│       MCP Server / CLI (6 tools)      │
│                                       │
│       answer · explore · trace        │
│      enrich · workflow · status       │
└───────────────────────────────────────┘
Structural layer (automatic) — Tree-sitter parses your code into files, symbols (functions, classes, methods), and relationships (imports, calls, extends). Rebuilt on each index.
Semantic layer (agent-built) — As agents work with your code, they write back summaries and workflow paths. These survive re-indexing and accumulate across sessions. Invalidated when referenced files change.
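To see why PageRank-style scoring surfaces the right entry points, here is a toy power-iteration over a four-symbol call graph. This is an illustration only, not codemesh's implementation:

```typescript
// Toy power-iteration PageRank over a miniature call graph;
// an edge [a, b] means "a calls b".
const edges: [string, string][] = [
  ["handler", "validate"], ["handler", "save"],
  ["cli", "validate"], ["save", "validate"],
];
const nodes = [...new Set(edges.flat())];
const d = 0.85; // damping factor
let rank = new Map(nodes.map((n) => [n, 1 / nodes.length] as [string, number]));

for (let iter = 0; iter < 50; iter++) {
  const next = new Map(nodes.map((n) => [n, (1 - d) / nodes.length] as [string, number]));
  for (const n of nodes) {
    const out = edges.filter(([from]) => from === n);
    if (out.length === 0) {
      // Dangling node: spread its rank uniformly.
      for (const m of nodes) next.set(m, next.get(m)! + (d * rank.get(n)!) / nodes.length);
    } else {
      for (const [, to] of out) next.set(to, next.get(to)! + (d * rank.get(n)!) / out.length);
    }
  }
  rank = next;
}

// "validate" is called from three places, so it ranks highest.
const top = [...rank.entries()].sort((a, b) => b[1] - a[1])[0][0];
console.log(top); // → "validate"
```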
MCP Tools
| Tool | Purpose | Example |
|---|---|---|
codemesh_answer | One-call context assembly — returns all relevant files, call chains, hotspots, suggested reads | codemesh_answer({ question: "How does auth work?" }) |
codemesh_explore | Search, context (multi-target), impact analysis | codemesh_explore({ action: "search", query: "auth" }) |
codemesh_trace | Follow call chains with source code | codemesh_trace({ symbol: "login", depth: 5 }) |
codemesh_enrich | Write back what you learned for future sessions | codemesh_enrich({ path: "src/auth.py", summary: "..." }) |
codemesh_workflow | Record multi-file workflow paths | codemesh_workflow({ name: "login flow", files: [...] }) |
codemesh_status | Graph health check | codemesh_status() |
CLI
codemesh index # structural + git intel + pagerank
codemesh index --with-embeddings # + semantic vectors (~80MB model, zero API cost)
codemesh status # graph statistics
codemesh rebuild # purge and re-index
codemesh explore answer "question" # one-call context assembly (PRIMARY)
codemesh explore search "query" # FTS5 + trigram + semantic search
codemesh explore context file1 file2 # multi-target context
codemesh explore trace symbol --depth 5 # follow call chains
codemesh explore semantic "query" # vector similarity (needs embeddings)
codemesh explore impact file # reverse dependencies
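Conceptually, `explore impact` answers "what is affected if this file changes?" by walking the graph's imports edges in reverse. A toy TypeScript sketch of transitive reverse-dependency lookup (illustrative file names; not the actual query):

```typescript
// Forward edges: file → files it imports.
const imports: Record<string, string[]> = {
  "app.ts": ["auth.ts", "db.ts"],
  "auth.ts": ["db.ts"],
  "cli.ts": ["app.ts"],
};

// Everything that directly or transitively imports `target`.
function impact(target: string): string[] {
  const hit = new Set<string>();
  let changed = true;
  while (changed) { // fixed point: sweep until no new importer appears
    changed = false;
    for (const [file, deps] of Object.entries(imports)) {
      if (!hit.has(file) && deps.some((dep) => dep === target || hit.has(dep))) {
        hit.add(file);
        changed = true;
      }
    }
  }
  return [...hit].sort();
}

console.log(impact("db.ts")); // → ["app.ts", "auth.ts", "cli.ts"]
```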
Optional: Hooks & Skills
Skill — teaches agents the graph-first workflow
Copy skills/codemesh.md to ~/.claude/skills/ or your project's .claude/skills/.
# Install the skill so Claude Code loads the workflow automatically
cp /path/to/codemesh/skills/codemesh.md /your/project/.claude/skills/
The skill instructs agents to query the graph before using Grep/Read, and to write back via codemesh_enrich after reading code.
Hooks — automatic pre-read context injection
Add to .claude/settings.json:
{
"hooks": {
"pre_tool_use": [{
"matcher": "Read",
"command": "/path/to/codemesh/hooks/pre-read.sh"
}],
"post_tool_use": [{
"matcher": "Read",
"command": "/path/to/codemesh/hooks/post-read.sh"
}]
}
}
- Pre-read — Injects cached summaries before file reads
- Post-read — Nudges the agent to enrich after reading unfamiliar files
Supported Languages
| TypeScript | JavaScript | Python | Go | Rust | Java | C# |
| Ruby | PHP | C | C++ | Swift | Kotlin | Dart |
Any language with a tree-sitter grammar can be added.
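The extension-to-grammar mapping in `src/indexer/languages.ts` can be pictured as a simple lookup table. The entries below are illustrative, not the package's actual registry contents:

```typescript
// Hypothetical extension → tree-sitter grammar registry.
const grammars = new Map<string, string>([
  [".ts", "tree-sitter-typescript"],
  [".py", "tree-sitter-python"],
  [".swift", "tree-sitter-swift"],
  [".rs", "tree-sitter-rust"],
]);

function grammarFor(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined; // no extension → unsupported
  return grammars.get(path.slice(dot));
}

console.log(grammarFor("Source/Core/Session.swift")); // → "tree-sitter-swift"
```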
Graph Data Model
Nodes
| Type | Source | Key Fields |
|---|---|---|
file | Static (tree-sitter) | path, hash, last_indexed_at |
symbol | Static (tree-sitter) | name, kind, file_path, line_start, line_end, signature |
concept | Agent-written | summary, last_updated_by, stale |
workflow | Agent-written | description, file_sequence, last_walked_at |
Edges
| Type | Direction | Source |
|---|---|---|
contains | file → symbol | Static |
imports | file → file | Static |
calls | symbol → symbol | Static |
extends | symbol → symbol | Static |
describes | concept → file/symbol | Agent |
related_to | concept → concept | Agent |
traverses | workflow → file | Agent |
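The node and edge tables above translate directly into types. A sketch using the field names from the tables; the exact shapes in `src/graph/types.ts` may differ:

```typescript
// Edge kinds from the Edges table.
type EdgeType =
  | "contains" | "imports" | "calls" | "extends" // static
  | "describes" | "related_to" | "traverses";    // agent-written

// A symbol node, per the Nodes table.
interface SymbolNode {
  name: string;
  kind: string; // e.g. "function", "class", "method"
  file_path: string;
  line_start: number;
  line_end: number;
  signature?: string;
}

interface Edge { type: EdgeType; from: string; to: string }

// Example: a static `calls` edge between two symbols.
const edge: Edge = { type: "calls", from: "Session.request", to: "Request.init" };
console.log(edge.type); // → "calls"
```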
Architecture
codemesh/
├── src/
│ ├── index.ts # MCP server entry (stdio transport)
│ ├── server.ts # Tool registration (zod schemas)
│ ├── graph/
│ │ ├── types.ts # Node/edge type definitions
│ │ ├── storage.ts # StorageBackend interface (swappable)
│ │ └── sqlite.ts # SQLite + FTS5 implementation
│ ├── indexer/
│ │ ├── indexer.ts # File walking, hashing, incremental indexing
│ │ ├── parser.ts # Tree-sitter AST extraction
│ │ └── languages.ts # Language registry (ext → grammar)
│ ├── tools/ # 6 MCP tool handlers
│ └── cli.ts # CLI entry point
├── skills/codemesh.md # Agent education skill
├── hooks/ # Pre/post read hooks
└── eval/ # Eval framework (5 tasks, 3 models)
Storage is backend-agnostic. The StorageBackend interface abstracts all persistence. v1 uses SQLite with FTS5 for zero-dependency local operation. The interface supports swapping to Memgraph, Neo4j, or other graph databases.
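A minimal sketch of what a swappable backend looks like; the method names here are hypothetical, not the actual `StorageBackend` contract in `src/graph/storage.ts`:

```typescript
// Hypothetical minimal backend contract.
interface StorageBackend {
  addNode(id: string, data: Record<string, unknown>): void;
  getNode(id: string): Record<string, unknown> | undefined;
}

// Trivial in-memory implementation.
class MemoryBackend implements StorageBackend {
  private nodes = new Map<string, Record<string, unknown>>();
  addNode(id: string, data: Record<string, unknown>): void {
    this.nodes.set(id, data);
  }
  getNode(id: string): Record<string, unknown> | undefined {
    return this.nodes.get(id);
  }
}

// Any conforming class (SQLite today, a Memgraph/Neo4j adapter tomorrow)
// can be dropped in wherever a StorageBackend is expected.
const store: StorageBackend = new MemoryBackend();
store.addNode("file:src/auth.py", { path: "src/auth.py" });
console.log(store.getNode("file:src/auth.py")?.["path"]); // → "src/auth.py"
```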
Eval Framework
Reproducible evaluation harness with LLM-as-judge scoring:
# Setup
npm install -g @pyalwin/codemesh
git clone --depth 1 https://github.com/Alamofire/Alamofire.git /tmp/alamofire
# ... clone other repos ...
# Index
CODEMESH_PROJECT_ROOT=/tmp/alamofire codemesh index
# Run benchmarks
python3 eval/head_to_head.py --model sonnet alamofire excalidraw vscode swift-compiler
See docs/benchmark-results.md for full methodology and results. Early pydantic evals are archived in docs/experiments/.
vs. Existing Tools
| Feature | CodeGraph | Graphify | Axon | Codemesh |
|---|---|---|---|---|
| Structural indexing | Yes | Yes | Yes | Yes |
| FTS search | Yes | — | Yes | Yes |
| Agent write-back | — | — | — | Yes |
| Workflow memory | — | — | — | Yes |
| Hook interception | — | — | — | Yes |
| Backend-swappable | — | — | — | Yes |
| Eval framework | — | — | — | Yes |
| Published benchmarks | — | — | — | Yes |
Development
bun install # Install dependencies
bun run build # Compile TypeScript
bun run test # Run 102 tests
bun run dev # Watch mode
bun run lint # Type check
Contributing
Contributions welcome. Areas for improvement:
- More languages — Add tree-sitter grammars and language-specific extractors
- AST-diff invalidation — Function-level instead of file-level staleness detection
- Graph backends — Memgraph/Neo4j adapters for StorageBackend
- Semantic search — Embedding columns alongside FTS5
- Agent adoption — Better patterns for agents to prefer graph tools naturally
License
MIT
Footnotes
1. Swift Compiler's codemesh index failed to complete (indexer regression on 30k+ file codebases — see known issues). The codemesh numbers above reflect agent behavior with an empty retrieval graph, falling back to Read + LSP — still ahead of baseline, but unrepresentative of codemesh's capability on a properly indexed Swift repo.
2. Baseline for Alamofire hit a judge error (the recorded score of 0 is not meaningful) and is excluded from the Baseline average.