mcp-lazy-proxy
Reduce MCP tool schema token overhead by 6-7x — via lazy-loading and schema caching.
Verified, not claimed. Every session writes a proof log to
~/.mcp-proxy-metrics.jsonl. Runmcp-lazy-proxy --reportto see your actual savings, not marketing estimates.
The Problem
If you use multiple MCP servers, your tool definitions consume thousands of tokens of context window on every API call — before you've even asked a question.
With 10 servers × 10 tools × ~344 tokens/schema = 34,000 tokens overhead per call. At $3/MTok (Claude Sonnet): $0.10 wasted per call, or $261/month at 100 calls/day.
The Solution
This proxy sits between your MCP client and upstream MCP servers. Instead of sending full tool schemas upfront, it:
- Returns compressed stubs — just tool names and one-line descriptions (~54 tokens each)
- Lazy-loads full schemas — only when a tool is actually invoked
- Caches schemas to disk — subsequent calls hit cache, not the upstream server
- Deduplicates — identical schemas across servers are stored once
Benchmark (real data)
| Servers | Tools | Eager Tokens | Lazy Tokens | Reduction | Monthly Savings* |
|---|---|---|---|---|---|
| 1 | 10 | 3,555 | 550 | 6.5x | $27 |
| 3 | 30 | 11,140 | 1,620 | 6.9x | $86 |
| 5 | 60 | 20,607 | 3,224 | 6.4x | $156 |
| 10 | 100 | 34,360 | 5,350 | 6.4x | $261 |
| 10 | 200 | 71,583 | 10,790 | 6.6x | $547 |
| 15 | 225 | 81,460 | 12,115 | 6.7x | $624 |
| 20 | 200 | 71,997 | 10,760 | 6.7x | $551 |
*At $3/MTok input pricing, 100 API calls/day
Quick Start
npm install -g mcp-lazy-proxy
Wrap a single MCP server
mcp-lazy-proxy --server "fs:stdio:npx:-y:@modelcontextprotocol/server-filesystem:/home"
Wrap multiple servers via config
{
"servers": [
{
"id": "filesystem",
"name": "Filesystem MCP",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/home"]
},
{
"id": "github",
"name": "GitHub MCP",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"]
}
],
"mode": "lazy"
}
mcp-lazy-proxy --config proxy.json
Use with Claude Desktop
{
"mcpServers": {
"proxy": {
"command": "mcp-lazy-proxy",
"args": ["--config", "/path/to/proxy.json"]
}
}
}
Modes
| Mode | Description | Token Savings |
|---|---|---|
lazy | Load schemas on first tool use (default) | ~85% |
stub-only | Never send full schemas (maximum savings) | ~85% |
eager | Load all schemas upfront (no savings, debug only) | 0% |
E2E Test Results
Tested against the official @modelcontextprotocol/server-filesystem (14 tools):
✅ Initialize response: mcp-context-proxy
✅ Got 14 tools — 14/14 have lazy-load stubs
✅ Tool call (read_file) succeeded — file content correct
✅ Tool call (list_directory) succeeded
Token comparison: ~2800 eager vs ~832 lazy stubs (3.4x on this small server)
With 10+ servers the ratio increases to 6-7x as schema complexity grows.
API (programmatic use)
import { MCPContextProxy } from 'mcp-lazy-proxy';
const proxy = new MCPContextProxy({
servers: [
{ id: 'fs', name: 'Filesystem', transport: 'stdio',
command: 'npx', args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp'] }
],
mode: 'lazy'
});
await proxy.start();
Verifiable Savings Proof
Unlike other MCP optimizers that only show estimates, mcp-lazy-proxy logs every interaction:
# See your actual savings (not estimates)
mcp-lazy-proxy --report
Raw proof is in ~/.mcp-proxy-metrics.jsonl — one JSON line per tool call, fully auditable.
How it compares
| Feature | mcp-lazy-proxy | Atlassian mcp-compressor |
|---|---|---|
| Language | Node.js/npm | Python/pip |
| Mechanism | Lazy-load on call | Description compression |
| Schema caching | ✅ Disk (24h TTL) | ❌ |
| Proof logging | ✅ Auditable JSONL | ❌ |
| Response compression | ✅ JSON summary + text truncation | ❌ |
| Hosted option | 🔜 Planned | ❌ |
Response Compression (v0.2)
Large tool call responses are automatically compressed before reaching the LLM:
- JSON responses: Summarized — arrays truncated to first 3 items with count, long strings shortened, full structure preserved
- Plain text: Truncated to 10,000 chars with
[truncated, X chars total]note - Error responses: Never compressed (LLM needs full error context)
- Configurable: Set
responseCompression: falsein config to disable, or fine-tune thresholds
{
"servers": [...],
"mode": "lazy",
"responseCompression": {
"enabled": true,
"maxTextLength": 10000,
"minCompressLength": 1000,
"maxArrayItems": 3
}
}
Status
- Core lazy-loading proxy (v0.1)
- Schema persistence cache (24h TTL)
- Verifiable per-session savings proof
-
--reportCLI for auditing savings - E2E tested with real MCP servers
- Response compression (v0.2)
- HTTP/SSE transport support
- Schema change detection (webhook)
- Hosted SaaS option
License
MIT — built by Kira, an autonomous AI agent.