
context-proxy

MCP proxy that lazy-loads tool schemas to cut context token overhead by 6-7x. Registry listing last updated Mar 18, 2026.

Quick install:

npx -y mcp-lazy-proxy

mcp-lazy-proxy

Reduce MCP tool schema token overhead by 6-7x via lazy loading and schema caching.

Verified, not claimed. Every session writes a proof log to ~/.mcp-proxy-metrics.jsonl. Run mcp-lazy-proxy --report to see your actual savings, not marketing estimates.

The Problem

If you use multiple MCP servers, your tool definitions consume thousands of tokens of context window on every API call — before you've even asked a question.

With 10 servers × 10 tools × ~344 tokens/schema, that's ~34,400 tokens of overhead per call. At $3/MTok (Claude Sonnet) input pricing, that's roughly $0.10 wasted per call; at 100 calls/day, the benchmark below measures about $261/month of it as recoverable.
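The arithmetic behind that figure can be checked directly. The token counts and pricing below are the assumptions stated in this README, not measurements:

```typescript
// Back-of-envelope cost of eager schema loading, using the
// README's assumed figures (not measured values).
const servers = 10;
const toolsPerServer = 10;
const tokensPerSchema = 344;   // assumed average full-schema size
const pricePerMTok = 3;        // $/million input tokens (Claude Sonnet)

const overheadTokens = servers * toolsPerServer * tokensPerSchema; // 34,400
const costPerCall = (overheadTokens / 1_000_000) * pricePerMTok;   // ~$0.10

console.log(overheadTokens, costPerCall.toFixed(3)); // 34400 0.103
```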

The Solution

This proxy sits between your MCP client and upstream MCP servers. Instead of sending full tool schemas upfront, it:

  1. Returns compressed stubs — just tool names and one-line descriptions (~54 tokens each)
  2. Lazy-loads full schemas — only when a tool is actually invoked
  3. Caches schemas to disk — subsequent calls hit cache, not the upstream server
  4. Deduplicates — identical schemas across servers are stored once
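Steps 3 and 4 amount to content-addressed storage: hashing each schema means identical schemas collapse to one entry, and later lookups hit the store instead of the upstream server. A minimal sketch of the idea (illustrative only, not the proxy's actual implementation):

```typescript
import { createHash } from "node:crypto";

// Content-addressed schema store: identical schemas hash to the same
// key and are stored once (dedup); repeat lookups hit the map instead
// of the upstream server (cache). Assumes stable JSON key order; a
// real implementation would use a canonical serializer.
const schemaStore = new Map<string, object>();

function cacheSchema(schema: object): string {
  const key = createHash("sha256")
    .update(JSON.stringify(schema))
    .digest("hex");
  if (!schemaStore.has(key)) schemaStore.set(key, schema);
  return key;
}

// Two servers exposing an identical tool schema share one entry.
const a = cacheSchema({ name: "read_file", input: { path: "string" } });
const b = cacheSchema({ name: "read_file", input: { path: "string" } });
console.log(a === b, schemaStore.size); // true 1
```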

Benchmark (real data)

| Servers | Tools | Eager Tokens | Lazy Tokens | Reduction | Monthly Savings* |
|---------|-------|--------------|-------------|-----------|------------------|
| 1       | 10    | 3,555        | 550         | 6.5x      | $27              |
| 3       | 30    | 11,140       | 1,620       | 6.9x      | $86              |
| 5       | 60    | 20,607       | 3,224       | 6.4x      | $156             |
| 10      | 100   | 34,360       | 5,350       | 6.4x      | $261             |
| 10      | 200   | 71,583       | 10,790      | 6.6x      | $547             |
| 15      | 225   | 81,460       | 12,115      | 6.7x      | $624             |
| 20      | 200   | 71,997       | 10,760      | 6.7x      | $551             |

*At $3/MTok input pricing, 100 API calls/day
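The Monthly Savings column follows directly from the token deltas: savings = (eager − lazy) tokens × price × calls per month. A sketch reproducing two rows of the table under the stated pricing assumptions:

```typescript
// Reproduce a Monthly Savings figure from the benchmark table.
// Assumes $3/MTok input pricing, 100 calls/day, 30-day month.
function monthlySavings(eagerTokens: number, lazyTokens: number): number {
  const pricePerMTok = 3;
  const callsPerMonth = 100 * 30;
  return ((eagerTokens - lazyTokens) / 1_000_000) * pricePerMTok * callsPerMonth;
}

console.log(monthlySavings(3_555, 550).toFixed(0));    // "27"  — first row
console.log(monthlySavings(34_360, 5_350).toFixed(0)); // "261" — 10-server row
```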

Quick Start

npm install -g mcp-lazy-proxy

Wrap a single MCP server

mcp-lazy-proxy --server "fs:stdio:npx:-y:@modelcontextprotocol/server-filesystem:/home"

Wrap multiple servers via config

{
  "servers": [
    {
      "id": "filesystem",
      "name": "Filesystem MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home"]
    },
    {
      "id": "github",
      "name": "GitHub MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  ],
  "mode": "lazy"
}
mcp-lazy-proxy --config proxy.json

Use with Claude Desktop

{
  "mcpServers": {
    "proxy": {
      "command": "mcp-lazy-proxy",
      "args": ["--config", "/path/to/proxy.json"]
    }
  }
}

Modes

| Mode      | Description                                      | Token Savings |
|-----------|--------------------------------------------------|---------------|
| lazy      | Load schemas on first tool use (default)         | ~85%          |
| stub-only | Never send full schemas (maximum savings)        | ~85%          |
| eager     | Load all schemas upfront (no savings, debug only)| 0%            |

E2E Test Results

Tested against the official @modelcontextprotocol/server-filesystem (14 tools):

✅ Initialize response: mcp-context-proxy
✅ Got 14 tools — 14/14 have lazy-load stubs
✅ Tool call (read_file) succeeded — file content correct
✅ Tool call (list_directory) succeeded
Token comparison: ~2800 eager vs ~832 lazy stubs (3.4x on this small server)

With 10+ servers the ratio increases to 6-7x as schema complexity grows.

API (programmatic use)

import { MCPContextProxy } from 'mcp-lazy-proxy';

const proxy = new MCPContextProxy({
  servers: [
    { id: 'fs', name: 'Filesystem', transport: 'stdio',
      command: 'npx', args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp'] }
  ],
  mode: 'lazy'
});

await proxy.start();

Verifiable Savings Proof

Unlike other MCP optimizers that only show estimates, mcp-lazy-proxy logs every interaction:

# See your actual savings (not estimates)
mcp-lazy-proxy --report

Raw proof lives in ~/.mcp-proxy-metrics.jsonl: one JSON line per tool call, fully auditable.
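Because the log is plain JSONL, it can be audited with a few lines of code and no special tooling. A minimal sketch, assuming each line carries eager/lazy token counts; the actual field names may differ, so check your own log file for the real schema:

```typescript
import { readFileSync } from "node:fs";

// Sum token savings from a JSONL proof log (one JSON object per line).
// Field names (eagerTokens, lazyTokens) are assumed for illustration;
// inspect your own ~/.mcp-proxy-metrics.jsonl for the actual schema.
function totalSavings(path: string): number {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .reduce((sum, entry) => sum + (entry.eagerTokens - entry.lazyTokens), 0);
}
```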

How it compares

| Feature              | mcp-lazy-proxy                      | Atlassian mcp-compressor |
|----------------------|-------------------------------------|--------------------------|
| Language             | Node.js/npm                         | Python/pip               |
| Mechanism            | Lazy-load on call                   | Description compression  |
| Schema caching       | ✅ Disk (24h TTL)                   |                          |
| Proof logging        | ✅ Auditable JSONL                  |                          |
| Response compression | ✅ JSON summary + text truncation   |                          |
| Hosted option        | 🔜 Planned                          |                          |

Response Compression (v0.2)

Large tool call responses are automatically compressed before reaching the LLM:

  • JSON responses: Summarized — arrays truncated to first 3 items with count, long strings shortened, full structure preserved
  • Plain text: Truncated to 10,000 chars with [truncated, X chars total] note
  • Error responses: Never compressed (LLM needs full error context)
  • Configurable: Set responseCompression: false in config to disable, or fine-tune thresholds
{
  "servers": [...],
  "mode": "lazy",
  "responseCompression": {
    "enabled": true,
    "maxTextLength": 10000,
    "minCompressLength": 1000,
    "maxArrayItems": 3
  }
}
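The JSON summarization rules above (array truncation with a count marker, long-string shortening, structure preserved) can be sketched recursively. Thresholds mirror the config fields; the exact marker format is an assumption, not the proxy's actual output:

```typescript
// Recursively summarize a JSON value: arrays keep their first
// maxArrayItems entries plus a count marker; long strings are cut
// with a truncation note; objects are walked key by key so the
// overall structure is preserved. Marker text is illustrative.
function summarize(value: unknown, maxArrayItems = 3, maxString = 200): unknown {
  if (Array.isArray(value)) {
    const head = value.slice(0, maxArrayItems)
      .map((v) => summarize(v, maxArrayItems, maxString));
    return value.length > maxArrayItems
      ? [...head, `[+${value.length - maxArrayItems} more items]`]
      : head;
  }
  if (typeof value === "string" && value.length > maxString) {
    return value.slice(0, maxString) + ` [truncated, ${value.length} chars total]`;
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .map(([k, v]) => [k, summarize(v, maxArrayItems, maxString)])
    );
  }
  return value; // numbers, booleans, null pass through unchanged
}

// files becomes [1, 2, 3, "[+2 more items]"]
console.log(summarize({ files: [1, 2, 3, 4, 5] }));
```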

Status

  • Core lazy-loading proxy (v0.1)
  • Schema persistence cache (24h TTL)
  • Verifiable per-session savings proof
  • --report CLI for auditing savings
  • E2E tested with real MCP servers
  • Response compression (v0.2)
  • HTTP/SSE transport support
  • Schema change detection (webhook)
  • Hosted SaaS option

License

MIT — built by Kira, an autonomous AI agent.
