MCP Gateway
Cut 83-89% of your Claude Code context window overhead from MCP tool schemas.
Every MCP server you register dumps its full JSON tool schema into your context window — every conversation, whether you use those tools or not. If you have 5 servers with 30+ tools each, that's thousands of tokens burned before you type a single character.
MCP Gateway replaces N tool schemas with 3-4 dispatch tools that proxy requests to your existing MCP servers. The underlying servers stay exactly the same. You just stop paying the token tax.
The Problem
Before (real numbers from a multi-account setup):
Google Workspace × 3 accounts = 142 tools × 3 = 426 tool schemas
Telegram = 92 tool schemas
Linear × 3 accounts = 42 tools × 3 = 126 tool schemas
─────────────────────────────────────────────────
Total = 644 tool schemas = ~57,000 tokens
After:
Google Workspace gateway = 3 tool schemas
Services gateway (TG + Linear) = 4 tool schemas
─────────────────────────────────────────────────
Total = 7 tool schemas = ~6,200 tokens
Savings: 89% fewer tokens, every single conversation.
How It Works
Instead of registering each MCP server directly in your Claude Code config, you register a single gateway server that proxies tool calls to the underlying servers on demand.
Claude sees 3-4 generic tools per gateway (gw, gw_discover, and gw_batch for the CLI gateway; tg, linear, and similar for the persistent gateway) instead of hundreds of specialized ones. When Claude needs a specific tool, it calls the dispatch tool with the tool name as a parameter, and the gateway forwards the call to the right backend.
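The dispatch idea can be sketched in a few lines. This is an illustration only, not the repository's code: the `BACKENDS` registry and the two stand-in tool functions are hypothetical, and a real gateway would forward to upstream MCP servers instead of calling local functions.

```python
from typing import Any, Callable

# Hypothetical stand-ins for upstream tools; a real gateway forwards to MCP servers.
def search_gmail_messages(query: str) -> list:
    return [f"message matching {query!r}"]

def get_events(time_min: str) -> list:
    return [f"event after {time_min}"]

# One registry replaces hundreds of individually registered tool schemas.
BACKENDS: dict = {
    "search_gmail_messages": search_gmail_messages,
    "get_events": get_events,
}

def gw_discover(keyword: str = "") -> list:
    """List tool names containing the keyword, so details are fetched only on demand."""
    return sorted(name for name in BACKENDS if keyword in name)

def gw(account: str, tool: str, params: dict) -> dict:
    """Dispatch one call: the tool name travels as data, not as its own schema."""
    handler: Callable[..., Any] = BACKENDS.get(tool)
    if handler is None:
        return {"ok": False, "error": f"unknown tool {tool!r}", "known": gw_discover()}
    return {"ok": True, "account": account, "result": handler(**params)}
```

Because the tool name is an ordinary string argument, adding an upstream tool changes the registry but not the schema Claude sees.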
Two patterns are included, depending on what your upstream MCP server supports:
Pattern 1: CLI Dispatch (cli_gateway.py)
For MCP servers that support a --cli mode (subprocess per call). Each tool invocation spawns a short-lived process. Good for servers like google-workspace-mcp that have built-in CLI modes.
Features:
- Multi-account routing (one gateway, N credential sets)
- Auto-injection of per-account parameters (e.g., user_google_email)
- Tool-to-service mapping for faster cold starts (only loads the needed module)
- Discovery with caching
- Batch execution (parallel tool calls in one request)
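A rough sketch of three of the features above (subprocess per call, per-account parameter injection, and parallel batch execution). The `ACCOUNTS` table and the `FAKE_UPSTREAM` one-liner are assumptions made so the example runs on its own; the actual command line of an upstream server's CLI mode is not shown here and will differ.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical account table; a real gateway would read this from config.yaml.
ACCOUNTS = {
    "work": {"email": "you@company.com"},
    "personal": {"email": "you@gmail.com"},
}

# Stand-in for an upstream CLI mode: echoes the tool name and arguments it received.
FAKE_UPSTREAM = (
    "import json, sys; "
    "print(json.dumps({'tool': sys.argv[1], 'args': json.loads(sys.argv[2])}))"
)

def inject_account_params(account: str, params: dict) -> dict:
    """Add the per-account parameter unless the caller already set it."""
    merged = dict(params)
    merged.setdefault("user_google_email", ACCOUNTS[account]["email"])
    return merged

def call_tool(account: str, tool: str, params: dict) -> dict:
    """Spawn one short-lived process per tool call and parse its JSON output."""
    argv = [sys.executable, "-c", FAKE_UPSTREAM, tool,
            json.dumps(inject_account_params(account, params))]
    done = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    if done.returncode != 0:
        return {"error": done.stderr.strip()}
    return json.loads(done.stdout)

def call_batch(account: str, calls: list) -> list:
    """Run several tool calls in parallel; results keep the input order."""
    workers = max(1, min(8, len(calls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call_tool, account, c["tool"], c.get("params", {}))
                   for c in calls]
        return [f.result() for f in futures]
```

Threads are enough here because each call spends its time waiting on a child process rather than running Python code.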
Pattern 2: Persistent MCP Client (persistent_gateway.py)
For MCP servers without CLI mode. Maintains persistent subprocess connections to upstream MCP servers, avoiding cold-start latency on every call. Uses the MCP SDK's ClientSession with AsyncExitStack for lifecycle management.
Features:
- Lazy connection (connects on first use, not at startup)
- Auto-reconnect on connection failure
- Multi-service routing (multiple MCP servers behind one gateway)
- Multi-account support per service
- Tool discovery with caching
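The lazy-connect, reconnect, and caching behaviour listed above might look roughly like this. To keep the sketch runnable without the MCP SDK, `connect` is an injected, hypothetical coroutine factory; in the real gateway it would presumably open a `ClientSession` inside an `AsyncExitStack`, as described above. The `LazyClient` name and the retry-once policy are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

class LazyClient:
    """Connect on first use, cache the tool list, and reconnect once on failure."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]):
        self._connect = connect              # coroutine factory that opens a session
        self._session: Optional[Any] = None  # stays None until the first call
        self._tools: Optional[list] = None   # cached discovery result

    async def _ensure(self) -> Any:
        if self._session is None:
            self._session = await self._connect()
        return self._session

    async def list_tools(self) -> list:
        if self._tools is None:              # fetched once, then served from cache
            session = await self._ensure()
            self._tools = await session.list_tools()
        return self._tools

    async def call(self, tool: str, params: dict) -> Any:
        for attempt in (1, 2):
            session = await self._ensure()
            try:
                return await session.call_tool(tool, params)
            except ConnectionError:
                self._session = None         # drop the dead session; retry reconnects
                if attempt == 2:
                    raise
```

Nothing is spawned at construction time, so a gateway fronting several services only pays startup cost for the ones a conversation actually touches.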
Quick Start
1. Install
# Clone
git clone https://github.com/block-town/mcp-gateway.git
cd mcp-gateway
# Install dependencies
pip install fastmcp pyyaml python-dotenv
# or
uv pip install fastmcp pyyaml python-dotenv
2. Configure
Copy the example config and fill in your details:
cp config.example.yaml config.yaml
# Edit config.yaml with your accounts, paths, and credentials
3. Choose Your Pattern
CLI Dispatch (for servers with --cli mode):
cp cli_gateway.py server.py
# Edit server.py — update the tool descriptions with your account names and common tools
Persistent Client (for servers without CLI mode):
cp persistent_gateway.py server.py
# Edit server.py — update service names and tool descriptions
4. Register in Claude Code
Add to your ~/.claude/settings.json (global) or project .mcp.json:
{
  "mcpServers": {
    "gateway": {
      "command": "python3",
      "args": ["/path/to/mcp-gateway/server.py"]
    }
  }
}
Then remove the original MCP server entries that the gateway now proxies.
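If you would rather script the swap than edit JSON by hand, something like the following could work on the parsed config. It is a sketch: the `register_gateway` helper and the server names in the demo (`google-work`, `telegram`, `other`) are hypothetical, and it only transforms a dict, leaving reading and writing the file to you.

```python
import json

def register_gateway(config: dict, gateway_path: str, proxied: list) -> dict:
    """Return a copy of an MCP config with the gateway added and proxied servers removed."""
    servers = {name: entry
               for name, entry in config.get("mcpServers", {}).items()
               if name not in proxied}
    servers["gateway"] = {"command": "python3", "args": [gateway_path]}
    return {**config, "mcpServers": servers}

if __name__ == "__main__":
    # Hypothetical starting point with two servers that the gateway will replace.
    before = {"mcpServers": {
        "google-work": {"command": "uv", "args": ["run", "main.py"]},
        "telegram": {"command": "uv", "args": ["run", "main.py"]},
        "other": {"command": "npx", "args": ["-y", "some-server"]},
    }}
    after = register_gateway(before, "/path/to/mcp-gateway/server.py",
                             proxied=["google-work", "telegram"])
    print(json.dumps(after, indent=2))
```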
5. Use
# CLI gateway (instead of 142 tools polluting your context):
gw_discover("gmail") → see available Gmail tools + params
gw("work", "search_gmail_messages", {"query": "is:unread"})
gw_batch("work", [
    {"tool": "search_gmail_messages", "params": {"query": "is:unread"}},
    {"tool": "get_events", "params": {"time_min": "2025-01-01"}}
])
# Persistent gateway:
tg("send_message", {"chat_id": "123", "text": "hello"})
linear("work", "linear_getIssues", {"teamId": "TEAM-1"})
Configuration
CLI Gateway (config.example.yaml — accounts mode)
# Path to the upstream MCP server
upstream_dir: "/path/to/google-workspace-mcp"
# Command to run the upstream server in CLI mode
runner: "uv"
accounts:
  personal:
    client_id: "your-oauth-client-id"
    client_secret: "your-oauth-client-secret"
    credentials_dir: "/path/to/credentials/personal"
    email: "you@gmail.com"
  work:
    client_id: "your-work-oauth-client-id"
    client_secret: "your-work-oauth-client-secret"
    credentials_dir: "/path/to/credentials/work"
    email: "you@company.com"
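A small check like the one below can catch a half-filled config before the gateway starts. It operates on the dict that `yaml.safe_load` would return for the file above. Treating all four account keys as required is an assumption, and `validate_cli_config` is a hypothetical helper, not part of the repository.

```python
# Keys taken from the example config above; which are truly required is an assumption.
REQUIRED_ACCOUNT_KEYS = ("client_id", "client_secret", "credentials_dir", "email")

def validate_cli_config(cfg: dict) -> list:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []
    for key in ("upstream_dir", "runner"):
        if not cfg.get(key):
            problems.append(f"missing top-level key: {key}")
    accounts = cfg.get("accounts") or {}
    if not accounts:
        problems.append("no accounts defined")
    for name, account in accounts.items():
        for key in REQUIRED_ACCOUNT_KEYS:
            if not (account or {}).get(key):
                problems.append(f"account {name!r} is missing {key}")
    return problems
```

Collecting every problem in one pass, instead of raising on the first, is convenient when several placeholder values still need replacing.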
Persistent Gateway (config.example.yaml — services mode)
services:
  telegram:
    command: "uv"
    args: ["--directory", "/path/to/telegram-mcp", "run", "main.py"]
    env_file: "/path/to/telegram-mcp/.env"
  linear:
    command: "npx"
    args: ["-y", "@tacticlaunch/mcp-linear"]
    accounts:
      work:
        LINEAR_API_TOKEN: "lin_api_xxxxx"
      personal:
        LINEAR_API_TOKEN: "lin_api_yyyyy"
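One plausible reading of the services config is that each account block supplies environment variables for a separate upstream process. The sketch below turns a service entry into a launch spec under that assumption. `launch_spec` and `parse_env_file` are hypothetical helpers, the env file is passed in as text to keep the example self-contained, and the precedence (account values override env-file values) is a guess.

```python
import os
from typing import Optional

def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

def launch_spec(services: dict, service: str, account: Optional[str] = None,
                env_file_text: str = "") -> dict:
    """Build command, args, and environment for one upstream server process."""
    svc = services[service]
    env = dict(os.environ)                     # inherit PATH so uv/npx resolve
    env.update(parse_env_file(env_file_text))  # values from the service's env_file
    accounts = svc.get("accounts", {})
    if accounts:
        if account not in accounts:
            raise KeyError(f"{service}: unknown account {account!r}")
        env.update(accounts[account])          # per-account values take precedence
    return {"command": svc["command"], "args": list(svc["args"]), "env": env}
```

With the config above, `launch_spec(services, "linear", "work")` and `launch_spec(services, "linear", "personal")` would differ only in `LINEAR_API_TOKEN`, which is what lets one gateway serve several accounts.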
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Claude Code                                                 │
│                                                             │
│ Context window sees: 7 tool schemas (~6K tokens)            │
│ Instead of: 644 tool schemas (~57K tokens)                  │
│                                                             │
│ gw("work", "search_gmail_messages", {"query": "..."})       │
│ tg("send_message", {"chat_id": "...", "text": "..."})       │
└──────────────────┬──────────────────────────────────────────┘
                   │
         ┌─────────▼─────────┐
         │    MCP Gateway    │
         │     (FastMCP)     │
         │                   │
         │  3-4 tools that   │
         │  dispatch to      │
         │  upstream MCP     │
         │  servers          │
         └───┬─────┬──────┬──┘
             │     │      │
    ┌────────▼┐ ┌──▼───┐ ┌▼────────┐
    │ Google  │ │ Tele-│ │ Linear  │
    │ MCP     │ │ gram │ │ MCP     │
    │ (CLI)   │ │ MCP  │ │ (×N     │
    │         │ │      │ │ accts)  │
    └─────────┘ └──────┘ └─────────┘
When to Use This
Good fit:
- You have 3+ MCP servers registered and context window pressure is real
- Multi-account setups (same server, different credentials)
- Servers with 30+ tools where you use maybe 5-10 regularly
Not worth it:
- Single MCP server with <10 tools
- You rarely hit context limits
- The MCP server is already tiny
Token Math
Each MCP tool schema is roughly 200-800 tokens of JSON (tool name, description, parameter schema with types/descriptions/required fields). To measure your own overhead:
- Count your registered tools: look at your settings.json and .mcp.json files
- Estimate ~400 tokens per tool (conservative average)
- Multiply by every conversation you start
A gateway tool schema is ~800-900 tokens (larger description with embedded cheat-sheet), but you only have 3-4 of them instead of hundreds.
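The estimate above is simple enough to script. The defaults below use the rough figures from this section (~400 tokens per direct tool, ~850 per gateway tool); real schemas vary widely, so treat the output as a ballpark and substitute measured numbers if you have them. `estimate_savings` is a hypothetical helper.

```python
def estimate_savings(direct_tools: int, gateway_tools: int,
                     tokens_per_tool: int = 400,
                     tokens_per_gateway_tool: int = 850) -> dict:
    """Compare per-conversation schema overhead with and without a gateway."""
    before = direct_tools * tokens_per_tool
    after = gateway_tools * tokens_per_gateway_tool
    saved_pct = 0.0 if before == 0 else round(100 * (before - after) / before, 1)
    return {"before": before, "after": after, "saved_pct": saved_pct}

# Five servers with 30 tools each, replaced by 4 dispatch tools.
print(estimate_savings(150, 4))   # {'before': 60000, 'after': 3400, 'saved_pct': 94.3}

# A single small server: the gateway's larger descriptions eat most of the gain.
print(estimate_savings(8, 3))     # {'before': 3200, 'after': 2550, 'saved_pct': 20.3}
```

The second case shows why a gateway is not worth it for a single small server.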
Adapting to Your Stack
The two gateway files are templates. Fork and modify:
- Change the tool names: gw/tg/linear are just conventions. Name them whatever makes sense.
- Update the docstrings: The tool descriptions are the cheat-sheet Claude sees. List your most common tools and their params there.
- Add more services: The persistent gateway pattern works with any MCP server. Add a new service block in config, build a new client, expose new dispatch tools.
- Strip what you don't need: If you only have one account, remove multi-account routing. If you don't need batch, remove gw_batch.
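Since the docstring doubles as the cheat-sheet, it can help to generate it from a short table of your most-used tools so it stays in sync as you adapt the template. The helper below is a hypothetical sketch; the output format is only a suggestion, and the resulting string would be used as the dispatch tool's description.

```python
def build_cheat_sheet(service: str, common_tools: dict) -> str:
    """Render a compact tool description listing the most-used tools and their params."""
    lines = [f"Dispatch a call to {service}. Common tools:"]
    for tool, params in common_tools.items():
        lines.append(f"  {tool}({', '.join(params)})")
    lines.append("Use the discover tool for anything not listed here.")
    return "\n".join(lines)

print(build_cheat_sheet("Telegram", {"send_message": ["chat_id", "text"]}))
```

Keeping the list short matters: every line added here is paid for in every conversation, which is the cost the gateway exists to reduce.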
License
MIT