MCP Gateway

Reduces LLM context window overhead by proxying multiple MCP servers through a few efficient dispatch tools instead of registering hundreds of individual tool schemas. It supports multi-account routing and tool discovery for both CLI-based and persistent MCP server configurations.

glama · Updated Mar 8, 2026

Cut 83-89% of your Claude Code context window overhead from MCP tool schemas.

Every MCP server you register dumps its full JSON tool schema into your context window — every conversation, whether you use those tools or not. If you have 5 servers with 30+ tools each, that's thousands of tokens burned before you type a single character.

MCP Gateway replaces N tool schemas with 3-4 dispatch tools that proxy requests to your existing MCP servers. The underlying servers stay exactly the same. You just stop paying the token tax.

The Problem

Before (real numbers from a multi-account setup):
  Google Workspace × 3 accounts  = 142 tools × 3 = 426 tool schemas
  Telegram                       = 92 tool schemas
  Linear × 3 accounts            = 42 tools × 3  = 126 tool schemas
  ─────────────────────────────────────────────────
  Total                          = 644 tool schemas = ~57,000 tokens

After:
  Google Workspace gateway       = 3 tool schemas
  Services gateway (TG + Linear) = 4 tool schemas
  ─────────────────────────────────────────────────
  Total                          = 7 tool schemas   = ~6,200 tokens

Savings: 89% fewer tokens, every single conversation.
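
The totals above can be checked with a few lines of Python. The figures are the ones quoted in this section; the variable names are only illustrative.

```python
# Tool counts as quoted in the Before/After tables.
before_schemas = 142 * 3 + 92 + 42 * 3   # Google x3, Telegram, Linear x3
after_schemas = 3 + 4                    # two gateways

# Approximate token totals, also from the tables above.
before_tokens = 57_000
after_tokens = 6_200

savings = 1 - after_tokens / before_tokens

print(before_schemas, after_schemas)   # 644 7
print(f"{savings:.0%} fewer tokens")   # 89% fewer tokens
```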

How It Works

Instead of registering each MCP server directly in your Claude Code config, you register a single gateway server that proxies tool calls to the underlying servers on demand.

Claude sees 3-4 generic tools (gw, gw_discover, and gw_batch for the CLI gateway; tg, linear, and so on for the persistent gateway) instead of hundreds of specialized ones. When Claude needs a specific tool, it calls the dispatch tool with the tool name as a parameter, and the gateway forwards the call to the right backend.
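
As a rough illustration of the dispatch idea, here is a minimal sketch in plain Python. It is not the code in cli_gateway.py: the handler functions and the UPSTREAM_TOOLS table are hypothetical stand-ins, and a real gateway would forward the call to an MCP server rather than run a local function.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for upstream tools.
def search_gmail_messages(query: str) -> str:
    return f"searched for {query!r}"

def get_events(time_min: str) -> str:
    return f"events since {time_min}"

UPSTREAM_TOOLS: Dict[str, Callable[..., Any]] = {
    "search_gmail_messages": search_gmail_messages,
    "get_events": get_events,
}

def gw(account: str, tool: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Single dispatch tool: the tool name travels as a parameter."""
    handler = UPSTREAM_TOOLS.get(tool)
    if handler is None:
        return {"error": f"unknown tool {tool!r}", "available": sorted(UPSTREAM_TOOLS)}
    return {"account": account, "result": handler(**params)}

def gw_discover() -> List[str]:
    """List tool names on demand instead of loading every schema up front."""
    return sorted(UPSTREAM_TOOLS)

print(gw("work", "search_gmail_messages", {"query": "is:unread"}))
```

Only gw and gw_discover would need a schema in the context window; the upstream tool names appear as plain string arguments.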

Two patterns are included, depending on what your upstream MCP server supports:

Pattern 1: CLI Dispatch (cli_gateway.py)

For MCP servers that support a --cli mode (subprocess per call). Each tool invocation spawns a short-lived process. Good for servers like google-workspace-mcp that have built-in CLI modes.

Features:

  • Multi-account routing (one gateway, N credential sets)
  • Auto-injection of per-account parameters (e.g., user_google_email)
  • Tool-to-service mapping for faster cold starts (only loads the needed module)
  • Discovery with caching
  • Batch execution (parallel tool calls in one request)
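
The sketch below shows how subprocess-per-call dispatch, per-account parameter injection, and batch execution could fit together. It rests on several assumptions: the ACCOUNTS table is hard-coded instead of read from config.yaml, the upstream CLI is replaced by a tiny echo process (the real command line of an upstream server is not shown here), and GATEWAY_CREDENTIALS_DIR is a made-up variable name.

```python
import json
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical account table; real values would come from config.yaml.
ACCOUNTS = {
    "work": {"email": "you@company.com", "credentials_dir": "/path/to/credentials/work"},
    "personal": {"email": "you@gmail.com", "credentials_dir": "/path/to/credentials/personal"},
}

# Stand-in for the upstream CLI: echoes the tool name and params as JSON.
ECHO_SCRIPT = (
    "import json, sys; "
    "print(json.dumps({'tool': sys.argv[1], 'params': json.loads(sys.argv[2])}))"
)
FAKE_UPSTREAM = [sys.executable, "-c", ECHO_SCRIPT]

def call_tool(account, tool, params):
    """Spawn one short-lived process for a single tool call."""
    acct = ACCOUNTS[account]
    # Auto-inject the per-account parameter unless the caller set it explicitly.
    merged = {"user_google_email": acct["email"], **params}
    # Hypothetical variable name, used only to show per-account routing.
    env = {**os.environ, "GATEWAY_CREDENTIALS_DIR": acct["credentials_dir"]}
    proc = subprocess.run(
        FAKE_UPSTREAM + [tool, json.dumps(merged)],
        capture_output=True, text=True, env=env, timeout=60, check=True,
    )
    return json.loads(proc.stdout)

def call_batch(account, calls):
    """Run several tool calls in parallel and return results in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(call_tool, account, c["tool"], c.get("params", {}))
            for c in calls
        ]
        return [f.result() for f in futures]
```

Threads are enough here because each call spends its time waiting on a child process.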

Pattern 2: Persistent MCP Client (persistent_gateway.py)

For MCP servers without CLI mode. Maintains persistent subprocess connections to upstream MCP servers, avoiding cold-start latency on every call. Uses the MCP SDK's ClientSession with AsyncExitStack for lifecycle management.

Features:

  • Lazy connection (connects on first use, not at startup)
  • Auto-reconnect on connection failure
  • Multi-service routing (multiple MCP servers behind one gateway)
  • Multi-account support per service
  • Tool discovery with caching
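
Lazy connection and auto-reconnect can be sketched with AsyncExitStack alone. The example below does not use the MCP SDK: FakeSession and fake_connect are hypothetical stand-ins for a real client session, and the first connection is made to fail on purpose so the reconnect path runs.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

class FakeSession:
    """Stand-in for an MCP client session; generation 1 always fails."""
    def __init__(self, generation):
        self.generation = generation

    async def call_tool(self, name, arguments):
        if self.generation == 1:
            raise ConnectionError("upstream went away")
        return {"tool": name, "arguments": arguments, "generation": self.generation}

@asynccontextmanager
async def fake_connect(generation):
    # A real connector would start the upstream subprocess here.
    yield FakeSession(generation)

class LazyClient:
    def __init__(self, connect):
        self._connect = connect   # callable returning an async context manager
        self._stack = None
        self._session = None
        self.connects = 0

    async def _ensure(self):
        # Connect on first use, not at startup.
        if self._session is None:
            self._stack = AsyncExitStack()
            self.connects += 1
            self._session = await self._stack.enter_async_context(
                self._connect(self.connects)
            )
        return self._session

    async def close(self):
        if self._stack is not None:
            await self._stack.aclose()
        self._stack = self._session = None

    async def call(self, name, arguments):
        # One retry: drop the broken connection, reconnect, try again.
        for attempt in range(2):
            session = await self._ensure()
            try:
                return await session.call_tool(name, arguments)
            except ConnectionError:
                await self.close()
                if attempt == 1:
                    raise

async def demo():
    client = LazyClient(fake_connect)
    assert client.connects == 0   # nothing is connected until the first call
    result = await client.call("send_message", {"chat_id": "123", "text": "hello"})
    await client.close()
    return client.connects, result

connects, result = asyncio.run(demo())
print(connects, result["generation"])   # 2 2
```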

Quick Start

1. Install

# Clone
git clone https://github.com/block-town/mcp-gateway.git
cd mcp-gateway

# Install dependencies
pip install fastmcp pyyaml python-dotenv
# or
uv pip install fastmcp pyyaml python-dotenv

2. Configure

Copy the example config and fill in your details:

cp config.example.yaml config.yaml
# Edit config.yaml with your accounts, paths, and credentials

3. Choose Your Pattern

CLI Dispatch (for servers with --cli mode):

cp cli_gateway.py server.py
# Edit server.py — update the tool descriptions with your account names and common tools

Persistent Client (for servers without CLI mode):

cp persistent_gateway.py server.py
# Edit server.py — update service names and tool descriptions

4. Register in Claude Code

Add to your ~/.claude/settings.json (global) or project .mcp.json:

{
  "mcpServers": {
    "gateway": {
      "command": "python3",
      "args": ["/path/to/mcp-gateway/server.py"]
    }
  }
}

Then remove the original MCP server entries that the gateway now proxies.

5. Use

# CLI gateway (instead of 142 tools polluting your context):
gw_discover("gmail")     → see available Gmail tools + params
gw("work", "search_gmail_messages", {"query": "is:unread"})
gw_batch("work", [
  {"tool": "search_gmail_messages", "params": {"query": "is:unread"}},
  {"tool": "get_events", "params": {"time_min": "2025-01-01"}}
])

# Persistent gateway:
tg("send_message", {"chat_id": "123", "text": "hello"})
linear("work", "linear_getIssues", {"teamId": "TEAM-1"})

Configuration

CLI Gateway (config.example.yaml — accounts mode)

# Path to the upstream MCP server
upstream_dir: "/path/to/google-workspace-mcp"

# Command to run the upstream server in CLI mode
runner: "uv"

accounts:
  personal:
    client_id: "your-oauth-client-id"
    client_secret: "your-oauth-client-secret"
    credentials_dir: "/path/to/credentials/personal"
    email: "you@gmail.com"
  work:
    client_id: "your-work-oauth-client-id"
    client_secret: "your-work-oauth-client-secret"
    credentials_dir: "/path/to/credentials/work"
    email: "you@company.com"

Persistent Gateway (config.example.yaml — services mode)

services:
  telegram:
    command: "uv"
    args: ["--directory", "/path/to/telegram-mcp", "run", "main.py"]
    env_file: "/path/to/telegram-mcp/.env"

  linear:
    command: "npx"
    args: ["-y", "@tacticlaunch/mcp-linear"]
    accounts:
      work:
        LINEAR_API_TOKEN: "lin_api_xxxxx"
      personal:
        LINEAR_API_TOKEN: "lin_api_yyyyy"
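
One plausible way to turn a service block into the environment for its subprocess is sketched below. This is an assumption about the approach, not the code in persistent_gateway.py: the service dict is written inline rather than loaded from YAML, the small parser stands in for python-dotenv, and TELEGRAM_API_ID is a hypothetical variable name.

```python
import os

def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

def build_env(service, account=None, env_file_text="", base=None):
    """Layer env-file values, then per-account values, over a base environment."""
    env = dict(os.environ if base is None else base)
    env.update(parse_env_file(env_file_text))
    accounts = service.get("accounts", {})
    if accounts:
        if account not in accounts:
            raise KeyError(f"unknown account {account!r}; expected one of {sorted(accounts)}")
        env.update(accounts[account])
    return env

linear = {
    "command": "npx",
    "args": ["-y", "@tacticlaunch/mcp-linear"],
    "accounts": {
        "work": {"LINEAR_API_TOKEN": "lin_api_xxxxx"},
        "personal": {"LINEAR_API_TOKEN": "lin_api_yyyyy"},
    },
}

print(build_env(linear, "work", base={}))   # {'LINEAR_API_TOKEN': 'lin_api_xxxxx'}
```

With this layering, each account gets its own subprocess environment while the command and args stay shared.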

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Claude Code                                                │
│                                                             │
│  Context window sees:   7 tool schemas (~6K tokens)         │
│  Instead of:          644 tool schemas (~57K tokens)        │
│                                                             │
│  gw("work", "search_gmail_messages", {"query": "..."})      │
│  tg("send_message", {"chat_id": "...", "text": "..."})      │
└──────────────────┬──────────────────────────────────────────┘
                   │
         ┌─────────▼─────────┐
         │   MCP Gateway     │
         │   (FastMCP)       │
         │                   │
         │   3-4 tools that  │
         │   dispatch to     │
         │   upstream MCP    │
         │   servers         │
         └───┬─────┬─────┬──┘
             │     │     │
    ┌────────▼┐ ┌──▼───┐ ┌▼────────┐
    │ Google  │ │ Tele-│ │ Linear  │
    │ MCP     │ │ gram │ │ MCP     │
    │ (CLI)   │ │ MCP  │ │ (×N     │
    │         │ │      │ │ accts)  │
    └─────────┘ └──────┘ └─────────┘

When to Use This

Good fit:

  • You have 3+ MCP servers registered and context window pressure is real
  • Multi-account setups (same server, different credentials)
  • Servers with 30+ tools where you use maybe 5-10 regularly

Not worth it:

  • Single MCP server with <10 tools
  • You rarely hit context limits
  • The MCP server is already tiny

Token Math

Each MCP tool schema is roughly 200-800 tokens of JSON (tool name, description, parameter schema with types/descriptions/required fields). To measure your own overhead:

  1. Count your registered tools: look at your settings.json and .mcp.json files
  2. Estimate ~400 tokens per tool (conservative average)
  3. Multiply by every conversation you start

A gateway tool schema is ~800-900 tokens (larger description with embedded cheat-sheet), but you only have 3-4 of them instead of hundreds.
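
The estimate can be turned into a quick break-even check. The defaults below are the rough figures from this section (about 400 tokens per direct tool, about 850 per gateway tool, 4 gateway tools); the function names are illustrative, and measured totals can differ from the estimate.

```python
def direct_cost(tool_count, tokens_per_tool=400):
    """Estimated tokens when every tool schema is registered directly."""
    return tool_count * tokens_per_tool

def gateway_cost(dispatch_tools=4, tokens_per_dispatch_tool=850):
    """Estimated tokens for the gateway's dispatch tools."""
    return dispatch_tools * tokens_per_dispatch_tool

def break_even_tools(dispatch_tools=4, tokens_per_dispatch_tool=850, tokens_per_tool=400):
    """Smallest direct tool count at which the gateway is cheaper."""
    cost = gateway_cost(dispatch_tools, tokens_per_dispatch_tool)
    return cost // tokens_per_tool + 1

print(direct_cost(150))     # 60000
print(gateway_cost())       # 3400
print(break_even_tools())   # 9
```

Under these assumptions the gateway starts paying off at around 9 tools, which lines up with the "fewer than 10 tools" guidance above.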

Adapting to Your Stack

The two gateway files are templates. Fork and modify:

  1. Change the tool names: gw/tg/linear are just conventions. Name them whatever makes sense.
  2. Update the docstrings: the tool descriptions are the cheat-sheet Claude sees. List your most common tools and their params there.
  3. Add more services: the persistent gateway pattern works with any MCP server. Add a new service block in config, build a new client, expose new dispatch tools.
  4. Strip what you don't need: if you only have one account, remove multi-account routing. If you don't need batch, remove gw_batch.

License

MIT
