ppb-mcp

An MCP server that exposes Poor Paul's Benchmark GPU inference data — quantization × throughput × VRAM × concurrent users — as queryable tools to any LLM client.


Hosted instance: https://mcp.poorpaul.dev/ (streamable-http transport, no auth)

What it does

Connect any MCP-aware client (Claude Desktop, Cline, Continue, etc.) to ask questions like:

  • "What's the best quantization for a 32 GB GPU running Qwen3.5-9B with 8 concurrent users?"
  • "Show me every model tested at Q4_K_M on the RTX 5090."
  • "Will Llama-13B at Q5_K_M fit on a 24 GB GPU at 4 concurrent users?"

It exposes four tools backed by 30,000+ real benchmark rows:

| Tool | What it does |
| --- | --- |
| `list_tested_configs` | Lists every tested GPU, model, and quantization (call this first) |
| `query_ppb_results` | Filters raw benchmark rows by GPU / VRAM / model / quant / users / backend |
| `recommend_quantization` | Three-tier empirical-first recommendation engine (high / medium / low confidence) |
| `get_gpu_headroom` | Sanity-checks a (gpu, model, quant, users) configuration for VRAM headroom |
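On the wire, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. A minimal sketch of the envelope a client would POST to the streamable-http endpoint (the argument names here are illustrative, not the server's documented schema; call `list_tested_configs` for real values):

```python
import json

# JSON-RPC 2.0 envelope for an MCP tools/call request.
# "arguments" below are illustrative filter values.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_ppb_results",
        "arguments": {
            "gpu": "NVIDIA GeForce RTX 5090",
            "quantization": "Q4_K_M",
        },
    },
}

payload = json.dumps(request)
```

Any MCP-aware client builds this envelope for you; it is shown here only to make the tool-call shape concrete.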

Install

1) Use the hosted instance (zero setup)

Add to your MCP client config (Claude Desktop example, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "ppb": {
      "transport": { "type": "http", "url": "https://mcp.poorpaul.dev/mcp" }
    }
  }
}

2) pip install and run locally (stdio)

pip install ppb-mcp
MCP_TRANSPORT=stdio ppb-mcp

Claude Desktop config:

{
  "mcpServers": {
    "ppb": {
      "command": "ppb-mcp",
      "env": { "MCP_TRANSPORT": "stdio" }
    }
  }
}

3) Docker

docker run --rm -p 9933:9933 \
  -e MCP_TRANSPORT=streamable-http \
  -v ppb-hf-cache:/data/huggingface \
  ghcr.io/paulplee/ppb-mcp:latest

4) From source

git clone https://github.com/paulplee/ppb-mcp
cd ppb-mcp
pip install -e ".[dev]"
ppb-mcp           # streamable-http on :9933

Connect Your LLM Client

All clients use the same hosted endpoint: https://mcp.poorpaul.dev/mcp

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ppb": {
      "transport": { "type": "http", "url": "https://mcp.poorpaul.dev/mcp" }
    }
  }
}

Restart Claude Desktop after saving.

Cursor

Edit ~/.cursor/mcp.json (create if it doesn't exist):

{
  "mcpServers": {
    "ppb": {
      "url": "https://mcp.poorpaul.dev/mcp",
      "type": "http"
    }
  }
}

Or via UI: Settings → Tools & Integrations → MCP → Add Server.

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "ppb": {
      "serverUrl": "https://mcp.poorpaul.dev/mcp",
      "transport": "http"
    }
  }
}

VS Code (GitHub Copilot Agent Mode)

Add to your .vscode/mcp.json (workspace) or User settings.json:

{
  "mcp": {
    "servers": {
      "ppb": {
        "type": "http",
        "url": "https://mcp.poorpaul.dev/mcp"
      }
    }
  }
}

Zed

Add to ~/.config/zed/settings.json under "context_servers":

{
  "context_servers": {
    "ppb": {
      "command": {
        "path": "env",
        "args": ["MCP_TRANSPORT=stdio", "uvx", "ppb-mcp"]
      }
    }
  }
}

Cline (VS Code extension)

Open the Cline panel → MCP Servers tab → Add Server → select SSE/HTTP → paste https://mcp.poorpaul.dev/mcp.

Continue.dev

Add to ~/.continue/config.yaml:

mcpServers:
  - name: ppb
    transport:
      type: http
      url: https://mcp.poorpaul.dev/mcp

OpenCode

Add to ~/.config/opencode/config.json:

{
  "mcp": {
    "ppb": {
      "type": "remote",
      "url": "https://mcp.poorpaul.dev/mcp"
    }
  }
}

Goose (Block)

goose mcp add ppb --transport http --url https://mcp.poorpaul.dev/mcp

Any stdio-compatible client

# Zero-install (requires uv):
env MCP_TRANSPORT=stdio uvx ppb-mcp

# After pip install:
env MCP_TRANSPORT=stdio ppb-mcp

Note on transport key names: MCP clients are not yet fully standardised on JSON key names for the HTTP transport. If your client doesn't connect with "type": "http", try "transport": "http", "type": "sse", or "transport": "streamable-http". The endpoint URL is the same regardless.

Example session

> list_tested_configs
{ "gpus": ["Apple M4 Pro", "NVIDIA GB10", "NVIDIA GeForce RTX 5090"],
  "models": ["Qwen3.5-9B", ...], "quantizations": ["Q4_K_M", ...] }

> recommend_quantization(gpu_vram_gb=32, concurrent_users=8, model="Qwen3.5-9B", priority="balance")
{ "recommended_quantization": "Q5_K_M",
  "estimated_vram_usage_gb": 27.8,
  "estimated_tokens_per_second": 142.0,
  "headroom_gb": 4.2,
  "confidence": "high",
  "reasoning": "Q5_K_M is recommended for your NVIDIA GeForce RTX 5090 (32 GB) ...",
  "alternatives": ["Q4_K_M", "Q8_0"] }

Configuration

| Env var | Default | Notes |
| --- | --- | --- |
| `HF_DATASET` | `paulplee/ppb-results` | HuggingFace dataset ID |
| `REFRESH_INTERVAL_HOURS` | `1` | Background refresh cadence |
| `MCP_TRANSPORT` | `streamable-http` | `stdio` or `streamable-http` |
| `HOST` | `0.0.0.0` | HTTP bind host |
| `PORT` | `9933` | HTTP bind port |
| `LOG_LEVEL` | `INFO` | Python logging level |
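The server presumably resolves these variables with plain environment lookups at startup; a minimal sketch of the same defaults (this helper is an illustration, not the server's actual code):

```python
import os

def load_config(env=os.environ):
    # Defaults mirror the table above; each value can be
    # overridden per-process via the environment.
    return {
        "hf_dataset": env.get("HF_DATASET", "paulplee/ppb-results"),
        "refresh_interval_hours": float(env.get("REFRESH_INTERVAL_HOURS", "1")),
        "transport": env.get("MCP_TRANSPORT", "streamable-http"),
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "9933")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

For example, `PORT=8080 MCP_TRANSPORT=stdio ppb-mcp` would override two of the defaults and leave the rest untouched.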

Self-hosting (Lightsail / any Ubuntu VPS)

git clone https://github.com/paulplee/ppb-mcp /tmp/ppb-mcp
cd /tmp/ppb-mcp
DOMAIN=mcp.example.com EMAIL=you@example.com ./deploy/deploy.sh

This installs Docker, builds the image, registers a systemd unit, configures nginx, and runs certbot.

Development

pip install -e ".[dev]"
ruff check src tests
pytest -v

Integration tests against the live HuggingFace dataset are gated behind PPB_RUN_INTEGRATION=1 to keep CI offline-clean.
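The gating pattern looks like a standard skip-unless guard. A sketch of the idea, shown with stdlib `unittest` for illustration (the repo itself uses pytest, where `pytest.mark.skipif` plays the same role):

```python
import os
import unittest

# Live-dataset tests run only when PPB_RUN_INTEGRATION=1 is set,
# so plain CI runs stay fully offline.
RUN_INTEGRATION = os.environ.get("PPB_RUN_INTEGRATION") == "1"

class TestLiveDataset(unittest.TestCase):
    @unittest.skipUnless(RUN_INTEGRATION, "set PPB_RUN_INTEGRATION=1 to run")
    def test_dataset_loads(self):
        ...  # would fetch the live HuggingFace dataset and assert rows exist
```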

How recommendations work

  1. Tier 1 — empirical exact match (high confidence). ≥3 measured runs on a GPU at-or-below your VRAM budget at the requested concurrency.
  2. Tier 2 — empirical-near (medium). Same (model, quant) benchmarked on a different GPU at the same concurrency; throughput borrowed, VRAM scaled to your card.
  3. Tier 3 — formula extrapolation (low). vram_per_user ≈ (params_B × bits_per_weight / 8) × 1.15; viable iff total ≤ 90 % of your VRAM.
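The Tier 3 formula can be sketched directly. Note that the bits-per-weight for a given quant format is an input here (roughly 4.5 for Q4_K_M is a common approximation, but treat any specific value as an assumption), and scaling the per-user estimate linearly with concurrent users is likewise an assumption of this sketch:

```python
def vram_per_user_gb(params_b: float, bits_per_weight: float) -> float:
    # Tier-3 formula: weight bytes per parameter, plus ~15% overhead.
    return (params_b * bits_per_weight / 8) * 1.15

def is_viable(params_b: float, bits_per_weight: float,
              users: int, gpu_vram_gb: float) -> bool:
    # Assumption: total usage = per-user estimate x concurrent users,
    # capped at 90% of physical VRAM as Tier 3 requires.
    return vram_per_user_gb(params_b, bits_per_weight) * users <= 0.9 * gpu_vram_gb
```

Tier 3 is the low-confidence fallback precisely because these estimates are never checked against a measured run, unlike Tiers 1 and 2.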

License

MIT — see LICENSE.
