Updated Apr 2, 2026

mcp-turboquant

Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call.

No external CLI required -- all quantization logic is embedded.

Install

pip install mcp-turboquant

Or run directly with uvx:

uvx mcp-turboquant

Optional backends

The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:

# GGUF (Ollama, llama.cpp, LM Studio)
pip install mcp-turboquant[gguf]

# GPTQ (vLLM, TGI)
pip install mcp-turboquant[gptq]

# AWQ (vLLM, TGI)
pip install mcp-turboquant[awq]

# Everything
pip install mcp-turboquant[all]
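Conceptually, the check tool only needs to probe whether each optional package is importable. A minimal sketch of that idea, assuming stdlib-only detection (the BACKENDS mapping and function name are illustrative, not the server's actual code):

```python
import importlib.util

# Map each quantization format to the Python package that provides it.
# (Illustrative mapping; the real server may probe differently.)
BACKENDS = {
    "gguf": "llama_cpp",   # installed by llama-cpp-python
    "gptq": "auto_gptq",   # installed by auto-gptq
    "awq": "awq",          # installed by autoawq
}

def available_backends() -> dict[str, bool]:
    """Return which optional quantization backends are importable."""
    return {
        fmt: importlib.util.find_spec(module) is not None
        for fmt, module in BACKENDS.items()
    }
```

Because `find_spec` only inspects the import system, this works without importing (or initializing) any heavy backend.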

Configure

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}

Or with uvx (no install needed):

{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}

Tools

Tool       Description                                                    Heavy deps?
---------  -------------------------------------------------------------  -----------
info       Get model info from HuggingFace (params, size, architecture)   No
check      Check available quantization backends on the system            No
recommend  Hardware-aware recommendation for best format + bits           No
quantize   Quantize a model to GGUF/GPTQ/AWQ                              Yes
evaluate   Run perplexity evaluation on a quantized model                 Yes
push       Push quantized model to HuggingFace Hub                        No
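On the wire, each tool invocation is an MCP `tools/call` JSON-RPC request. A hedged sketch of what the payload for quantize might look like (the argument names `model`, `format`, and `bits` are assumptions, not the server's documented schema):

```python
import json

# JSON-RPC 2.0 envelope used by MCP tool calls.
# The "arguments" keys below are illustrative guesses at the schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "quantize",
        "arguments": {
            "model": "meta-llama/Llama-3.1-8B",
            "format": "gguf",
            "bits": 4,
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice you never build this by hand; the MCP client (Claude Code or Claude Desktop) constructs it from the tool's advertised input schema.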

Examples

Once configured, ask Claude:

"Get info on meta-llama/Llama-3.1-8B-Instruct"

"What quantization format should I use for Mistral-7B on my machine?"

"Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"

"Check which quantization backends I have installed"

"Evaluate the perplexity of my quantized model at /path/to/model.gguf"

"Push my quantized model to myuser/model-GGUF on HuggingFace"

How it works

Claude / Agent  <-->  MCP Protocol (stdio)  <-->  mcp-turboquant (Python)  <-->  llama-cpp-python / auto-gptq / autoawq

All quantization logic runs in-process; no external CLI tools are needed.
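"In-process" here means the server is an ordinary Python process exchanging JSON-RPC messages over stdin/stdout. A toy dispatch sketch of that loop, purely illustrative (the real server is built on an MCP SDK and exposes all six tools):

```python
import json

# Illustrative tool registry; the real server exposes info, check,
# recommend, quantize, evaluate, and push.
TOOLS = {
    "info": lambda args: {"model": args.get("model"), "status": "ok"},
}

def handle(line: str) -> str:
    """Dispatch one JSON-RPC tools/call request and return the response."""
    req = json.loads(line)
    name = req["params"]["name"]
    result = TOOLS[name](req["params"].get("arguments", {}))
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# A real server would loop: `for line in sys.stdin: print(handle(line))`,
# with MCP's actual message framing handled by the SDK.
```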

Run directly

# As a command
mcp-turboquant

# As a module
python -m mcp_turboquant

License

MIT
