GLM-5 MCP Server for Claude Desktop

Reduce Claude Desktop consumption by 10x by delegating heavy tasks to Z.ai's GLM-5 (744B parameter) model through the Model Context Protocol (MCP).

🎯 Problem This Solves

Are you hitting Claude Pro limits too fast?

Weekly limit exhausted in 2 days? ✅
Blocked from all models for days? ✅
Paying $100/month but can't use it 5 days/week? ✅

This MCP server gives you:

10x reduction in Claude consumption (Opus 4.6 → Sonnet 4.5 + GLM-5 delegation)
5x reduction in Claude consumption (Opus 4.6 → Opus 4.6 + GLM-5 delegation)
Continuous availability - never blocked again
Cost-effective scaling - $40-60/month Z.ai vs. $100 to 200$ /month paid additional to claude to continue using
18-30x ROI on your Claude Pro subscription

🚀 Quick Start

1. Get Z.ai API Key

Visit Z.ai and create an account
Navigate to API Keys
Create a new API key
Copy your key (format: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxx)

2. Install

# Clone the repository
git clone https://github.com/Arkya-AI/glm5-mcp.git
cd glm5-mcp

# Install dependencies
npm install

3. Configure Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "glm5": {
      "command": "node",
      "args": [
        "/ABSOLUTE/PATH/TO/glm5-mcp/index.js"
      ],
      "env": {
        "ZAI_API_KEY": "your-z-ai-api-key-here"
      }
    }
  }
}

Important: Replace /ABSOLUTE/PATH/TO/ with the actual path to where you cloned the repo.

4. Restart Claude Desktop

Quit Claude Desktop completely and restart it. The GLM-5 tools will now be available.

🛠️ Available Tools

Core AI Tools

`ask_glm5`

Delegate complex reasoning tasks to GLM-5 (744B parameters).

Use for:

Complex analysis and reasoning
System design and architecture
Advanced problem-solving
Multi-step logical tasks

Parameters:

prompt (required): Your task/question
system_prompt (optional): Custom behavior guidance
temperature (optional): 0.0-1.0, default 0.7
max_tokens (optional): Max response length, default 4000

`ask_glm5_pro`

Same as ask_glm5 but with coding-optimized system prompt.

Use for:

Code generation
Refactoring and optimization
Debugging assistance
Technical implementation

Research & Intelligence Tools

`web_search`

LLM-optimized web search powered by Z.ai.

Use for:

Competitive intelligence
Market research
Real-time news and trends
Finding multiple sources

Parameters:

search_query (required): Your search query
count (optional): Results 1-50, default 10
search_recency_filter (optional): oneDay, oneWeek, oneMonth, oneYear, noLimit
search_domain_filter (optional): Comma-separated domains

`web_reader`

Fetch and parse full web page content.

Use for:

Reading articles and blog posts
Analyzing competitor pages
Extracting documentation
Deep content analysis

Parameters:

url (required): URL to fetch
return_format (optional): markdown or text, default markdown
with_images_summary (optional): Include image summary, default false
with_links_summary (optional): Include links summary, default false
timeout (optional): Timeout in seconds, default 20

`parse_document`

Extract text from PDFs and images using GLM-OCR.

Use for:

PDF proposals and contracts
Scanned documents
Business cards
Invoices and receipts
Complex layouts and tables

Parameters:

file_url (required): Public URL to document/image
return_format (optional): markdown or text, default markdown
parse_mode (optional): auto, ocr, or layout, default auto

Supports: PDF, images up to 50MB or 100 pages

📊 Usage Strategy

Execution Priority (follow this order)

When Claude faces a task, it should follow this priority chain:

Priority	Action	When
1 (FIRST)	Spawn parallel sub-agents	Multi-part tasks with independent pieces. Each sub-agent uses GLM-5 for heavy work
2 (SECOND)	Delegate to GLM-5 directly	Single-unit tasks that can't be parallelized (>50 lines code → `ask_glm5_pro`, >300 words → `ask_glm5`)
3 (LAST RESORT)	Claude does it itself	Only orchestration, file I/O, client polish, responses <100 words

Sub-Agent Enforcement

Task sub-agents spawned by Claude MUST also use GLM-5 for their heavy work. When spawning a sub-agent, explicitly instruct it to use ask_glm5_pro for code generation and ask_glm5 for analysis/docs. Sub-agents that generate >50 lines of code or >300 words of content themselves (without delegating to GLM-5) are violating the delegation model.

Pattern: GLM-5 generates → Claude/sub-agent writes to disk.

Orchestration Model

Claude / Opus (Parent)
    ├── Planning & coordination (stays in Claude)
    ├── File operations & disk I/O (stays in Claude)
    ├── Quick responses <100 words (stays in Claude)
    │
    ├── PRIORITY 1: Spawn parallel sub-agents (for multi-part tasks)
    │    └── Each sub-agent uses GLM-5 for code/analysis
    │         ├── ask_glm5_pro for code generation
    │         ├── ask_glm5 for analysis/docs
    │         └── Sub-agent writes output to disk
    │
    └── PRIORITY 2: Delegate to GLM-5 directly (for single-unit tasks)
         ├── Analysis >300 words
         ├── Code generation >50 lines
         ├── Research synthesis from multiple sources
         └── Document processing & OCR

Model-Specific Strategies

Sonnet 4.5 (default for most tasks):

Orchestration, file ops, disk I/O, quick responses
Delegate all analysis >300 words and code >50 lines to GLM-5
Result: 10x token reduction

Opus 4.6 (complex multi-step coordination):

Priority 1: Spawn parallel sub-agents (each uses GLM-5)
Priority 2: Delegate to GLM-5 directly
Priority 3 (last resort): Opus does it itself
Pattern: Opus = Architect, Sub-agents = Parallel workers, GLM-5 = Content engine
Result: 90%+ reduction in Opus consumption

Example Workflows

Research pipeline:

Start session with Sonnet 4.5
Use web_search to find sources
Use web_reader to fetch content (parallel)
Use ask_glm5 to analyze and synthesize
Sonnet formats and presents results

Multi-part task (Opus):

Opus analyzes task and identifies independent parts
Spawns parallel sub-agents for each part
Each sub-agent calls ask_glm5 or ask_glm5_pro
Each sub-agent writes output to disk
Opus integrates results

Expected Savings

Before GLM-5 MCP:

Weekly limit in 2 days
5 days blocked per week
$100/month for 28% availability

After GLM-5 MCP:

Sonnet 4.5: 10x less quota usage
GLM-5: Handle heavy lifting
100% continuous availability
$140-160/month total cost

ROI: 18-30x improvement in effective cost per hour of usage

🔧 Development

Project Structure

glm5-mcp/
├── index.js           # Main MCP server
├── package.json       # Dependencies
├── README.md          # This file
├── LICENSE            # MIT License
└── .gitignore         # Git ignore rules

Testing

# Test the MCP server
npm start

The server will start in stdio mode and log to stderr. Use Claude Desktop to test the tools.

Adding New Tools

Add tool definition to ListToolsRequestSchema handler
Add tool handler in CallToolRequestSchema handler
Update README documentation
Test in Claude Desktop

📖 Documentation

For Users

README.md - Setup and installation guide (you're here!)
EXAMPLES.md - Real-world usage examples
CONTRIBUTING.md - How to contribute

For Claude

CLAUDE.md - Comprehensive project memory and delegation guidelines
- When to use each tool (Sonnet 4.5 vs Opus 4.6 strategies)
- Code quality standards and patterns
- Optimization rules to reduce Claude consumption
- Development workflow and troubleshooting

API Reference

This MCP server uses the Z.ai API. Key endpoints:

/paas/v4/chat/completions - GLM-5 text generation
/paas/v4/web_search - Web search
/paas/v4/reader - Web content fetching

See Z.ai Documentation for complete API reference.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Guidelines

Follow existing code style
Add tests for new features
Update documentation
Keep tools focused and single-purpose

Ideas for Contributions

Add translation agent support
Add slide generation (GLM Slide Agent)
Add streaming support for real-time responses
Add error retry logic
Add caching for repeated queries
Add usage tracking and analytics

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Anthropic for Claude and MCP
Z.ai for GLM-5 API access
MCP community for the protocol specification

🔗 Links

💡 Use Cases

Competitive Intelligence

1. web_search("competitor X new features 2024")
2. web_reader(top_results)
3. ask_glm5("Analyze competitor strategy and our response")

Research Synthesis

1. web_search("market trends AI agents", count=50)
2. Multiple web_reader() calls
3. ask_glm5("Synthesize findings into executive summary")

Document Analysis

1. parse_document("https://example.com/contract.pdf")
2. ask_glm5("Extract key terms, risks, and obligations")

Code Generation

1. ask_glm5_pro("Build a React component for user authentication with OAuth")
2. Sonnet integrates into codebase

🆘 Troubleshooting

Tools not appearing in Claude Desktop

Check config file path is correct
Verify absolute path to index.js
Restart Claude Desktop completely (quit, don't just close window)
Check Claude Desktop logs for errors

API errors

Verify Z.ai API key is valid
Check API key has sufficient credits
Ensure network connectivity
Check Z.ai service status

Empty responses

GLM-5 may be rate-limited
Try lowering max_tokens
Check error logs in terminal
Verify prompt is clear and specific

📈 Roadmap

Made with ❤️ to help Claude users do more without limits

glm5-mcp

GLM-5 MCP Server for Claude Desktop

🎯 Problem This Solves

🚀 Quick Start

1. Get Z.ai API Key

2. Install

3. Configure Claude Desktop

4. Restart Claude Desktop

🛠️ Available Tools

Core AI Tools

ask_glm5

ask_glm5_pro

Research & Intelligence Tools

web_search

web_reader

parse_document

📊 Usage Strategy

Execution Priority (follow this order)

Sub-Agent Enforcement

Orchestration Model

Model-Specific Strategies

Example Workflows

Expected Savings

🔧 Development

Project Structure

Testing

Adding New Tools

📖 Documentation

For Users

For Claude

API Reference

🤝 Contributing

Development Guidelines

Ideas for Contributions

📄 License

🙏 Acknowledgments

🔗 Links

💡 Use Cases

Competitive Intelligence

Research Synthesis

Document Analysis

Code Generation

🆘 Troubleshooting

Tools not appearing in Claude Desktop

API errors

Empty responses

📈 Roadmap

Reviews

`ask_glm5`

`ask_glm5_pro`

`web_search`

`web_reader`

`parse_document`