GLM-5 MCP Server for Claude Desktop
Cut Claude Desktop usage by up to 10x by delegating heavy tasks to Z.ai's GLM-5 (744B-parameter) model through the Model Context Protocol (MCP).
🎯 Problem This Solves
Are you hitting Claude Pro limits too fast?
- Weekly limit exhausted in 2 days? ✅
- Blocked from all models for days? ✅
- Paying $100/month but can't use it 5 days/week? ✅
This MCP server gives you:
- 10x reduction in Claude consumption (Opus 4.6 → Sonnet 4.5 + GLM-5 delegation)
- 5x reduction in Claude consumption (Opus 4.6 → Opus 4.6 + GLM-5 delegation)
- Continuous availability - never blocked again
- Cost-effective scaling - $40-60/month for Z.ai vs. an additional $100-200/month paid to Claude to keep working
- 18-30x ROI on your Claude Pro subscription
🚀 Quick Start
1. Get Z.ai API Key
- Visit Z.ai and create an account
- Navigate to API Keys
- Create a new API key
- Copy your key (format:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxx)
2. Install
# Clone the repository
git clone https://github.com/Arkya-AI/glm5-mcp.git
cd glm5-mcp
# Install dependencies
npm install
3. Configure Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "glm5": {
      "command": "node",
      "args": [
        "/ABSOLUTE/PATH/TO/glm5-mcp/index.js"
      ],
      "env": {
        "ZAI_API_KEY": "your-z-ai-api-key-here"
      }
    }
  }
}
Important: Replace /ABSOLUTE/PATH/TO/ with the actual path to where you cloned the repo.
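A minimal sketch of generating this entry programmatically, which can help avoid hand-editing mistakes in the path. `buildGlm5Entry` and `addGlm5Server` are hypothetical helpers (not part of this repository), and the example path and key are placeholders.

```javascript
const path = require("node:path");

// Hypothetical helper: builds the "glm5" server entry shown above.
function buildGlm5Entry(repoDir, apiKey) {
  if (!path.isAbsolute(repoDir)) {
    throw new Error(`Expected an absolute path, got: ${repoDir}`);
  }
  return {
    command: "node",
    args: [path.join(repoDir, "index.js")],
    env: { ZAI_API_KEY: apiKey },
  };
}

// Hypothetical helper: merges the entry into an existing config object
// without touching any other configured MCP servers.
function addGlm5Server(config, repoDir, apiKey) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      glm5: buildGlm5Entry(repoDir, apiKey),
    },
  };
}

// Placeholder path and key; substitute your own values.
const example = addGlm5Server(
  { mcpServers: {} },
  "/Users/me/glm5-mcp",
  "your-z-ai-api-key-here"
);
console.log(JSON.stringify(example, null, 2));
```

The printed JSON can be pasted into claude_desktop_config.json. Rejecting relative paths up front mirrors the requirement that the path to index.js be absolute.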
4. Restart Claude Desktop
Quit Claude Desktop completely and restart it. The GLM-5 tools will now be available.
🛠️ Available Tools
Core AI Tools
ask_glm5
Delegate complex reasoning tasks to GLM-5 (744B parameters).
Use for:
- Complex analysis and reasoning
- System design and architecture
- Advanced problem-solving
- Multi-step logical tasks
Parameters:
- prompt (required): Your task/question
- system_prompt (optional): Custom behavior guidance
- temperature (optional): 0.0-1.0, default 0.7
- max_tokens (optional): Max response length, default 4000
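As an illustration of how these parameters and defaults fit together, here is a small sketch that fills in the documented defaults and checks the temperature range. `normalizeAskArgs` is a hypothetical helper; the server's actual validation in index.js may differ.

```javascript
// Hypothetical helper: applies the documented defaults for ask_glm5
// (temperature 0.7, max_tokens 4000) and validates the inputs.
function normalizeAskArgs(args) {
  if (typeof args.prompt !== "string" || args.prompt.trim() === "") {
    throw new TypeError("prompt is required and must be a non-empty string");
  }

  const temperature = args.temperature ?? 0.7;
  if (typeof temperature !== "number" || temperature < 0 || temperature > 1) {
    throw new RangeError("temperature must be between 0.0 and 1.0");
  }

  const maxTokens = args.max_tokens ?? 4000;
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new RangeError("max_tokens must be a positive integer");
  }

  const normalized = { prompt: args.prompt, temperature, max_tokens: maxTokens };
  // system_prompt is optional, so only include it when provided.
  if (args.system_prompt !== undefined) {
    normalized.system_prompt = args.system_prompt;
  }
  return normalized;
}

console.log(normalizeAskArgs({ prompt: "Compare two caching strategies" }));
```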
ask_glm5_pro
Same as ask_glm5, but with a coding-optimized system prompt.
Use for:
- Code generation
- Refactoring and optimization
- Debugging assistance
- Technical implementation
Research & Intelligence Tools
web_search
LLM-optimized web search powered by Z.ai.
Use for:
- Competitive intelligence
- Market research
- Real-time news and trends
- Finding multiple sources
Parameters:
- search_query (required): Your search query
- count (optional): Number of results, 1-50, default 10
- search_recency_filter (optional): oneDay, oneWeek, oneMonth, oneYear, or noLimit
- search_domain_filter (optional): Comma-separated domains
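The following sketch shows one way to assemble a web_search argument object from these parameters. `buildWebSearchArgs` is a hypothetical helper, and passing domains as an array that gets joined with commas is an assumption made for convenience.

```javascript
// Allowed values for search_recency_filter, as listed above.
const RECENCY_FILTERS = ["oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"];

// Hypothetical helper: builds the arguments for a web_search tool call.
function buildWebSearchArgs(query, { count = 10, recency, domains = [] } = {}) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("search_query is required");
  }
  if (!Number.isInteger(count) || count < 1 || count > 50) {
    throw new RangeError("count must be an integer from 1 to 50");
  }
  if (recency !== undefined && !RECENCY_FILTERS.includes(recency)) {
    throw new RangeError(`recency must be one of: ${RECENCY_FILTERS.join(", ")}`);
  }

  const args = { search_query: query, count };
  if (recency !== undefined) args.search_recency_filter = recency;
  // The tool expects a single comma-separated string of domains.
  if (domains.length > 0) args.search_domain_filter = domains.join(",");
  return args;
}

console.log(
  buildWebSearchArgs("market trends AI agents", {
    count: 50,
    recency: "oneMonth",
    domains: ["example.com", "example.org"],
  })
);
```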
web_reader
Fetch and parse full web page content.
Use for:
- Reading articles and blog posts
- Analyzing competitor pages
- Extracting documentation
- Deep content analysis
Parameters:
- url (required): URL to fetch
- return_format (optional): markdown or text, default markdown
- with_images_summary (optional): Include image summary, default false
- with_links_summary (optional): Include links summary, default false
- timeout (optional): Timeout in seconds, default 20
parse_document
Extract text from PDFs and images using GLM-OCR.
Use for:
- PDF proposals and contracts
- Scanned documents
- Business cards
- Invoices and receipts
- Complex layouts and tables
Parameters:
- file_url (required): Public URL to document/image
- return_format (optional): markdown or text, default markdown
- parse_mode (optional): auto, ocr, or layout, default auto
Supports: PDFs and images, up to 50MB or 100 pages
📊 Usage Strategy
Execution Priority (follow this order)
When Claude faces a task, it should follow this priority chain:
| Priority | Action | When |
|---|---|---|
| 1 (FIRST) | Spawn parallel sub-agents | Multi-part tasks with independent pieces. Each sub-agent uses GLM-5 for heavy work |
| 2 (SECOND) | Delegate to GLM-5 directly | Single-unit tasks that can't be parallelized (>50 lines code → ask_glm5_pro, >300 words → ask_glm5) |
| 3 (LAST RESORT) | Claude does it itself | Only orchestration, file I/O, client polish, responses <100 words |
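The priority chain in this table can be sketched as a small decision function. `chooseAction` and its input fields are hypothetical; in practice Claude applies these rules from its instructions rather than from code. The table leaves 100-300 word responses unspecified, and this sketch assumes they stay with Claude.

```javascript
// Thresholds taken from the priority table.
const CODE_LINE_THRESHOLD = 50;
const WORD_THRESHOLD = 300;

// Hypothetical helper: maps a rough task estimate to the action the
// priority chain prescribes.
function chooseAction({ independentParts = 1, codeLines = 0, words = 0 }) {
  // Priority 1: multi-part tasks with independent pieces go to sub-agents.
  if (independentParts > 1) return "spawn_sub_agents";
  // Priority 2: single-unit heavy work is delegated to GLM-5 directly.
  if (codeLines > CODE_LINE_THRESHOLD) return "ask_glm5_pro";
  if (words > WORD_THRESHOLD) return "ask_glm5";
  // Priority 3: small tasks stay with Claude (assumption for 100-300 words).
  return "claude";
}

console.log(chooseAction({ independentParts: 3, codeLines: 400 }));
console.log(chooseAction({ codeLines: 120 }));
console.log(chooseAction({ words: 800 }));
console.log(chooseAction({ words: 60 }));
```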
Sub-Agent Enforcement
Task sub-agents spawned by Claude MUST also use GLM-5 for their heavy work. When spawning a sub-agent, explicitly instruct it to use ask_glm5_pro for code generation and ask_glm5 for analysis/docs. Sub-agents that generate >50 lines of code or >300 words of content themselves (without delegating to GLM-5) are violating the delegation model.
Pattern: GLM-5 generates → Claude/sub-agent writes to disk.
Orchestration Model
Claude / Opus (Parent)
├── Planning & coordination (stays in Claude)
├── File operations & disk I/O (stays in Claude)
├── Quick responses <100 words (stays in Claude)
│
├── PRIORITY 1: Spawn parallel sub-agents (for multi-part tasks)
│ └── Each sub-agent uses GLM-5 for code/analysis
│ ├── ask_glm5_pro for code generation
│ ├── ask_glm5 for analysis/docs
│ └── Sub-agent writes output to disk
│
└── PRIORITY 2: Delegate to GLM-5 directly (for single-unit tasks)
├── Analysis >300 words
├── Code generation >50 lines
├── Research synthesis from multiple sources
└── Document processing & OCR
Model-Specific Strategies
Sonnet 4.5 (default for most tasks):
- Orchestration, file ops, disk I/O, quick responses
- Delegate all analysis >300 words and code >50 lines to GLM-5
- Result: 10x token reduction
Opus 4.6 (complex multi-step coordination):
- Priority 1: Spawn parallel sub-agents (each uses GLM-5)
- Priority 2: Delegate to GLM-5 directly
- Priority 3 (last resort): Opus does it itself
- Pattern: Opus = Architect, Sub-agents = Parallel workers, GLM-5 = Content engine
- Result: 90%+ reduction in Opus consumption
Example Workflows
Research pipeline:
- Start session with Sonnet 4.5
- Use web_search to find sources
- Use web_reader to fetch content (parallel)
- Use ask_glm5 to analyze and synthesize
- Sonnet formats and presents results
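The research pipeline above could be sketched as follows. `callTool(name, args)` is a hypothetical async function standing in for however the client invokes MCP tools, and the result shapes (`results[].url`, `text`) are assumptions for illustration, not the server's documented output. A stub is included so the sketch runs without network access.

```javascript
// Sketch of the pipeline: search, read pages in parallel, then synthesize.
async function researchPipeline(callTool, topic, maxSources = 3) {
  const search = await callTool("web_search", {
    search_query: topic,
    count: maxSources,
  });
  const urls = search.results.map((result) => result.url); // assumed shape

  // Fetch all pages concurrently rather than one at a time.
  const pages = await Promise.all(
    urls.map((url) => callTool("web_reader", { url, return_format: "markdown" }))
  );

  const sources = pages
    .map((page, i) => `Source ${i + 1} (${urls[i]}):\n${page.text}`) // assumed shape
    .join("\n\n");

  return callTool("ask_glm5", {
    prompt: `Synthesize the key findings on "${topic}" from these sources:\n\n${sources}`,
  });
}

// Stub tool runner with made-up responses, for demonstration only.
async function fakeCallTool(name, args) {
  if (name === "web_search") {
    return { results: [{ url: "https://example.com/a" }, { url: "https://example.com/b" }] };
  }
  if (name === "web_reader") return { text: `Content of ${args.url}` };
  if (name === "ask_glm5") return { text: `Summary of a ${args.prompt.length}-character prompt` };
  throw new Error(`Unexpected tool: ${name}`);
}

researchPipeline(fakeCallTool, "market trends AI agents").then((result) =>
  console.log(result.text)
);
```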
Multi-part task (Opus):
- Opus analyzes task and identifies independent parts
- Spawns parallel sub-agents for each part
- Each sub-agent calls ask_glm5 or ask_glm5_pro
- Each sub-agent writes output to disk
- Opus integrates results
Expected Savings
Before GLM-5 MCP:
- Weekly limit in 2 days
- 5 days blocked per week
- $100/month for 28% availability
After GLM-5 MCP:
- Sonnet 4.5: 10x less quota usage
- GLM-5: Handle heavy lifting
- 100% continuous availability
- $140-160/month total cost
ROI: 18-30x improvement in effective cost per hour of usage
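The availability and total-cost figures above follow from simple arithmetic, shown here as a worked example. The ROI multiple depends on usage assumptions this README does not state, so it is not reproduced in the sketch.

```javascript
// Before: the weekly limit is used up in 2 of 7 days.
const usableDaysPerWeek = 2;
const availabilityPercent = ((usableDaysPerWeek / 7) * 100).toFixed(1);
console.log(`Availability before: ${availabilityPercent}%`); // 28.6, quoted as 28% above

// After: the Claude subscription plus the Z.ai spend range.
const claudeMonthly = 100;
const zaiMonthlyRange = [40, 60];
const totalMonthlyRange = zaiMonthlyRange.map((zai) => claudeMonthly + zai);
console.log(`Total monthly cost: $${totalMonthlyRange[0]}-${totalMonthlyRange[1]}`); // $140-160
```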
🔧 Development
Project Structure
glm5-mcp/
├── index.js # Main MCP server
├── package.json # Dependencies
├── README.md # This file
├── LICENSE # MIT License
└── .gitignore # Git ignore rules
Testing
# Test the MCP server
npm start
The server will start in stdio mode and log to stderr. Use Claude Desktop to test the tools.
Adding New Tools
- Add tool definition to ListToolsRequestSchema handler
- Add tool handler in CallToolRequestSchema handler
- Update README documentation
- Test in Claude Desktop
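As a rough illustration of the first two steps, the sketch below defines a tool and a matching handler as plain functions. The `word_count` tool is a hypothetical example, not one of this server's tools, and the way index.js actually structures its handlers may differ. The definition and result shapes follow the MCP tool format (`inputSchema`, `content` array).

```javascript
// Hypothetical example tool definition.
const wordCountTool = {
  name: "word_count",
  description: "Count the words in a piece of text.",
  inputSchema: {
    type: "object",
    properties: {
      text: { type: "string", description: "Text to count words in" },
    },
    required: ["text"],
  },
};

// Step 1: return the definition from the ListToolsRequestSchema handler.
async function handleListTools() {
  return { tools: [wordCountTool] };
}

// Step 2: dispatch on the tool name in the CallToolRequestSchema handler.
async function handleCallTool(request) {
  const { name, arguments: args = {} } = request.params;
  switch (name) {
    case "word_count": {
      const words = String(args.text ?? "").trim().split(/\s+/).filter(Boolean);
      return { content: [{ type: "text", text: String(words.length) }] };
    }
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}

// With the MCP SDK these would typically be registered along the lines of:
//   server.setRequestHandler(ListToolsRequestSchema, handleListTools);
//   server.setRequestHandler(CallToolRequestSchema, handleCallTool);

handleCallTool({
  params: { name: "word_count", arguments: { text: "one two three" } },
}).then((result) => console.log(result.content[0].text));
```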
📖 Documentation
For Users
- README.md - Setup and installation guide (you're here!)
- EXAMPLES.md - Real-world usage examples
- CONTRIBUTING.md - How to contribute
For Claude
- CLAUDE.md - Comprehensive project memory and delegation guidelines
- When to use each tool (Sonnet 4.5 vs Opus 4.6 strategies)
- Code quality standards and patterns
- Optimization rules to reduce Claude consumption
- Development workflow and troubleshooting
API Reference
This MCP server uses the Z.ai API. Key endpoints:
- /paas/v4/chat/completions - GLM-5 text generation
- /paas/v4/web_search - Web search
- /paas/v4/reader - Web content fetching
See Z.ai Documentation for complete API reference.
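To make the chat completions endpoint more concrete, here is a sketch that builds (but does not send) a request. The base URL and model id are placeholders supplied by the caller, and the Bearer header and OpenAI-style `messages` body are assumptions; check the Z.ai documentation for the authoritative format. `buildChatRequest` is a hypothetical helper.

```javascript
// Hypothetical helper: builds the URL and fetch options for a chat
// completion call. Nothing is sent over the network here.
function buildChatRequest({
  baseUrl,
  apiKey,
  model,
  prompt,
  systemPrompt,
  temperature = 0.7,
  maxTokens = 4000,
}) {
  const messages = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({ role: "user", content: prompt });

  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/paas/v4/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      },
      body: JSON.stringify({ model, messages, temperature, max_tokens: maxTokens }),
    },
  };
}

const chatRequest = buildChatRequest({
  baseUrl: "https://api.example.invalid/api/", // placeholder, not the real host
  apiKey: "your-z-ai-api-key-here",
  model: "glm-5", // assumed model id
  prompt: "Summarize the trade-offs of event sourcing.",
});
console.log(chatRequest.url);
// To send it: const response = await fetch(chatRequest.url, chatRequest.init);
```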
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Guidelines
- Follow existing code style
- Add tests for new features
- Update documentation
- Keep tools focused and single-purpose
Ideas for Contributions
- Add translation agent support
- Add slide generation (GLM Slide Agent)
- Add streaming support for real-time responses
- Add error retry logic
- Add caching for repeated queries
- Add usage tracking and analytics
📄 License
MIT License - see LICENSE file for details.
💡 Use Cases
Competitive Intelligence
1. web_search("competitor X new features 2024")
2. web_reader(top_results)
3. ask_glm5("Analyze competitor strategy and our response")
Research Synthesis
1. web_search("market trends AI agents", count=50)
2. Multiple web_reader() calls
3. ask_glm5("Synthesize findings into executive summary")
Document Analysis
1. parse_document("https://example.com/contract.pdf")
2. ask_glm5("Extract key terms, risks, and obligations")
Code Generation
1. ask_glm5_pro("Build a React component for user authentication with OAuth")
2. Sonnet integrates into codebase
🆘 Troubleshooting
Tools not appearing in Claude Desktop
- Check config file path is correct
- Verify absolute path to index.js
- Restart Claude Desktop completely (quit, don't just close window)
- Check Claude Desktop logs for errors
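The first two checks can be automated with a short script. The sketch below inspects a parsed config object; `diagnoseGlm5Config` is a hypothetical helper, and the file-existence check is passed in as a function (for example `fs.existsSync`) so the logic can be tried without touching the disk.

```javascript
// Hypothetical helper: returns a list of likely problems with the
// "glm5" entry of a parsed claude_desktop_config.json.
function diagnoseGlm5Config(config, fileExists) {
  const entry = config?.mcpServers?.glm5;
  if (!entry) return ['No "glm5" entry found under "mcpServers".'];

  const problems = [];
  const scriptPath = entry.args?.[0];
  if (!scriptPath) {
    problems.push('"args" does not contain a path to index.js.');
  } else if (scriptPath.includes("/ABSOLUTE/PATH/TO/")) {
    problems.push("The placeholder path has not been replaced.");
  } else if (!fileExists(scriptPath)) {
    problems.push(`index.js not found at ${scriptPath}.`);
  }

  const apiKey = entry.env?.ZAI_API_KEY;
  if (!apiKey || apiKey === "your-z-ai-api-key-here") {
    problems.push("ZAI_API_KEY is missing or still the placeholder value.");
  }
  return problems;
}

// Example with the unedited template config; prints two problems.
const templateConfig = {
  mcpServers: {
    glm5: {
      command: "node",
      args: ["/ABSOLUTE/PATH/TO/glm5-mcp/index.js"],
      env: { ZAI_API_KEY: "your-z-ai-api-key-here" },
    },
  },
};
console.log(diagnoseGlm5Config(templateConfig, () => false));
```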
API errors
- Verify Z.ai API key is valid
- Check API key has sufficient credits
- Ensure network connectivity
- Check Z.ai service status
Empty responses
- GLM-5 may be rate-limited
- Try lowering max_tokens
- Check error logs in terminal
- Verify prompt is clear and specific
📈 Roadmap
- Add translation agent (40+ languages)
- Add slide generation
- Add image generation
- Add audio transcription
- Add streaming support
- Add response caching
- Add usage analytics
- Add configuration UI
- Add preset prompt templates
Made with ❤️ to help Claude users do more without limits