Vision MCP Server
Free, unlimited vision capabilities for your AI coding assistant using Groq API and Meta Llama 4 Vision model.
Features
- Image Analysis - Understand and describe images
- Text Extraction (OCR) - Extract text from screenshots, documents, photos
- UI Analysis - Describe UI components, layouts, and design
- Error Diagnosis - Analyze error screenshots and suggest fixes
- Diagram Understanding - Interpret flowcharts, UML, architecture diagrams
- Chart Analysis - Read charts and dashboards for insights
- Image Comparison - Compare two images for differences
- Code Extraction - Extract code from IDE screenshots
Installation
Prerequisites
- Python 3.10 or higher
- Free Groq API key
Get Groq API Key (Free)
- Visit https://console.groq.com/keys
- Sign up (free)
- Create a new API key
Install Dependencies
cd vision-mcp-server
# Option 1: Using install script (recommended)
./install.sh
# Option 2: Manual installation
pip3 install mcp groq pillow aiofiles
Configuration
Claude Desktop
Add to ~/.claude/config.json:
{
"mcpServers": {
"vision-mcp-server": {
"command": "python",
"args": ["-m", "vision_mcp_server.server"],
"env": {
"GROQ_API_KEY": "your-groq-api-key-here"
}
}
}
}
OpenCode
Add to OpenCode settings:
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"vision-mcp-server": {
"type": "local",
"command": ["python", "-m", "vision_mcp_server.server"],
"environment": {
"GROQ_API_KEY": "your-groq-api-key-here"
}
}
}
}
Cline (VS Code)
Add to Cline settings:
{
"mcpServers": {
"vision-mcp-server": {
"command": "python",
"args": ["-m", "vision_mcp_server.server"],
"env": {
"GROQ_API_KEY": "your-groq-api-key-here"
}
}
}
}
Usage
Analyze Image
Describe this image: screenshot.png
Extract Text
Extract text from this document: scan.jpg
Diagnose Error
What's wrong with this error screenshot: error.png
Understand Diagram
Explain this architecture diagram: system-diagram.png
Compare Images
Compare these two UI screenshots: old-ui.png vs new-ui.png
Available Tools
analyze_image- General image analysisextract_text- OCR text extractiondescribe_ui- UI component analysisdiagnose_error- Error screenshot analysisunderstand_diagram- Diagram interpretationanalyze_chart- Chart and dashboard analysiscompare_images- Image comparisoncode_from_screenshot- Code extraction from screenshots
Models Used
- meta-llama/llama-4-scout-17b-16e-instruct - Latest Meta Llama 4 vision model
- Available for free via Groq API
- No quotas, no limits
- Superior vision capabilities and multimodal performance
Testing
Run locally:
export GROQ_API_KEY=your-api-key
python -m vision_mcp_server.server
License
MIT