
Computer Vision Tools

A Model Context Protocol (MCP) server that provides computer vision capabilities, including image generation, OCR text extraction, and object detection, through containerized Docker services with MinIO integration for image storage and retrieval.

Stars: 1
Validated: Jan 11, 2026

CV MCP Tools

A collection of Model Context Protocol (MCP) servers and services that integrate specialized computer vision capabilities with language models. This repository demonstrates how to build modular CV tools that can be easily composed and orchestrated through MCP.

🔧 Components

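MCP Servers

  • object_detection_mcp - YOLO object detection MCP server
  • ocr_imagen_mcp - Combined OCR + image generation MCP server

Standalone Services

  • image_generator_server - Standalone FLUX image generation service
  • ocr_server - Standalone OCR service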

🚀 Quick Start

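Prerequisites

  • uv, to run the MCP servers
  • Docker with buildx and NVIDIA GPU support, to build and run the standalone services with --gpus all
  • A reachable MinIO instance for image storage and retrieval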

Running MCP Servers

# Object Detection (from the repository root)
cd object_detection_mcp
uv run object_detector.py

# OCR + Image Generation (from the repository root, in a separate terminal)
cd ocr_imagen_mcp
uv run ocr_imagen.py

Running Standalone Services

# Image Generator
docker buildx build -t flux-schnell -f image_generator_server/Dockerfile .
docker run --gpus all -p 6070:6070 flux-schnell

# OCR Server
docker buildx build -t ocr-server -f ocr_server/Dockerfile .
docker run --gpus all -p 6080:6080 -p 6081:6081 ocr-server
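
Once both containers are up, it can be useful to confirm that the published ports accept connections before pointing an MCP server at them. The sketch below checks only TCP reachability of the ports from the docker run commands above (6070, 6080, 6081). It assumes nothing about the HTTP routes the services expose, and the helper names port_is_open and check_services are illustrative, not part of this repository.

```python
import socket

# Ports published by the `docker run` commands above.
SERVICE_PORTS = {
    "image_generator": [6070],
    "ocr_server": [6080, 6081],
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to {port: reachable} for its published ports."""
    return {
        name: {port: port_is_open(host, port) for port in ports}
        for name, ports in SERVICE_PORTS.items()
    }


if __name__ == "__main__":
    for name, ports in check_services().items():
        for port, ok in ports.items():
            print(f"{name}:{port} {'reachable' if ok else 'unreachable'}")
```

An open port only shows that the container is listening; the model inside may still be loading, so a first request can take longer than later ones.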

🔗 Integration with Claude Desktop

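Add the following to your Claude Desktop configuration file (claude_desktop_config.json), replacing the path and MinIO credentials with your own: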

{
    "mcpServers": {
        "object_detection": {
            "command": "uv",
            "args": ["--directory", "/path/to/object_detection_mcp", "run", "object_detector.py"],
            "env": {
                "YOLO_MODEL_NAME": "yolo11m.pt",
                "YOLO_CONF_THRESHOLD": "0.45",
                "MINIO_URL": "localhost:9000",
                "MINIO_ACCESS_KEY": "your-key",
                "MINIO_SECRET_KEY": "your-secret"
            }
        }
    }
}
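
The env block hands settings to the server process as environment variables. How object_detector.py consumes them is not shown in this section, so the following is a minimal sketch, assuming the server reads the five variables from the environment and treats YOLO_CONF_THRESHOLD as a float between 0 and 1. The DetectorSettings class, the load_settings function, and the fallback defaults are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DetectorSettings:
    model_name: str
    conf_threshold: float
    minio_url: str
    minio_access_key: str
    minio_secret_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> DetectorSettings:
    """Build settings from environment variables named in the config above.

    The fallbacks for model and threshold mirror the example values; they are
    assumptions, not documented defaults of the server.
    """
    env = os.environ if env is None else env

    threshold = float(env.get("YOLO_CONF_THRESHOLD", "0.45"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"YOLO_CONF_THRESHOLD must be in [0, 1], got {threshold}")

    # Fail early with a clear message if any MinIO setting is missing or empty.
    required = ("MINIO_URL", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"missing required variables: {', '.join(missing)}")

    return DetectorSettings(
        model_name=env.get("YOLO_MODEL_NAME", "yolo11m.pt"),
        conf_threshold=threshold,
        minio_url=env["MINIO_URL"],
        minio_access_key=env["MINIO_ACCESS_KEY"],
        minio_secret_key=env["MINIO_SECRET_KEY"],
    )
```

Validating the values at startup means a mistyped threshold or a missing credential surfaces as one clear error instead of a failure in the middle of a detection request.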

📁 Repository Structure

cv-mcp-tools/
├── object_detection_mcp/     # YOLO object detection MCP server
├── ocr_imagen_mcp/           # Combined OCR + image generation MCP server
├── image_generator_server/   # Standalone FLUX image generation service
├── ocr_server/               # Standalone OCR service
└── CLAUDE.md                 # Development guide for Claude Code

🎯 Use Cases

  • Automated Content Analysis - Object detection and OCR for document processing
  • Iterative Image Generation - Generate images with text validation loops
  • Multi-Modal Workflows - Combine vision and language models for complex tasks
  • Modular CV Pipeline - Mix and match components as needed
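
The iterative image generation use case pairs the image generator with OCR: generate an image, read its text back, and retry until the expected text appears. Below is a minimal sketch of that loop with the two services abstracted as plain callables, since their request formats are not documented in this section. The names generate_with_text_validation, generate, and extract_text are hypothetical, and the case-insensitive substring match is only one possible validation rule.

```python
from typing import Callable, Optional, Tuple


def generate_with_text_validation(
    prompt: str,
    expected_text: str,
    generate: Callable[[str], bytes],
    extract_text: Callable[[bytes], str],
    max_attempts: int = 3,
) -> Tuple[Optional[bytes], int]:
    """Regenerate until OCR finds expected_text in the image, or attempts run out.

    Returns (image, attempts_used); image is None if validation never passed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    target = expected_text.casefold()
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt)
        # Accept the image as soon as the OCR output contains the target text.
        if target in extract_text(image).casefold():
            return image, attempt
    return None, max_attempts


if __name__ == "__main__":
    # Stand-ins for the real services, so the loop can be exercised offline.
    def fake_generate(prompt: str) -> bytes:
        return prompt.encode("utf-8")

    def fake_extract_text(image: bytes) -> str:
        return image.decode("utf-8")

    image, attempts = generate_with_text_validation(
        "a shop sign that says OPEN", "open", fake_generate, fake_extract_text
    )
    print(f"validated after {attempts} attempt(s): {image is not None}")
```

Passing the services in as callables keeps the loop independent of transport, so the same function could wrap direct HTTP calls to the standalone services or tool calls made through an MCP client.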

📖 Documentation

Each component directory has its own README with detailed setup instructions.
