CV MCP Tools

A collection of Model Context Protocol (MCP) servers and services that integrate specialized computer vision capabilities with language models. This repository demonstrates how to build modular CV tools that can be easily composed and orchestrated through MCP.

🔧 Components

MCP Servers

Object Detection MCP - YOLO-based object detection with MinIO integration
OCR + Image Generation MCP - Combined OCR and image generation with iterative validation workflows

Standalone Services

Image Generator Server - FLUX.1-schnell diffusion model service
OCR Server - Multi-model OCR service (Qwen-VL, Janus)

🚀 Quick Start

Prerequisites

Python 3.11+
uv package manager
Docker with GPU support (the docker run --gpus all commands below require the NVIDIA Container Toolkit on the host)
MinIO server (for MCP servers)

Running MCP Servers

# Object Detection
cd object_detection_mcp
uv run object_detector.py

# OCR + Image Generation
cd ocr_imagen_mcp
uv run ocr_imagen.py

Running Standalone Services

# Image Generator
docker buildx build -t flux-schnell -f image_generator_server/Dockerfile .
docker run --gpus all -p 6070:6070 flux-schnell

# OCR Server
docker buildx build -t ocr-server -f ocr_server/Dockerfile .
docker run --gpus all -p 6080:6080 -p 6081:6081 ocr-server
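Once the containers are up, it can be useful to confirm that the published ports accept connections before pointing an MCP server at them. The sketch below is a minimal TCP reachability check using only the Python standard library. It assumes the services run on localhost with the port mappings shown above; the helper names (port_is_open, check_services) are illustrative and not part of this repository. It only tests that a port accepts connections, not that the model behind it has finished loading.

```python
import socket

# Ports published by the docker run commands above.
SERVICE_PORTS = {"image generator": [6070], "ocr server": [6080, 6081]}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map "name:port" to whether that port currently accepts connections."""
    return {
        f"{name}:{port}": port_is_open(host, port)
        for name, ports in SERVICE_PORTS.items()
        for port in ports
    }


if __name__ == "__main__":
    for target, reachable in check_services().items():
        status = "reachable" if reachable else "not reachable"
        print(f"{target}: {status}")
```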

🔗 Integration with Claude Desktop

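Add the following entry to your Claude Desktop configuration file (claude_desktop_config.json), replacing the directory path and the MinIO credentials with your own values: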

{
    "mcpServers": {
        "object_detection": {
            "command": "uv",
            "args": ["--directory", "/path/to/object_detection_mcp", "run", "object_detector.py"],
            "env": {
                "YOLO_MODEL_NAME": "yolo11m.pt",
                "YOLO_CONF_THRESHOLD": "0.45",
                "MINIO_URL": "localhost:9000",
                "MINIO_ACCESS_KEY": "your-key",
                "MINIO_SECRET_KEY": "your-secret"
            }
        }
    }
}
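If the configuration file already lists other MCP servers, pasting the block above over it would drop them. As a rough sketch, the entry can be merged in programmatically instead. The helper below (add_mcp_server, a hypothetical name, not part of this repository) inserts one server under "mcpServers" and leaves every other key untouched. The location of the configuration file varies by platform, so it is taken as a command-line argument rather than assumed.

```python
import json
import sys
from pathlib import Path

# Same entry as in the JSON above; the path and credentials are placeholders.
OBJECT_DETECTION_ENTRY = {
    "command": "uv",
    "args": ["--directory", "/path/to/object_detection_mcp", "run", "object_detector.py"],
    "env": {
        "YOLO_MODEL_NAME": "yolo11m.pt",
        "YOLO_CONF_THRESHOLD": "0.45",
        "MINIO_URL": "localhost:9000",
        "MINIO_ACCESS_KEY": "your-key",
        "MINIO_SECRET_KEY": "your-secret",
    },
}


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one server under "mcpServers", keeping other keys."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python add_server.py /path/to/claude_desktop_config.json
    add_mcp_server(Path(sys.argv[1]), "object_detection", OBJECT_DETECTION_ENTRY)
```

Restart Claude Desktop after editing the file so the new server is picked up.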

📁 Repository Structure

cv-mcp-tools/
├── object_detection_mcp/     # YOLO object detection MCP server
├── ocr_imagen_mcp/          # Combined OCR + image generation MCP
├── image_generator_server/   # Standalone FLUX image generation service
├── ocr_server/              # Standalone OCR service
└── CLAUDE.md                # Development guide for Claude Code

🎯 Use Cases

Automated Content Analysis - Object detection and OCR for document processing
Iterative Image Generation - Generate images with text validation loops
Multi-Modal Workflows - Combine vision and language models for complex tasks
Modular CV Pipeline - Mix and match components as needed
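To make the iterative image generation use case concrete, here is a minimal sketch of a text validation loop: generate an image, read its text back with OCR, and retry until the expected text appears or the attempts run out. This is an illustration under stated assumptions, not the workflow implemented in ocr_imagen_mcp. The generate and read_text callables are hypothetical stand-ins for calls to the image generator and OCR services, whose actual request formats are described in their own READMEs.

```python
from typing import Callable, Optional


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so minor OCR differences still match."""
    return " ".join(text.split()).casefold()


def generate_until_text_matches(
    prompt: str,
    expected_text: str,
    generate: Callable[[str], bytes],   # hypothetical: prompt -> image bytes
    read_text: Callable[[bytes], str],  # hypothetical: image bytes -> OCR text
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Return the first generated image whose OCR output contains expected_text.

    Returns None if no attempt passes validation.
    """
    expected = _normalize(expected_text)
    for _ in range(max_attempts):
        image = generate(prompt)
        if expected in _normalize(read_text(image)):
            return image
    return None
```

A caller would wrap the HTTP requests to the two services in generate and read_text. Keeping them as parameters lets the loop be tested with simple fakes and reused with either OCR model.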

📖 Documentation

Each component has its own README with detailed setup instructions; see the README in the corresponding directory listed under Repository Structure.