
MCP Image Recognition Server

A Python-based MCP server that analyzes images supplied as URLs or Base64 data, using vision models such as Gemini, GPT-4o, and Qwen-VL.

Stars: 1 | Tools: 1 | Updated: Dec 3, 2025 | Validated: Mar 5, 2026

MCP Image Recognition Server (Python)

An MCP server implementation in Python providing image recognition capabilities using various LLM providers (Gemini, OpenAI, Qwen/Tongyi, Doubao, etc.).

Features

  • Image Recognition: Describe images or answer questions about them.
  • Multi-Model Support: Dynamically switch between Gemini, GPT-4o, Qwen-VL, Doubao, etc.
  • Flexible: Accepts image URLs or Base64 data.

Quick Setup (Recommended)

The repository includes setup scripts that create the environment and install the dependencies in one step.

Linux / macOS

git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
./setup.sh

Windows

  1. Clone or download this repository.
  2. Double-click setup.bat.

After the script finishes, edit the .env file and add your API keys.


Installation & Usage (Manual)

If you prefer manual installation or want to use uv:

Prerequisites

  • Python 3.10 or higher
  • An API Key for your preferred model provider (Google Gemini, OpenAI, Aliyun DashScope, etc.)

Method 1: Using uv (Recommended)

uv is an extremely fast Python package manager.

1. Run directly with uv run

You don't need to manually create a virtual environment.

# Clone the repo
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py

# Create .env file with your API keys
cp .env.example .env
# Edit .env with your keys

# Run the server
uv run server.py

2. Using uvx (for ephemeral execution)

To run the server without cloning the repository first (experimental, installed directly from git):

# Note: the API keys must still be supplied as environment variables.
# For persistent configuration via .env, cloning and using 'uv run' is easier.
uvx --from git+https://github.com/glasses666/mcp-image-recognition-py mcp-image-recognition

Method 2: Standard Python (pip)

Linux / macOS

  1. Clone and Setup:

    git clone https://github.com/glasses666/mcp-image-recognition-py.git
    cd mcp-image-recognition-py
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  2. Configure:

    cp .env.example .env
    # Edit .env and add your API keys
    
  3. Run:

    python server.py
    

Windows

  1. Clone and Setup:

    git clone https://github.com/glasses666/mcp-image-recognition-py.git
    cd mcp-image-recognition-py
    python -m venv venv
    .\venv\Scripts\activate
    pip install -r requirements.txt
    
  2. Configure:

    copy .env.example .env
    # Edit .env and add your API keys
    
  3. Run:

    python server.py
    

Configuration

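Create a .env file in the project root based on .env.example, then set the variables for the provider you want to use. Qwen and Doubao are reached through OpenAI-compatible endpoints, so they reuse the OPENAI_API_KEY and OPENAI_BASE_URL variables.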

1. For Google Gemini (Recommended for speed/cost)

Get an API key from Google AI Studio.

GEMINI_API_KEY=your_google_api_key
DEFAULT_MODEL=gemini-1.5-flash

2. For Tongyi Qianwen (Qwen - Alibaba Cloud)

Get an API key from Aliyun DashScope.

OPENAI_API_KEY=your_dashscope_api_key
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DEFAULT_MODEL=qwen-vl-max

3. For Doubao (Volcengine)

Get an API key from Volcengine Ark.

OPENAI_API_KEY=your_volcengine_api_key
OPENAI_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DEFAULT_MODEL=doubao-pro-32k
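
The three configurations above differ only in which variables are set. As a rough sketch of how a server could map a model name to the right credentials, consider the following. The function resolve_provider and its prefix rule are hypothetical illustrations, not taken from server.py; the actual selection logic may differ.

```python
def resolve_provider(model: str, env: dict) -> dict:
    """Pick connection settings for a model name (illustrative rule only)."""
    if model.startswith("gemini"):
        # Assumption: Gemini models are called with GEMINI_API_KEY.
        return {"provider": "gemini", "api_key": env.get("GEMINI_API_KEY", "")}
    # Assumption: every other model goes through an OpenAI-compatible
    # endpoint, configured by OPENAI_API_KEY and OPENAI_BASE_URL.
    return {
        "provider": "openai-compatible",
        "api_key": env.get("OPENAI_API_KEY", ""),
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    }


# Example: the Qwen configuration from this section.
qwen_env = {
    "OPENAI_API_KEY": "your_dashscope_api_key",
    "OPENAI_BASE_URL": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}
settings = resolve_provider("qwen-vl-max", qwen_env)
```

Under this rule, switching from Qwen to Doubao would only require changing OPENAI_BASE_URL, OPENAI_API_KEY, and the model name.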

Agent AI Configuration (Claude Desktop, etc.)

To use this server with an MCP client (such as Claude Desktop), add it to the client's configuration file.

Configuration File Paths

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json (if available)

Configuration JSON

Option A: Using uv (Easiest)

If you have uv installed, you can let it handle the environment.

{
  "mcpServers": {
    "image-recognition": {
      "command": "/path/to/uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/mcp-image-recognition-py",
        "server.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_key_here",
        "OPENAI_API_KEY": "your_openai_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "DEFAULT_MODEL": "gemini-1.5-flash"
      }
    }
  }
}

Option B: Standard Python Venv

Ensure you provide the absolute path to the python executable in your virtual environment.

{
  "mcpServers": {
    "image-recognition": {
      "command": "/absolute/path/to/mcp-image-recognition-py/venv/bin/python",
      "args": [
        "/absolute/path/to/mcp-image-recognition-py/server.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_key_here",
        "OPENAI_API_KEY": "your_openai_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "DEFAULT_MODEL": "gemini-1.5-flash"
      }
    }
  }
}

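Windows Note: JSON requires backslashes in paths to be escaped as double backslashes \\ (e.g., C:\\Users\\Name\\...). With Option B, the interpreter on Windows is venv\\Scripts\\python.exe rather than venv/bin/python.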
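
Hand-escaping Windows paths is easy to get wrong, so one option is to generate the JSON instead. The sketch below builds an Option B entry with Python's json module, which escapes backslashes automatically. The helper build_claude_config and the example install location under C:\Users\Name are hypothetical; substitute your own paths and keys.

```python
import json


def build_claude_config(python_exe: str, server_py: str, env: dict) -> str:
    """Return an mcpServers config (Option B layout) as a JSON string."""
    config = {
        "mcpServers": {
            "image-recognition": {
                "command": python_exe,
                "args": [server_py],
                "env": env,
            }
        }
    }
    # json.dumps writes each backslash as \\, as JSON requires.
    return json.dumps(config, indent=2)


# Raw strings keep the Windows paths readable; these paths are placeholders.
text = build_claude_config(
    r"C:\Users\Name\mcp-image-recognition-py\venv\Scripts\python.exe",
    r"C:\Users\Name\mcp-image-recognition-py\server.py",
    {"GEMINI_API_KEY": "your_gemini_key_here", "DEFAULT_MODEL": "gemini-1.5-flash"},
)
print(text)
```

The printed text can be pasted into claude_desktop_config.json, or merged into an existing mcpServers section if other servers are already configured.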


Tool Usage

recognize_image

Analyzes an image and returns a text description.

Parameters:

  • image (string, required): The image to analyze. Supports:
    • HTTP/HTTPS URLs (e.g., https://example.com/cat.jpg)
    • Base64 encoded strings (with or without data:image/...;base64, prefix)
  • prompt (string, optional): Specific instruction. Default: "Describe this image".
  • model (string, optional): Override the default model for this specific request.
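
To make the parameter list concrete, here is a minimal sketch of preparing the arguments for a recognize_image call. The helper build_recognize_args is hypothetical and not part of the server; it only assembles the argument dictionary an MCP client would send, encoding raw bytes as a Base64 data URI and passing URL strings through unchanged. The default MIME type is an assumption.

```python
import base64


def build_recognize_args(image, prompt=None, model=None, mime="image/jpeg"):
    """Build the arguments dict for recognize_image (illustrative helper)."""
    if isinstance(image, bytes):
        # Raw image bytes become a data URI with the data:image/...;base64, prefix.
        encoded = base64.b64encode(image).decode("ascii")
        image = f"data:{mime};base64,{encoded}"
    args = {"image": image}
    # Optional parameters are omitted so the server applies its defaults.
    if prompt is not None:
        args["prompt"] = prompt
    if model is not None:
        args["model"] = model
    return args


# A URL is passed through as-is.
url_args = build_recognize_args("https://example.com/cat.jpg")

# Bytes (for example, the contents of a local file) are Base64-encoded.
file_args = build_recognize_args(b"\xff\xd8\xff", prompt="What is this?", model="qwen-vl-max")
```

Leaving prompt and model out of the dictionary, rather than sending empty strings, lets the documented defaults ("Describe this image" and DEFAULT_MODEL) take effect.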

License

MIT
