MCP Image Recognition Server (Python)

An MCP server implementation in Python providing image recognition capabilities using various LLM providers (Gemini, OpenAI, Qwen/Tongyi, Doubao, etc.).

Features

Image Recognition: Describe images or answer questions about them.
Multi-Model Support: Dynamically switch between Gemini, GPT-4o, Qwen-VL, Doubao, etc.
Flexible: Accepts image URLs or Base64 data.

Quick Setup (Recommended)

We provide automated scripts to set up the environment and dependencies in one click.

Linux / macOS

git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
./setup.sh

Windows

Clone or download this repository.
Double-click setup.bat.

After the script finishes, simply edit the .env file with your API keys.

Installation & Usage (Manual)

If you prefer manual installation or want to use uv:

Prerequisites

Python 3.10 or higher
An API Key for your preferred model provider (Google Gemini, OpenAI, Aliyun DashScope, etc.)

Method 1: Using `uv` (Recommended)

uv is an extremely fast Python package manager.

1. Run directly with `uv run`

You don't need to manually create a virtual environment.

# Clone the repo
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py

# Create .env file with your API keys
cp .env.example .env
# Edit .env with your keys

# Run the server
uv run server.py

2. Using `uvx` (for ephemeral execution)

If you want to run it without cloning the repo explicitly (experimental support via git):

# Note: You still need to provide environment variables. 
# It's easier to clone and use 'uv run' for persistent config via .env
uvx --from git+https://github.com/glasses666/mcp-image-recognition-py mcp-image-recognition

Method 2: Standard Python (pip)

Linux / macOS

Clone and Setup:

git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Configure:

cp .env.example .env
# Edit .env and add your API keys

Run:
```
python server.py
```

Windows

Clone and Setup:

git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt

Configure:

copy .env.example .env
# Edit .env and add your API keys

Run:
```
python server.py
```

Configuration

Create a .env file in the project root based on .env.example:

1. For Google Gemini (Recommended for speed/cost)

Get an API key from Google AI Studio.

GEMINI_API_KEY=your_google_api_key
DEFAULT_MODEL=gemini-1.5-flash

2. For Tongyi Qianwen (Qwen - Alibaba Cloud)

Get an API key from Aliyun DashScope.

OPENAI_API_KEY=your_dashscope_api_key
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DEFAULT_MODEL=qwen-vl-max

3. For Doubao (Volcengine)

Get an API key from Volcengine Ark.

OPENAI_API_KEY=your_volcengine_api_key
OPENAI_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DEFAULT_MODEL=doubao-pro-32k

Agent AI Configuration (Claude Desktop, etc.)

To use this server with an MCP client (like Claude Desktop), add it to your configuration file.

Configuration File Paths

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json (if available)

Configuration JSON

Option A: Using uv (Easiest) If you have uv installed, you can let it handle the environment.

{
  "mcpServers": {
    "image-recognition": {
      "command": "/path/to/uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/mcp-image-recognition-py",
        "server.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_key_here",
        "OPENAI_API_KEY": "your_openai_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "DEFAULT_MODEL": "gemini-1.5-flash"
      }
    }
  }
}

Option B: Standard Python Venv Ensure you provide the absolute path to the python executable in your virtual environment.

{
  "mcpServers": {
    "image-recognition": {
      "command": "/absolute/path/to/mcp-image-recognition-py/venv/bin/python", 
      "args": [
        "/absolute/path/to/mcp-image-recognition-py/server.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_key_here",
        "OPENAI_API_KEY": "your_openai_key_here",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "DEFAULT_MODEL": "gemini-1.5-flash"
      }
    }
  }
}

Windows Note: For paths, use double backslashes \\ (e.g., C:\\Users\\Name\\...).

Usage Tool

`recognize_image`

Analyzes an image and returns a text description.

Parameters:

image (string, required): The image to analyze. Supports:
- HTTP/HTTPS URLs (e.g., https://example.com/cat.jpg)
- Base64 encoded strings (with or without data:image/...;base64, prefix)
prompt (string, optional): Specific instruction. Default: "Describe this image".
model (string, optional): Override the default model for this specific request.

License

MIT

MCP Image Recognition Server

MCP Image Recognition Server (Python)

Features

Quick Setup (Recommended)

Linux / macOS

Windows

Installation & Usage (Manual)

Prerequisites

Method 1: Using uv (Recommended)

1. Run directly with uv run

2. Using uvx (for ephemeral execution)

Method 2: Standard Python (pip)

Linux / macOS

Windows

Configuration

1. For Google Gemini (Recommended for speed/cost)

2. For Tongyi Qianwen (Qwen - Alibaba Cloud)

3. For Doubao (Volcengine)

Agent AI Configuration (Claude Desktop, etc.)

Configuration File Paths

Configuration JSON

Usage Tool

recognize_image

License

Reviews

Method 1: Using `uv` (Recommended)

1. Run directly with `uv run`

2. Using `uvx` (for ephemeral execution)

`recognize_image`