MCP Video Recognition Server
An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.
Features
- Image Recognition: Analyze and describe images using Google Gemini AI
- Audio Recognition: Analyze and transcribe audio using Google Gemini AI
- Video Recognition: Analyze and describe videos using Google Gemini AI
Prerequisites
- Node.js 18 or higher
- Google Gemini API key
Installation
Manual Installation
- Clone the repository:
  git clone https://github.com/yourusername/mcp-video-recognition.git
  cd mcp-video-recognition
- Install dependencies:
  npm install
- Build the project:
  npm run build
Installing in FLUJO
- Click Add Server
- Copy and paste the GitHub repository URL into FLUJO
- Click Parse, Clone, Install, Build and Save.
Installing via Configuration Files
To integrate this MCP server with Cline or other MCP clients via configuration files:
- Open your Cline settings:
  - In VS Code, go to File -> Preferences -> Settings
  - Search for "Cline MCP Settings"
  - Click "Edit in settings.json"
- Add the server configuration to the mcpServers object:
  {
    "mcpServers": {
      "video-recognition": {
        "command": "node",
        "args": [
          "/path/to/mcp-video-recognition/dist/index.js"
        ],
        "disabled": false,
        "autoApprove": []
      }
    }
  }
- Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. Use forward slashes (/) or double backslashes (\\) for the path on Windows.
- Save the settings file. Cline should automatically connect to the server.
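The Windows path advice above can be illustrated with a small sketch. It assumes you want to generate the settings entry programmatically; the helper names toConfigPath and buildServerEntry and the example Windows path are hypothetical and not part of this project.

```typescript
// Shape of one server entry, mirroring the fields shown in the configuration above.
interface McpServerEntry {
  command: string;
  args: string[];
  disabled: boolean;
  autoApprove: string[];
}

// Hypothetical helper: convert backslashes to forward slashes so the path
// needs no escaping when written into a JSON settings file.
function toConfigPath(rawPath: string): string {
  return rawPath.replace(/\\/g, "/");
}

// Hypothetical helper: build the "video-recognition" entry for a given index.js path.
function buildServerEntry(indexJsPath: string): McpServerEntry {
  return {
    command: "node",
    args: [toConfigPath(indexJsPath)],
    disabled: false,
    autoApprove: [],
  };
}

// Example with an illustrative Windows path (an assumption, not a real location).
const settings = {
  mcpServers: {
    "video-recognition": buildServerEntry(
      "C:\\Users\\me\\mcp-video-recognition\\dist\\index.js"
    ),
  },
};

// Serialized form that could be pasted into the settings file.
const settingsJson = JSON.stringify(settings, null, 2);
```

Normalizing to forward slashes avoids having to double every backslash by hand, which is the more error-prone of the two options.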
Configuration
The server is configured using environment variables:
- GOOGLE_API_KEY (required): Your Google Gemini API key
- TRANSPORT_TYPE: Transport type to use (stdio or sse, defaults to stdio)
- PORT: Port number for SSE transport (defaults to 3000)
- LOG_LEVEL: Logging level (verbose, debug, info, warn, error, defaults to info)
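As a rough illustration of how these variables and their defaults fit together, here is a minimal sketch of a config loader. It is not the project's actual code: the names ServerConfig and loadConfig are hypothetical, and the validation rules (for example, rejecting unknown transport types) are assumptions.

```typescript
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

// Hypothetical resolved configuration; field names are illustrative.
interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ["verbose", "debug", "info", "warn", "error"];

function isTransportType(value: string): value is TransportType {
  return value === "stdio" || value === "sse";
}

function isLogLevel(value: string): value is LogLevel {
  return LOG_LEVELS.indexOf(value) !== -1;
}

// Reads the documented variables from an environment-like object and applies
// the documented defaults (stdio, 3000, info).
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  const transportType = env["TRANSPORT_TYPE"] ?? "stdio";
  if (!isTransportType(transportType)) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (!isLogLevel(logLevel)) {
    throw new Error(`Unsupported LOG_LEVEL: ${logLevel}`);
  }

  return { googleApiKey, transportType, port, logLevel };
}
```

In a Node.js process this would typically be called as loadConfig(process.env). Taking the environment as a parameter keeps the function easy to test.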
Usage
Starting the Server
With stdio Transport (Default)
GOOGLE_API_KEY=your_api_key npm start
With SSE Transport
GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start
Using the Tools
The server provides three tools that can be called by MCP clients:
Image Recognition
{
  "name": "image_recognition",
  "arguments": {
    "filepath": "/path/to/image.jpg",
    "prompt": "Describe this image in detail",
    "modelname": "gemini-2.0-flash"
  }
}
Audio Recognition
{
  "name": "audio_recognition",
  "arguments": {
    "filepath": "/path/to/audio.mp3",
    "prompt": "Transcribe this audio",
    "modelname": "gemini-2.0-flash"
  }
}
Video Recognition
{
  "name": "video_recognition",
  "arguments": {
    "filepath": "/path/to/video.mp4",
    "prompt": "Describe what happens in this video",
    "modelname": "gemini-2.0-flash"
  }
}
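MCP clients normally build the protocol message for you, but it can help to see how the name/arguments objects above travel over the wire. The sketch below wraps them in a JSON-RPC 2.0 tools/call request, which is the method MCP uses for tool invocation. The helper name buildToolCall and the request id are illustrative assumptions.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: wrap a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example: the video recognition call from above as a full request.
const videoRequest = buildToolCall(1, "video_recognition", {
  filepath: "/path/to/video.mp4",
  prompt: "Describe what happens in this video",
  modelname: "gemini-2.0-flash",
});

// Serialized message as it would be sent over the chosen transport.
const videoRequestJson = JSON.stringify(videoRequest);
```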
Tool Parameters
All tools accept the following parameters:
- filepath (required): Path to the media file to analyze
- prompt (optional): Custom prompt for the recognition (defaults to "Describe this content")
- modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")
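The defaulting behaviour described above could look roughly like the following. This is a sketch under assumptions, not the server's actual implementation: the names RecognitionArgs and resolveArgs are hypothetical, and only the default values are taken from the parameter list.

```typescript
// Arguments as a client may send them; only filepath is required.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Arguments after defaults have been applied.
interface ResolvedRecognitionArgs {
  filepath: string;
  prompt: string;
  modelname: string;
}

// Defaults as documented in the parameter list.
const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

// Hypothetical helper: reject a missing filepath and fill in the optional fields.
function resolveArgs(args: RecognitionArgs): ResolvedRecognitionArgs {
  if (!args.filepath) {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? DEFAULT_PROMPT,
    modelname: args.modelname ?? DEFAULT_MODEL,
  };
}
```

With this shape, a call that passes only filepath is treated the same as one that spells out both defaults explicitly.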
Development
Running in Development Mode
GOOGLE_API_KEY=your_api_key npm run dev
Project Structure
- src/index.ts: Entry point
- src/server.ts: MCP server implementation
- src/tools/: Tool implementations
- src/services/: Service implementations (Gemini API)
- src/types/: Type definitions
- src/utils/: Utility functions
License
MIT