Gemini Image MCP Server
A Model Context Protocol (MCP) server for image generation and editing using Google Gemini AI. Supports optional context images to guide results and includes a dedicated edit workflow. Optimized for creating eye-catching social media images, with a square (1:1) format by default.
Features
- Image generation with Google Gemini AI
- Multiple aspect ratios (1:1, 16:9, 9:16, 4:3, 3:4)
- Optimized for social media with 1:1 format by default
- Custom style support
- Context images to guide generation
- Dedicated edit tool for modifying existing assets without juggling extra options
- Watermark support: overlay watermark images on generated results
- Automatic saving of images to local files
- Flexible output path configuration
- Customizable safety settings
Installation
- Clone this repository
- Install dependencies:
npm install
- Build the project:
npm run build
Configuration
Environment Variables
You need to configure your Google AI API key:
export GOOGLE_API_KEY="your-api-key-here"
Getting Google AI API Key
- Go to Google AI Studio
- Create a new API key
- Copy the key and set it as an environment variable
Client Configuration
{
  "servers": {
    "gemini-image": {
      "command": "node",
      "args": ["/full/path/to/project/dist/index.js"],
      "env": {
        "GOOGLE_API_KEY": "your-api-key-here"
      }
    }
  }
}
Command Line Interface
In addition to the MCP server, the project now ships with a CLI for quick terminal-friendly workflows.
- Build the project once:
npm run build
- Make sure GOOGLE_API_KEY is set in your environment.
- Explore the CLI:
node dist/cli.js --help
# or, after publishing/packing:
gemini-image --help
Commands
- gemini-image generate: Create new imagery from a text prompt.
gemini-image generate --prompt "A banana astronaut on Mars" --output ./images/
- gemini-image edit: Apply instructions to an existing image.
gemini-image edit --prompt "Add neon lights to the skyline" --input ./images/city.png
Both commands support --help for detailed, friendly option descriptions. CLI option names are intentionally concise (for example --prompt, --context, --input) so they are easier to memorize than the MCP tool identifiers.
Available Tools
generate_image
Creates a brand-new image from a text description, optionally using one or more images as visual context. Use this tool when you want to generate fresh content.
Parameters:
- description (string, required): Detailed description of the desired image.
- images (string[], optional): Array of image paths used as context (absolute or relative). Use this to "edit" or guide style/content.
- aspectRatio (string, optional): Orientation preset (square, landscape, portrait). Default: square.
- style (string, optional): Additional style (e.g., "minimalist", "colorful", "professional", "artistic").
- outputPath (string, optional): Where to save the image. If omitted, the image is saved in the current directory.
- watermarkPath (string, optional): Path to a watermark image to overlay.
- watermarkPosition (string, optional): One of top-left, top-right, bottom-left, bottom-right. Default: bottom-right.
Usage Examples:
# Basic - saves to current directory
Generate an image of a mountain landscape at sunset with warm, minimalist style
# With context image to guide composition
Generate an image: "Create a futuristic city skyline inspired by this photo", images: ["./reference-skyline.jpg"], aspectRatio: "landscape"
# Multiple context images
Generate an image combining style of a logo and a photo, images: ["./photo.jpg", "./logo.png"], style: "professional"
When you request a specific orientation (square, landscape, or portrait), the server automatically appends an invisible helper image (assets/square.png, assets/landscape.png, or assets/portrait.png) so Gemini respects the target dimensions.
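The orientation handling described above can be pictured with a small sketch. This is not the server's actual code: the type name `OrientationPreset`, the table `HELPER_IMAGES`, and the function `withOrientationHelper` are hypothetical names chosen for illustration. Only the three preset names, the three asset paths, and the square default come from this README.

```typescript
// Orientation presets accepted by the aspectRatio parameter.
type OrientationPreset = "square" | "landscape" | "portrait";

// Hypothetical lookup table; the asset paths are the ones named in this README.
const HELPER_IMAGES: Record<OrientationPreset, string> = {
  square: "assets/square.png",
  landscape: "assets/landscape.png",
  portrait: "assets/portrait.png",
};

// Returns a new list of context images with the helper image appended last.
// Defaults to "square" because that is the documented default.
function withOrientationHelper(
  images: readonly string[],
  aspectRatio: OrientationPreset = "square",
): string[] {
  return [...images, HELPER_IMAGES[aspectRatio]];
}
```

Placing the helper last is an assumption based on the word "appends"; the caller's own context images keep their order.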
edit_image
Modifies an existing image using a focused text instruction. This tool keeps the original framing unless you explicitly ask for structural changes.
Parameters:
- description (string, required): Instructions describing the edits to apply to the provided image.
- image (string, required): Path to the image file you want to edit (absolute or relative).
- outputPath (string, optional): Where to save the edited result. If omitted, the server uses the working directory and an auto-generated filename.
Usage Examples:
# Simple edit
Edit image: "Soften skin tones and remove flyaway hairs", image: "./headshot.png"
# Heavier retouch
Edit image: "Turn the product label red and add subtle sparkle highlights", image: "./product-shot.jpg"
# Custom path and watermark (top-left); this example uses generate_image, the tool that accepts the watermark parameters
Generate an image of a space cat, outputPath: "./images/epic_pizza.png", watermarkPath: "./my_logo.png", watermarkPosition: "top-left"
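For clients that talk to the server directly rather than through natural-language prompts, a tool invocation travels as a JSON-RPC 2.0 request using the MCP `tools/call` method. The sketch below builds such a request for edit_image. The interface names and the `buildEditImageRequest` helper are hypothetical, and the up-front validation is an assumption about what a careful client might do, not a description of this server's behavior.

```typescript
// Arguments accepted by edit_image, as listed in the parameter list above.
interface EditImageArgs {
  description: string;
  image: string;
  outputPath?: string;
}

// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface EditImageToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "edit_image"; arguments: EditImageArgs };
}

// Hypothetical client-side helper: validates the required fields,
// then wraps the arguments in a tools/call request.
function buildEditImageRequest(id: number, args: EditImageArgs): EditImageToolCall {
  if (args.description.trim() === "" || args.image.trim() === "") {
    throw new Error("description and image are required");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "edit_image", arguments: args },
  };
}
```

In practice an MCP client SDK usually builds this envelope for you; the sketch only shows what ends up on the wire.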
Watermark Functionality
The generate_image tool supports adding watermarks to your images:
Features:
- Add image watermarks to any generated output
- Position in any corner (watermarkPosition)
- Smart sizing (25% of image width, maintaining aspect ratio)
- Consistent spacing (3% padding from edges)
- Supports PNG, JPG, WebP watermark files
- Only applied when the watermarkPath parameter is provided
Usage:
# For image generation
watermarkPath: "./my-brand-logo.png"
# With context images
watermarkPath: "./watermark.jpg"
Watermark Specifications:
- Position: Configurable corner via watermarkPosition
- Size: 25% of image width (maintains watermark aspect ratio)
- Padding: 3% of image width from the selected edges
- Blend mode: Over (watermark appears on top of image)
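The sizing and padding rules above reduce to a little arithmetic. The following sketch computes where a watermark would land under those rules. The names `PixelSize`, `WatermarkBox`, `WatermarkCorner`, and `placeWatermark` are hypothetical, and rounding to whole pixels is an assumption; the README only specifies the 25% width, the 3% padding, and the four corners.

```typescript
type WatermarkCorner = "top-left" | "top-right" | "bottom-left" | "bottom-right";

interface PixelSize {
  width: number;
  height: number;
}

// Top-left offset and scaled size of the watermark, in pixels.
interface WatermarkBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical helper applying the documented rules:
// watermark width = 25% of image width, padding = 3% of image width.
function placeWatermark(
  image: PixelSize,
  watermark: PixelSize,
  corner: WatermarkCorner = "bottom-right",
): WatermarkBox {
  const width = Math.round(image.width * 0.25);
  // Scale the height by the same factor to keep the watermark's aspect ratio.
  const height = Math.round(width * (watermark.height / watermark.width));
  const padding = Math.round(image.width * 0.03);

  const onLeft = corner === "top-left" || corner === "bottom-left";
  const onTop = corner === "top-left" || corner === "top-right";

  return {
    left: onLeft ? padding : image.width - width - padding,
    top: onTop ? padding : image.height - height - padding,
    width,
    height,
  };
}
```

For a 1000×1000 image and a 200×100 watermark, this gives a 250×125 overlay with 30 pixels of padding, so the default bottom-right placement starts at left 720, top 845.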
Save Functionality:
- Default: Images are saved in the directory from which the MCP client is run
- Automatic naming: Generated based on description, date and time
- Supported formats: PNG, JPG, WebP (depending on what Gemini returns)
- Automatic creation: Creates necessary folders if they don't exist
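To make the naming and path rules concrete, here is one way they could be implemented. It is a sketch under stated assumptions, not the server's actual logic: the function names, the slug format, the UTC timestamp layout, and the default `png` extension are all hypothetical. What comes from this README is that names derive from the description plus date and time, that an omitted path means the current directory, and that a trailing `/` marks a folder.

```typescript
// Turns a description into a short, file-system-safe slug (assumed format).
function slugify(description: string, maxLength = 40): string {
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return slug.slice(0, maxLength).replace(/-+$/g, "") || "image";
}

// Formats a date as YYYYMMDD-HHMMSS in UTC (assumed layout).
function timestampOf(date: Date): string {
  return date.toISOString().slice(0, 19).replace(/[-:]/g, "").replace("T", "-");
}

// Resolves the final file path from an optional outputPath.
function resolveOutputFile(
  description: string,
  now: Date,
  outputPath?: string,
  extension = "png",
): string {
  const fileName = `${slugify(description)}-${timestampOf(now)}.${extension}`;
  if (outputPath === undefined || outputPath === "") {
    return `./${fileName}`; // default: current directory
  }
  if (outputPath.charAt(outputPath.length - 1) === "/") {
    return outputPath + fileName; // trailing slash: treat as a folder
  }
  return outputPath; // otherwise: treat as a full file path
}
```

The trailing-slash check is why the troubleshooting advice to end folder paths with `/` matters: without it, a path like `./images` would be read as a file name.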
Development
Available Scripts
- npm run build: Compiles TypeScript to JavaScript
- npm run dev: Development mode with automatic reload
- npm start: Runs the compiled server
- npm run cli: Runs the CLI entry directly (node dist/cli.js)
Project Structure
gemini-image-mcp-server/
├── src/
│   ├── index.ts                # Main server entry point
│   ├── cli.ts                  # CLI entry point (generate/edit commands)
│   ├── services/
│   │   ├── gemini.ts           # Gemini AI calls
│   │   ├── imageService.ts     # File system + watermark handling
│   │   └── serviceFactory.ts   # Shared initialization helpers
│   ├── tools/
│   │   ├── index.ts            # Tools exports
│   │   ├── generateImage.ts    # Tool for creating new images
│   │   └── editImage.ts        # Tool for editing existing images
│   └── types/
│       └── index.ts            # Type definitions
├── dist/                       # Compiled files
├── package.json
├── tsconfig.json
└── README.md
Troubleshooting
Error: "GOOGLE_API_KEY environment variable is required"
Make sure you have configured the GOOGLE_API_KEY environment variable with your Google AI API key.
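A fail-fast check of this kind typically runs once at startup. The sketch below shows the general pattern; `requireApiKey` is a hypothetical name, and treating a whitespace-only value as missing is an assumption. Only the variable name and the error message are taken from this README.

```typescript
// Reads the key from an environment-like object and fails fast when it is
// missing or blank. Accepting the environment as a parameter keeps the
// function easy to test.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GOOGLE_API_KEY"];
  if (key === undefined || key.trim() === "") {
    throw new Error("GOOGLE_API_KEY environment variable is required");
  }
  return key.trim();
}

// Typical use in a Node.js entry point (not executed here):
//   const apiKey = requireApiKey(process.env);
```

If the error appears even though the variable is exported in your shell, check that the MCP client passes it through; the `env` block in the client configuration is the usual place to set it.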
Error: "Could not generate image"
- Verify that your API key is valid and has permissions for the gemini-2.5-flash-image-preview model
- Ensure the description doesn't contain content that might be blocked by safety filters
File saving error
- Verify you have write permissions in the specified path
- Make sure the path is valid and accessible
- If specifying a folder, end it with /
Server not responding
- Verify the server is running correctly
- Check logs in stderr for error messages
- Make sure the MCP client is configured correctly
License
MIT
Contributing
Contributions are welcome. Please open an issue before making significant changes.