🎵 Gemini Audio MCP
Gemini Audio MCP is a high-performance Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand.
🚀 Mission Statement
Our mission is to provide an immersive, AI-powered audio generation layer for any MCP-compatible environment, enabling the creation of dynamic, seamless, and high-quality environmental audio through simple text prompts.
✨ Key Features
- 🌊 Dynamic Soundscapes: Generate complex environmental audio using the latest Gemini 2.5 Native Audio models.
- 🎵 Professional Music: High-fidelity music production via Google's Lyria 3 models:
  - Lyria 3 Pro: Full song generation with structural coherence ($0.08 per request).
  - Lyria 3 Clip: Low-latency clips and rhythmic loops ($0.04 per request).
- 🔁 Infinite Looping: Seamless, click-free looping with 100ms micro-crossfades.
- 🔀 Smooth Crossfades: Transition between two different soundscapes with customizable crossfade durations.
- 📂 Universal Formats: Export audio to a variety of formats (WAV, MP3, OGG, FLAC) powered by FFmpeg.
- ▶️ Auto-play Integration: Instantly play generated audio through your system's default player upon completion.
- ⚙️ Persistent Configuration: Fine-tune default bitrates, sample rates, and durations once and reuse them across sessions.
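The feature list does not specify how the 100 ms micro-crossfade is applied. One common way to make a buffer loop without a click is to blend its final 100 ms into its first 100 ms and drop the tail. The last sample of the result is then followed, on wrap-around, by the sample that originally came right after it. The Rust sketch below assumes mono `f32` samples at 24 kHz and a linear fade. `make_seamless_loop` and `fade_samples` are hypothetical names, not functions from the server's audio.rs.

```rust
/// Hypothetical helper: turn a mono PCM buffer into a seamless loop by
/// crossfading its last `fade_len` samples into its first `fade_len` samples.
/// The result is `fade_len` samples shorter than the input.
fn make_seamless_loop(samples: &[f32], fade_len: usize) -> Vec<f32> {
    let n = samples.len();
    // The head and tail fade regions must not overlap.
    assert!(fade_len > 0 && 2 * fade_len <= n, "buffer too short for crossfade");
    let body_len = n - fade_len;
    let mut out = samples[..body_len].to_vec();
    for i in 0..fade_len {
        // t ramps linearly from 0.0 (all tail) towards 1.0 (all head).
        let t = i as f32 / fade_len as f32;
        let head = samples[i];
        let tail = samples[body_len + i];
        out[i] = head * t + tail * (1.0 - t);
    }
    out
}

/// Number of samples in a fade of `ms` milliseconds at `sample_rate` Hz.
fn fade_samples(ms: u32, sample_rate: u32) -> usize {
    (sample_rate as u64 * ms as u64 / 1000) as usize
}

fn main() {
    let sample_rate = 24_000;
    let fade = fade_samples(100, sample_rate); // 100 ms at 24 kHz
    // One second of a 440 Hz sine as stand-in audio.
    let input: Vec<f32> = (0..sample_rate)
        .map(|i| (2.0 * std::f32::consts::PI * 440.0 * i as f32 / sample_rate as f32).sin())
        .collect();
    let looped = make_seamless_loop(&input, fade);
    println!("{} -> {} samples", input.len(), looped.len());
}
```

Running it prints `24000 -> 21600 samples`: the loop is exactly one fade length (100 ms) shorter than the input.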
🛠 Installation Guide
Prerequisites
- FFmpeg: Required for audio conversion and processing.
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt install ffmpeg
  - Windows: Download from ffmpeg.org.
- Rust Toolchain: Required for building the project (cargo).
- Gemini API Key: Obtain your key from Google AI Studio.
1. NPM / NPX (Recommended for non-Rust users)
Add the server directly to your MCP client configuration using npx:
{
"mcpServers": {
"gemini-audio": {
"command": "npx",
"args": ["-y", "gemini-audio-mcp"],
"env": {
"GEMINI_API_KEY": "YOUR_API_KEY"
}
}
}
}
2. Manual Installation (Rust)
- Clone the repository:
  git clone https://github.com/mcp-servers/gemini-audio-mcp.git
  cd gemini-audio-mcp
- Build the project:
  cargo build --release
- Configure your environment:
  Set the GEMINI_API_KEY environment variable in your MCP client or system.
3. Docker (Cloud / Self-hosted)
The server is available as a Docker image for easy deployment:
docker run -it \
-e GEMINI_API_KEY="YOUR_API_KEY" \
-v gemini-audio-data:/root/.local/share/gemini-audio-mcp \
ghcr.io/jxoesneon/gemini-audio-mcp:latest
To use it in your MCP client configuration:
{
"mcpServers": {
"gemini-audio-docker": {
"command": "docker",
"args": [
"run", "-i", "--rm",
"-e", "GEMINI_API_KEY=YOUR_API_KEY",
"ghcr.io/jxoesneon/gemini-audio-mcp:latest"
]
}
}
}
🎮 Use Cases for Game Developers & Creators
Gemini Audio MCP is designed to integrate seamlessly into modern creative workflows, particularly for those using Unreal Engine 5, Godot, or Blender:
- 🎲 Procedural Soundscapes: Generate unique, non-repeating environmental audio for open-world games or dynamic levels.
- 🗣️ Dynamic Character Dialogue: Use generate_voice with expressive direction to prototype character lines or produce open-ended NPC dialogue for RPGs.
- 🎥 Automated Sound Design: Perfect for Blender artists looking to generate high-quality foley and background textures for animations directly through an AI-assisted pipeline.
- ⚡ Rapid Prototyping: Instantly generate rhythmic loops and musical stings for game jams or early-stage development.
🔧 Tool Usage Examples
Generate a Soundscape
Create an immersive 30-second loop of a cyberpunk rainy city.
{
"name": "generate_soundscape",
"arguments": {
"prompt": "Heavy rain on neon-lit cyberpunk city streets, distant hover-car hums, muffled holographic advertisements.",
"duration": 30,
"format": "mp3",
"auto_play": true
}
}
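The JSON objects in these examples are the `name` and `arguments` of an MCP tool call. On the wire, an MCP client wraps them in a JSON-RPC 2.0 `tools/call` request, and over the stdio transport each message is a single line of JSON. The sketch below builds that envelope by plain string formatting, assuming the arguments are already valid JSON and the tool name needs no escaping. `tools_call_request` is a hypothetical helper, and a real client must complete the MCP `initialize` handshake before sending it.

```rust
/// Hypothetical helper: wrap a tool name and pre-serialized JSON arguments in
/// a JSON-RPC 2.0 `tools/call` request. Assumes `tool` contains no characters
/// that need JSON escaping and `arguments_json` is a valid, single-line object.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{tool}","arguments":{arguments_json}}}}}"#
    )
}

fn main() {
    let args = r#"{"prompt":"Heavy rain on neon-lit cyberpunk city streets.","duration":30,"format":"mp3","auto_play":true}"#;
    // The stdio transport expects one message per line, with no embedded newlines.
    println!("{}", tools_call_request(1, "generate_soundscape", args));
}
```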
Transition Between Environments
Seamlessly shift from a peaceful forest to a roaring thunderstorm.
{
"name": "transition_soundscape",
"arguments": {
"from_prompt": "Quiet morning forest with chirping birds and rustling leaves.",
"to_prompt": "Intense tropical thunderstorm with loud thunder claps and heavy downpour.",
"transition_duration": 10,
"auto_play": true
}
}
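The transition example does not say which fade curve the mixer uses. An equal-power (cosine/sine) crossfade is a common choice, because a linear fade between unrelated material dips about 3 dB in loudness halfway through. The sketch below assumes both soundscapes are already decoded to mono `f32` at the same sample rate (24 kHz here, an assumption) and that `transition_duration` is in seconds. `crossfade` is a hypothetical name, not the server's mixer.rs API.

```rust
use std::f32::consts::FRAC_PI_2;

/// Hypothetical helper: equal-power crossfade from `from` to `to` over
/// `fade_len` samples. Output = `from` up to the fade, the blended region,
/// then the remainder of `to`.
fn crossfade(from: &[f32], to: &[f32], fade_len: usize) -> Vec<f32> {
    assert!(fade_len <= from.len() && fade_len <= to.len(), "fade longer than input");
    let lead = from.len() - fade_len;
    let mut out = Vec::with_capacity(lead + to.len());
    out.extend_from_slice(&from[..lead]);
    for i in 0..fade_len {
        // Position in the fade: 0.0 on the first sample, 1.0 on the last.
        let t = if fade_len > 1 { i as f32 / (fade_len - 1) as f32 } else { 1.0 };
        // cos/sin gains satisfy gain_out^2 + gain_in^2 == 1 (constant power).
        let gain_out = (t * FRAC_PI_2).cos();
        let gain_in = (t * FRAC_PI_2).sin();
        out.push(from[lead + i] * gain_out + to[i] * gain_in);
    }
    out.extend_from_slice(&to[fade_len..]);
    out
}

fn main() {
    let sample_rate = 24_000usize;
    let transition_duration = 10; // seconds, as in the example call above
    let fade_len = sample_rate * transition_duration;
    // Constant-level stand-ins for the decoded forest and storm audio.
    let forest = vec![0.2_f32; sample_rate * 20];
    let storm = vec![0.5_f32; sample_rate * 20];
    let mixed = crossfade(&forest, &storm, fade_len);
    println!("output length: {} samples", mixed.len());
}
```

With a 20-second forest clip, a 20-second storm clip and a 10-second fade, the result is 30 seconds long, because the two clips overlap for the duration of the fade.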
Update Server Defaults
Set the default output format to lossless FLAC and the default sample rate to 48 kHz.
{
"name": "configure",
"arguments": {
"default_format": "flac",
"default_sample_rate": 48000
}
}
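The configure call above changes only the keys it names. Below is a minimal sketch of that merge, assuming every argument is optional and omitted keys keep their stored value. `ServerConfig`, `ConfigUpdate`, `SUPPORTED_FORMATS` and the bitrate/duration field names are hypothetical, and the default values are placeholders. Saving the struct as JSON (which config.rs does, per the architecture overview) is left out to keep the example dependency-free.

```rust
/// Formats listed under Universal Formats.
const SUPPORTED_FORMATS: [&str; 4] = ["wav", "mp3", "ogg", "flac"];

/// Hypothetical in-memory view of the server defaults.
#[derive(Debug, Clone, PartialEq)]
struct ServerConfig {
    default_format: String,
    default_sample_rate: u32,
    default_bitrate_kbps: u32,  // assumed field name
    default_duration_secs: u32, // assumed field name
}

impl Default for ServerConfig {
    fn default() -> Self {
        // Placeholder values, not the server's documented defaults.
        ServerConfig {
            default_format: "mp3".to_string(),
            default_sample_rate: 44_100,
            default_bitrate_kbps: 192,
            default_duration_secs: 30,
        }
    }
}

/// One `configure` call: every field is optional.
#[derive(Debug, Default)]
struct ConfigUpdate {
    default_format: Option<String>,
    default_sample_rate: Option<u32>,
    default_bitrate_kbps: Option<u32>,
    default_duration_secs: Option<u32>,
}

impl ServerConfig {
    /// Validate first, then copy over only the fields that were provided,
    /// so a rejected update leaves the stored config untouched.
    fn apply(&mut self, update: ConfigUpdate) -> Result<(), String> {
        if let Some(f) = &update.default_format {
            if !SUPPORTED_FORMATS.contains(&f.as_str()) {
                return Err(format!("unsupported format: {f}"));
            }
        }
        if let Some(f) = update.default_format { self.default_format = f; }
        if let Some(sr) = update.default_sample_rate { self.default_sample_rate = sr; }
        if let Some(b) = update.default_bitrate_kbps { self.default_bitrate_kbps = b; }
        if let Some(d) = update.default_duration_secs { self.default_duration_secs = d; }
        Ok(())
    }
}

fn main() {
    let mut cfg = ServerConfig::default();
    // Equivalent of the configure example above.
    cfg.apply(ConfigUpdate {
        default_format: Some("flac".to_string()),
        default_sample_rate: Some(48_000),
        ..Default::default()
    })
    .expect("valid update");
    println!("{:?}", cfg);
}
```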
🏛 Architecture Overview
The server is built with a modular Rust architecture designed for efficiency and reliability:
- main.rs: The core MCP protocol engine, handling tool registration and request dispatching.
- gemini.rs: Manages low-level WebSocket communication with the Gemini 2.0 Multimodal Live API.
- audio.rs: Handles PCM data manipulation, including seamless looping algorithms and FFmpeg integration for format transcoding.
- mixer.rs: Implements audio processing logic for crossfading and blending multiple audio streams.
- config.rs: Provides a persistent JSON-based configuration layer for user preferences.
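For the FFmpeg step in audio.rs, a plausible approach is to pass FFmpeg the raw PCM with its sample format declared explicitly, since a headerless PCM file carries no rate or channel information. The sketch assumes 16-bit little-endian mono input at 24 kHz, which is how the Gemini Live API documentation describes its audio output at the time of writing (verify this for the model you use). It lets FFmpeg pick the encoder from the output file extension. `ffmpeg_transcode` is a hypothetical name. The code only builds the `std::process::Command` and prints its arguments; calling `.status()` on it would run the conversion.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical helper: build an FFmpeg command that converts raw s16le mono
/// PCM into `output`. The encoder is inferred from the output extension.
fn ffmpeg_transcode(input_pcm: &str, output: &str, sample_rate: u32, bitrate_kbps: Option<u32>) -> Command {
    let mut cmd = Command::new("ffmpeg");
    // -f/-ar/-ac before -i describe the headerless input; -y overwrites output.
    cmd.args(["-y", "-f", "s16le", "-ar"])
        .arg(sample_rate.to_string())
        .args(["-ac", "1", "-i", input_pcm]);
    if let Some(kbps) = bitrate_kbps {
        // Target bitrate for lossy encoders such as MP3 or Ogg Vorbis.
        cmd.arg("-b:a").arg(format!("{kbps}k"));
    }
    cmd.arg(output);
    cmd
}

fn main() {
    let cmd = ffmpeg_transcode("soundscape.pcm", "soundscape.mp3", 24_000, Some(192));
    // Inspect instead of running, so no FFmpeg install is needed here.
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
}
```

For lossless targets such as WAV or FLAC, pass `None` for the bitrate.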
📄 License
Distributed under the MIT License. See LICENSE for more information.