# Kyutai TTS Docker Deployment

Production-ready Docker deployment for Kyutai TTS with a web UI, REST API, and MCP support.
## ✨ Features

- 🚀 **One-Click Deployment** - Automated GPU selection and port detection
- 🎨 **Three Access Modes** - Web UI, REST API, and MCP tools
- 🧠 **Smart GPU Management** - Lazy loading and automatic memory release
- 🌐 **Multi-language UI** - English and Chinese interfaces
- 📦 **All-in-One Image** - No external dependencies; models included
- 🔒 **Production Ready** - HTTPS, health checks, and monitoring
## 🚀 Quick Start

### Using Docker Hub (Recommended)

```bash
docker run -d \
  --name kyutai-tts \
  --gpus all \
  -p 8900:8900 \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  neosun/kyutai-tts:allinone
```

Access at: http://localhost:8900
### Using Docker Compose

```bash
git clone https://github.com/neosun100/kyutai-tts-docker.git
cd kyutai-tts-docker
./start.sh
```
## 📦 Installation

### Prerequisites

- Docker 20.10+
- Docker Compose 2.0+
- NVIDIA GPU with CUDA support
- nvidia-docker runtime

### Method 1: Pull from Docker Hub

```bash
docker pull neosun/kyutai-tts:allinone
```

### Method 2: Build from Source

```bash
git clone https://github.com/neosun100/kyutai-tts-docker.git
cd kyutai-tts-docker
docker-compose build
```
## ⚙️ Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| PORT | 8900 | Service port |
| DEVICE | cuda | Device type (cuda/cpu) |
| GPU_IDLE_TIMEOUT | 60 | GPU idle timeout (seconds) |
| NVIDIA_VISIBLE_DEVICES | 0 | GPU ID to use |

### Example .env File

```env
PORT=8900
DEVICE=cuda
GPU_IDLE_TIMEOUT=60
NVIDIA_VISIBLE_DEVICES=0
```
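These variables can be wired through Docker Compose. The fragment below is an illustrative sketch, not the repository's actual `docker-compose.yml` (the service name and GPU reservation block are assumptions):

```yaml
services:
  kyutai-tts:
    image: neosun/kyutai-tts:allinone
    ports:
      - "${PORT:-8900}:8900"
    environment:
      - DEVICE=${DEVICE:-cuda}
      - GPU_IDLE_TIMEOUT=${GPU_IDLE_TIMEOUT:-60}
      - NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-0}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```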
## 📖 Usage

### Web UI

1. Open a browser at http://localhost:8900
2. Enter the text to synthesize
3. Adjust parameters (optional)
4. Click "Generate"
5. Play or download the audio
### REST API

#### Generate Speech

```bash
curl -X POST http://localhost:8900/api/tts \
  -F "text=Hello, world!" \
  -F "cfg_coef=2.0" \
  --output output.wav
```

#### Check GPU Status

```bash
curl http://localhost:8900/api/gpu/status
```

#### Release GPU Memory

```bash
curl -X POST http://localhost:8900/api/gpu/offload
```
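For scripted use, the `curl` call for `/api/tts` can be wrapped in a small Python helper. This is a standard-library sketch: the endpoint and form fields come from the examples above, while the function name, defaults, and the assumption that the server accepts a urlencoded form body (the curl example sends multipart) are ours:

```python
import urllib.parse
import urllib.request

def synthesize(text, cfg_coef=2.0, base_url="http://localhost:8900",
               out_path="output.wav"):
    """POST text to /api/tts and save the returned WAV file."""
    data = urllib.parse.urlencode({"text": text, "cfg_coef": cfg_coef}).encode()
    req = urllib.request.Request(f"{base_url}/api/tts", data=data)
    # Generous timeout: the first request may trigger lazy model loading.
    with urllib.request.urlopen(req, timeout=300) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path

# Example (requires a running server):
# synthesize("Hello, world!")  # writes output.wav
```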
### MCP Tools

See [MCP_GUIDE.md](docs/MCP_GUIDE.md) for detailed MCP usage.

```python
result = await mcp_client.call_tool(
    "text_to_speech",
    {
        "text": "Hello from MCP!",
        "output_path": "/tmp/output.wav"
    }
)
```
## 🏗️ Project Structure

```
kyutai-tts-docker/
├── app.py                 # Flask application
├── gpu_manager.py         # GPU resource manager
├── mcp_server.py          # MCP server
├── Dockerfile             # Docker image
├── Dockerfile.allinone    # All-in-one image
├── docker-compose.yml     # Docker Compose config
├── start.sh               # One-click startup script
├── test_api.sh            # API test script
└── docs/                  # Documentation
    ├── QUICKSTART.md
    ├── MCP_GUIDE.md
    └── TEST_REPORT.md
```
## 🛠️ Tech Stack

- **Framework**: Flask 3.0
- **ML Framework**: PyTorch 2.7 + CUDA 12.1
- **TTS Model**: Kyutai TTS 1.6B (English/French)
- **API Docs**: Swagger/Flasgger
- **MCP**: FastMCP 0.2
- **Container**: Docker + nvidia-docker
## 🔗 API Documentation

Once running, access the Swagger docs at: http://localhost:8900/apidocs

### Available Endpoints

- `GET /health` - Health check
- `GET /api/gpu/status` - GPU status
- `POST /api/tts` - Generate speech
- `POST /api/gpu/offload` - Release GPU memory
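Scripted deployments can poll the health endpoint before sending traffic. A minimal sketch (the URL matches the endpoint list above; the function name and retry policy are ours):

```shell
#!/usr/bin/env bash
# Poll GET /health until the service responds, or give up after N tries.
wait_for_health() {
  local url="${1:-http://localhost:8900/health}"
  local tries="${2:-30}"
  local i
  for ((i = 0; i < tries; i++)); do
    if curl -fsS "$url" > /dev/null 2>&1; then
      return 0
    fi
    sleep 2
  done
  return 1
}

# Usage: wait_for_health && echo "kyutai-tts is up"
```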
## 🌐 Production Deployment

### With Nginx Reverse Proxy

```nginx
server {
    listen 443 ssl;
    server_name your-domain.com;

    # TLS certificate paths (adjust to your setup)
    ssl_certificate     /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;

    location / {
        proxy_pass http://localhost:8900;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
### Multi-GPU Setup

```bash
# GPU 0
NVIDIA_VISIBLE_DEVICES=0 PORT=8900 docker-compose up -d

# GPU 1
NVIDIA_VISIBLE_DEVICES=1 PORT=8901 docker-compose up -d
```
## 📊 Performance

- **Model Size**: 1.6B parameters
- **GPU Memory**: 3-4 GB
- **Latency**: 350 ms (L40S, 32 concurrent requests)
- **Speed**: 3-5x real-time
- **Audio Quality**: 16-bit PCM, 24 kHz
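The "3-5x real-time" figure is a real-time factor: seconds of audio produced per second of wall-clock synthesis. A small standard-library helper to measure it from a generated WAV file (the function name is ours):

```python
import wave

def real_time_factor(wav_path, synth_seconds):
    """Return audio_duration / synthesis_time; > 1 means faster than real time."""
    with wave.open(wav_path, "rb") as w:
        audio_seconds = w.getnframes() / float(w.getframerate())
    return audio_seconds / synth_seconds

# Example: 1 s of 24 kHz audio synthesized in 0.25 s gives an RTF of 4.0,
# i.e. 4x real time.
```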
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## 📝 Changelog

### v1.0.0 (2025-12-14)
- Initial release
- Docker deployment with GPU support
- Web UI with multi-language support
- REST API with Swagger docs
- MCP server implementation
- All-in-one Docker image
## 📄 License
- Python code: MIT License
- Rust code: Apache License
- Model weights: CC-BY 4.0
## 🙏 Acknowledgments
- Kyutai Labs for the TTS model
- Moshi for the implementation