🧠 MiniMind Docker
All-in-One Docker deployment for MiniMind LLM with UI, API & MCP support
✨ Features
- 🐳 One-Click Docker Deployment - All dependencies bundled, ready to run
- 🎨 Modern Web UI - Responsive design with dark mode & multi-language support
- 🔌 OpenAI-Compatible API - Drop-in replacement for existing applications
- 🤖 MCP Integration - Model Context Protocol for AI agent workflows
- 🎮 Smart GPU Management - Auto-select idle GPU, auto-release memory
- 📊 Real-time Streaming - SSE-based streaming responses
- 🌍 Multi-language UI - English, 简体中文, 繁體中文, 日本語
🚀 Quick Start
Docker (Recommended)
```bash
# Pull and run
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest

# Access
# UI:   http://localhost:8998
# API:  http://localhost:8998/v1/chat/completions
# Docs: http://localhost:8998/apidocs/
```
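Once the container is up, you can verify it is serving before opening the UI. A minimal sketch using only the Python standard library; the `/health` endpoint is from the API reference below, and the helper name is ours:

```python
# Sanity-check the MiniMind server's /health endpoint (stdlib only).
from urllib import request


def is_healthy(base: str = "http://localhost:8998") -> bool:
    """Return True if the MiniMind server answers its health check."""
    try:
        with request.urlopen(f"{base}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False
```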
Docker Compose
```bash
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker
./start.sh
```
📦 Installation
Prerequisites
- Docker 20.10+
- Docker Compose 2.0+
- NVIDIA GPU with CUDA 12.1+ (optional, CPU fallback available)
- nvidia-container-toolkit (for GPU support)
Method 1: Docker Run
```bash
# Basic (CPU)
docker run -d -p 8998:8998 neosun/minimind:latest

# With GPU
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest

# With custom model path
docker run -d --gpus all -p 8998:8998 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=/app/models/MiniMind2 \
  neosun/minimind:latest
```
Method 2: Docker Compose
```yaml
# docker-compose.yml
services:
  minimind:
    image: neosun/minimind:latest
    ports:
      - "8998:8998"
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - GPU_IDLE_TIMEOUT=60
      - MODEL_PATH=MiniMind2-Small
    volumes:
      - /tmp/minimind:/app/uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

```bash
docker compose up -d
```
Method 3: Local Development
```bash
# Clone
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker

# Install dependencies
pip install -r requirements.txt

# Download model
python -c "from huggingface_hub import snapshot_download; snapshot_download('jingyaogong/MiniMind2-Small', local_dir='MiniMind2-Small')"

# Run
python app.py
```
⚙️ Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8998` | Server port |
| `MODEL_PATH` | `MiniMind2-Small` | Model path or HuggingFace ID |
| `GPU_IDLE_TIMEOUT` | `60` | Seconds of inactivity before GPU memory is auto-released |
| `NVIDIA_VISIBLE_DEVICES` | `0` | GPU device ID |
| `MAX_SEQ_LEN` | `8192` | Maximum sequence length |
| `TEMPERATURE` | `0.85` | Default generation temperature |
.env Example
```bash
PORT=8998
GPU_IDLE_TIMEOUT=60
NVIDIA_VISIBLE_DEVICES=0
MODEL_PATH=MiniMind2-Small
```
📖 Usage
Web UI
Visit http://localhost:8998 for the interactive chat interface.
Features:
- Adjustable parameters (Temperature, Max Tokens, Top P)
- GPU status monitoring
- One-click GPU memory release
- Multi-language support (EN/CN/TW/JP)
- Dark mode support
REST API
Chat Completion (OpenAI Compatible)
```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'
```
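The same request can be made from Python. A minimal sketch using only the standard library; the URL, model name, and parameters come from this README, while the helper functions themselves are ours:

```python
# Minimal stdlib client for the OpenAI-compatible chat endpoint.
import json
from urllib import request

API_URL = "http://localhost:8998/v1/chat/completions"


def chat_payload(prompt: str, temperature: float = 0.7,
                 max_tokens: int = 512, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "minimind",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }


def chat(prompt: str) -> str:
    """POST a non-streaming request and return the assistant's reply."""
    req = request.Request(
        API_URL,
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, the official `openai` SDK should also work by pointing `base_url` at `http://localhost:8998/v1`.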
Streaming Response
```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
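Consuming the stream from Python looks like the sketch below. It assumes the server emits standard OpenAI-style SSE chunks (`data: {...}` lines ending with `data: [DONE]`); the parser and generator functions are ours:

```python
# Parse OpenAI-style SSE chunks from a streaming chat completion (stdlib only).
import json
from urllib import request

API_URL = "http://localhost:8998/v1/chat/completions"


def parse_sse_line(line: bytes):
    """Return the content delta carried by one SSE line, or None."""
    line = line.strip()
    if not line.startswith(b"data: ") or line == b"data: [DONE]":
        return None
    chunk = json.loads(line[len(b"data: "):])
    return chunk["choices"][0]["delta"].get("content")


def stream_chat(prompt: str, url: str = API_URL):
    """Yield content deltas from a streaming chat completion."""
    body = json.dumps({
        "model": "minimind",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        for line in resp:  # SSE frames arrive as newline-delimited lines
            delta = parse_sse_line(line)
            if delta:
                yield delta
```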
GPU Status
```bash
# Check status
curl http://localhost:8998/api/gpu/status

# Release GPU memory
curl -X POST http://localhost:8998/api/gpu/offload
```
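The same two endpoints can be wrapped in Python, e.g. to poll status or free memory from a script. A stdlib-only sketch; the JSON shape of `/api/gpu/status` is not documented here, so the raw dict is returned as-is:

```python
# Thin stdlib wrappers around the GPU management endpoints.
import json
from urllib import request

BASE_URL = "http://localhost:8998"


def gpu_status(base: str = BASE_URL) -> dict:
    """GET /api/gpu/status and return the parsed JSON response."""
    with request.urlopen(f"{base}/api/gpu/status") as resp:
        return json.load(resp)


def release_gpu(base: str = BASE_URL) -> int:
    """POST /api/gpu/offload to free GPU memory; returns the HTTP status."""
    req = request.Request(f"{base}/api/gpu/offload", data=b"", method="POST")
    with request.urlopen(req) as resp:
        return resp.status
```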
MCP Integration
Configure in your MCP client:
```json
{
  "mcpServers": {
    "minimind": {
      "command": "python",
      "args": ["mcp_server.py"],
      "env": {
        "MODEL_PATH": "MiniMind2-Small",
        "GPU_IDLE_TIMEOUT": "600"
      }
    }
  }
}
```
Available Tools:
- `chat` - Single-turn conversation
- `multi_turn_chat` - Multi-turn conversation
- `get_gpu_status` - Query GPU status
- `get_model_info` - Get model information
- `release_gpu` - Release GPU memory
See MCP_GUIDE.md for detailed documentation.
🔌 API Reference
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/api/gpu/status` | GET | GPU status |
| `/api/gpu/offload` | POST | Release GPU memory |
| `/v1/chat/completions` | POST | Chat API (OpenAI-compatible) |
| `/apidocs/` | GET | Swagger documentation |
📁 Project Structure
```text
minimind-docker/
├── app.py                # Main application (UI + API)
├── mcp_server.py         # MCP server
├── Dockerfile            # Docker build file
├── docker-compose.yml    # Docker Compose config
├── start.sh              # One-click start script
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── MCP_GUIDE.md          # MCP documentation
├── model/                # Tokenizer files
├── trainer/              # Training scripts
└── scripts/              # Utility scripts
```
🛠️ Tech Stack
- Framework: Flask + FastMCP
- Model: MiniMind2 (Transformer-based LLM)
- GPU: CUDA 12.1 + PyTorch 2.6
- Container: Docker + nvidia-container-toolkit
- API: OpenAI-compatible REST API
- Docs: Swagger/Flasgger
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
📝 Changelog
v1.0.0 (2026-01-04)
- 🎉 Initial release
- 🐳 Docker all-in-one deployment
- 🎨 Web UI with multi-language support
- 🔌 OpenAI-compatible API
- 🤖 MCP integration
- 🎮 Smart GPU management
📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Based on MiniMind by Jingyao Gong.