🧠 MiniMind Docker
All-in-One Docker deployment for MiniMind LLM with UI, API & MCP support
✨ Features
- 🐳 One-Click Docker Deployment - All dependencies bundled, ready to run
- 🎨 Modern Web UI - Responsive design with dark mode & multi-language support
- 🔌 OpenAI-Compatible API - Drop-in replacement for existing applications
- 🤖 MCP Integration - Model Context Protocol for AI agent workflows
- 🎮 Smart GPU Management - Auto-select idle GPU, auto-release memory
- 📊 Real-time Streaming - SSE-based streaming responses
- 🌍 Multi-language UI - English, 简体中文, 繁體中文, 日本語
🚀 Quick Start
Docker (Recommended)
```bash
# Pull and run
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest

# Access
# UI:   http://localhost:8998
# API:  http://localhost:8998/v1/chat/completions
# Docs: http://localhost:8998/apidocs/
```
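Once the container is up, you can verify it is serving before opening the UI. A minimal sketch using only the Python standard library; the `/health` endpoint is from the API reference below, and the helper name is ours:

```python
# Sanity-check the MiniMind server's /health endpoint (stdlib only).
from urllib import request


def is_healthy(base: str = "http://localhost:8998") -> bool:
    """Return True if the MiniMind server answers its health check."""
    try:
        with request.urlopen(f"{base}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False
```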
Docker Compose
```bash
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker
./start.sh
```
📦 Installation
Prerequisites
- Docker 20.10+
- Docker Compose 2.0+
- NVIDIA GPU with CUDA 12.1+ (optional, CPU fallback available)
- nvidia-container-toolkit (for GPU support)
Method 1: Docker Run
```bash
# Basic (CPU)
docker run -d -p 8998:8998 neosun/minimind:latest

# With GPU
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest

# With custom model path
docker run -d --gpus all -p 8998:8998 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=/app/models/MiniMind2 \
  neosun/minimind:latest
```
Method 2: Docker Compose
```yaml
# docker-compose.yml
services:
  minimind:
    image: neosun/minimind:latest
    ports:
      - "8998:8998"
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - GPU_IDLE_TIMEOUT=60
      - MODEL_PATH=MiniMind2-Small
    volumes:
      - /tmp/minimind:/app/uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

```bash
docker compose up -d
```
Method 3: Local Development
```bash
# Clone
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker

# Install dependencies
pip install -r requirements.txt

# Download model
python -c "from huggingface_hub import snapshot_download; snapshot_download('jingyaogong/MiniMind2-Small', local_dir='MiniMind2-Small')"

# Run
python app.py
```
⚙️ Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8998` | Server port |
| `MODEL_PATH` | `MiniMind2-Small` | Model path or HuggingFace ID |
| `GPU_IDLE_TIMEOUT` | `60` | Seconds of inactivity before GPU memory is auto-released |
| `NVIDIA_VISIBLE_DEVICES` | `0` | GPU device ID |
| `MAX_SEQ_LEN` | `8192` | Maximum sequence length |
| `TEMPERATURE` | `0.85` | Default generation temperature |
.env Example
```bash
PORT=8998
GPU_IDLE_TIMEOUT=60
NVIDIA_VISIBLE_DEVICES=0
MODEL_PATH=MiniMind2-Small
```
📖 Usage
Web UI
Visit http://localhost:8998 for the interactive chat interface.
Features:
- Adjustable parameters (Temperature, Max Tokens, Top P)
- GPU status monitoring
- One-click GPU memory release
- Multi-language support (EN/CN/TW/JP)
- Dark mode support
REST API
Chat Completion (OpenAI Compatible)
```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'
```
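The same request can be made from Python. A minimal sketch using only the standard library; the URL, model name, and parameters come from this README, while the helper functions themselves are ours:

```python
# Minimal stdlib client for the OpenAI-compatible chat endpoint.
import json
from urllib import request

API_URL = "http://localhost:8998/v1/chat/completions"


def chat_payload(prompt: str, temperature: float = 0.7,
                 max_tokens: int = 512, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "minimind",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }


def chat(prompt: str) -> str:
    """POST a non-streaming request and return the assistant's reply."""
    req = request.Request(
        API_URL,
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, the official `openai` SDK should also work by pointing `base_url` at `http://localhost:8998/v1`.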
Streaming Response
```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
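Consuming the stream from Python looks like the sketch below. It assumes the server emits standard OpenAI-style SSE chunks (`data: {...}` lines ending with `data: [DONE]`); the parser and generator functions are ours:

```python
# Parse OpenAI-style SSE chunks from a streaming chat completion (stdlib only).
import json
from urllib import request

API_URL = "http://localhost:8998/v1/chat/completions"


def parse_sse_line(line: bytes):
    """Return the content delta carried by one SSE line, or None."""
    line = line.strip()
    if not line.startswith(b"data: ") or line == b"data: [DONE]":
        return None
    chunk = json.loads(line[len(b"data: "):])
    return chunk["choices"][0]["delta"].get("content")


def stream_chat(prompt: str, url: str = API_URL):
    """Yield content deltas from a streaming chat completion."""
    body = json.dumps({
        "model": "minimind",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        for line in resp:  # SSE frames arrive as newline-delimited lines
            delta = parse_sse_line(line)
            if delta:
                yield delta
```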
GPU Status
```bash
# Check status
curl http://localhost:8998/api/gpu/status

# Release GPU memory
curl -X POST http://localhost:8998/api/gpu/offload
```
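The same two endpoints can be wrapped in Python, e.g. to poll status or free memory from a script. A stdlib-only sketch; the JSON shape of `/api/gpu/status` is not documented here, so the raw dict is returned as-is:

```python
# Thin stdlib wrappers around the GPU management endpoints.
import json
from urllib import request

BASE_URL = "http://localhost:8998"


def gpu_status(base: str = BASE_URL) -> dict:
    """GET /api/gpu/status and return the parsed JSON response."""
    with request.urlopen(f"{base}/api/gpu/status") as resp:
        return json.load(resp)


def release_gpu(base: str = BASE_URL) -> int:
    """POST /api/gpu/offload to free GPU memory; returns the HTTP status."""
    req = request.Request(f"{base}/api/gpu/offload", data=b"", method="POST")
    with request.urlopen(req) as resp:
        return resp.status
```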
MCP Integration
Configure in your MCP client:
```json
{
  "mcpServers": {
    "minimind": {
      "command": "python",
      "args": ["mcp_server.py"],
      "env": {
        "MODEL_PATH": "MiniMind2-Small",
        "GPU_IDLE_TIMEOUT": "600"
      }
    }
  }
}
```
Available Tools:
- `chat` - Single-turn conversation
- `multi_turn_chat` - Multi-turn conversation
- `get_gpu_status` - Query GPU status
- `get_model_info` - Get model information
- `release_gpu` - Release GPU memory
See MCP_GUIDE.md for detailed documentation.
🔌 API Reference
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/api/gpu/status` | GET | GPU status |
| `/api/gpu/offload` | POST | Release GPU memory |
| `/v1/chat/completions` | POST | Chat API (OpenAI-compatible) |
| `/apidocs/` | GET | Swagger documentation |
📁 Project Structure
```text
minimind-docker/
├── app.py                # Main application (UI + API)
├── mcp_server.py         # MCP server
├── Dockerfile            # Docker build file
├── docker-compose.yml    # Docker Compose config
├── start.sh              # One-click start script
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── MCP_GUIDE.md          # MCP documentation
├── model/                # Tokenizer files
├── trainer/              # Training scripts
└── scripts/              # Utility scripts
```
🛠️ Tech Stack
- Framework: Flask + FastMCP
- Model: MiniMind2 (Transformer-based LLM)
- GPU: CUDA 12.1 + PyTorch 2.6
- Container: Docker + nvidia-container-toolkit
- API: OpenAI-compatible REST API
- Docs: Swagger/Flasgger
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
📝 Changelog
v1.0.0 (2026-01-04)
- 🎉 Initial release
- 🐳 Docker all-in-one deployment
- 🎨 Web UI with multi-language support
- 🔌 OpenAI-compatible API
- 🤖 MCP integration
- 🎮 Smart GPU management
📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Based on MiniMind by Jingyao Gong.