Updated: Mar 30, 2026
whisper-telegram-mcp

Transcribe Telegram voice messages with Whisper -- as an MCP tool for Claude


An MCP server that transcribes audio files and Telegram voice messages using OpenAI's Whisper speech recognition. Works with Claude Desktop, Claude Code, and any MCP-compatible client.

What It Does

  • Transcribe local audio files -- OGG, WAV, MP3, FLAC, and more
  • Transcribe Telegram voice messages -- pass a file_id, get text back
  • Two backends -- local inference with faster-whisper (free, private) or OpenAI Whisper API (cloud)
  • Auto mode -- tries local first, falls back to OpenAI if it fails
  • Language detection -- automatic or specify an ISO-639-1 code
  • Word-level timestamps -- optional fine-grained timing

Quick Start

One command with uvx

uvx whisper-telegram-mcp

No installation needed -- uvx handles everything.

Or install with pip

pip install whisper-telegram-mcp
whisper-telegram-mcp

Integration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "whisper-telegram-mcp": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto"
      }
    }
  }
}

Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "whisper-telegram-mcp": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto"
      }
    }
  }
}

Tools

Tool                       Description
transcribe_audio           Transcribe a local audio file (OGG, WAV, MP3, etc.) to text
transcribe_telegram_voice  Download and transcribe a Telegram voice message by file_id
list_models                List available Whisper model sizes with speed/accuracy info
check_backends             Check which backends (local/OpenAI) are available and configured

transcribe_audio

file_path: str        # Absolute path to audio file
language: str | None  # ISO-639-1 code (e.g. "en"), None = auto-detect
word_timestamps: bool # Include word-level timestamps (default: false)
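For context, a raw MCP tool call for transcribe_audio might look like the sketch below. The JSON-RPC envelope follows the MCP specification's tools/call method; the file path and request id are placeholders, not values from this project.

```python
import json

# Hypothetical MCP "tools/call" request for transcribe_audio.
# The file path and request id are illustrative placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "transcribe_audio",
        "arguments": {
            "file_path": "/home/user/voice.ogg",
            "language": None,          # None = auto-detect
            "word_timestamps": False,
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Desktop builds this envelope for you; only the arguments object maps to the parameters above.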

transcribe_telegram_voice

file_id: str          # Telegram voice message file_id
bot_token: str | None # Bot token (falls back to TELEGRAM_BOT_TOKEN env var)
language: str | None  # ISO-639-1 code, None = auto-detect
word_timestamps: bool # Include word-level timestamps (default: false)
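The download step behind this tool can be sketched with the standard Telegram Bot API: getFile resolves a file_id to a file_path, and the bytes are then served from the /file/bot&lt;token&gt;/ prefix. This is a minimal stdlib sketch of that flow, not the server's actual implementation.

```python
import json
import urllib.request

def telegram_file_url(bot_token: str, file_path: str) -> str:
    # Downloads use the /file/bot<token>/ prefix, unlike API method calls.
    return f"https://api.telegram.org/file/bot{bot_token}/{file_path}"

def download_voice(bot_token: str, file_id: str, dest: str) -> str:
    """Resolve a file_id via getFile, then download the voice bytes to dest."""
    api = f"https://api.telegram.org/bot{bot_token}/getFile?file_id={file_id}"
    with urllib.request.urlopen(api) as resp:
        file_path = json.load(resp)["result"]["file_path"]
    urllib.request.urlretrieve(telegram_file_url(bot_token, file_path), dest)
    return dest
```

Telegram serves voice messages as OGG/Opus, which both backends accept directly.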

Response Format

All transcription tools return:

{
  "text": "Hello, this is a voice message.",
  "language": "en",
  "language_probability": 0.98,
  "duration": 3.5,
  "segments": [
    {"start": 0.0, "end": 3.5, "text": "Hello, this is a voice message."}
  ],
  "backend": "local",
  "success": true,
  "error": null
}
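A client consuming this result should check the success flag before using the text; a minimal sketch (using the example response above):

```python
import json

RESPONSE = """{
  "text": "Hello, this is a voice message.",
  "language": "en",
  "language_probability": 0.98,
  "duration": 3.5,
  "segments": [{"start": 0.0, "end": 3.5, "text": "Hello, this is a voice message."}],
  "backend": "local",
  "success": true,
  "error": null
}"""

def extract_text(raw: str) -> str:
    """Return the transcript, raising if the tool reported a failure."""
    result = json.loads(raw)
    if not result["success"]:
        raise RuntimeError(result["error"])
    return result["text"]
```

On failure, success is false and error carries the message, so callers never need to guess from an empty text field.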

Configuration

All configuration is via environment variables:

Variable            Default      Description
WHISPER_BACKEND     auto         auto, local, or openai
WHISPER_MODEL       base         Whisper model size (see below)
OPENAI_API_KEY      --           Required for openai backend
TELEGRAM_BOT_TOKEN  --           Required for transcribe_telegram_voice
WHISPER_LANGUAGE    auto-detect  ISO-639-1 language code
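Reading these variables with their documented defaults could look like this sketch (the load_config helper is illustrative, not the server's actual code):

```python
import os

def load_config(env=os.environ) -> dict:
    """Collect the documented settings, applying the documented defaults."""
    return {
        "backend": env.get("WHISPER_BACKEND", "auto"),
        "model": env.get("WHISPER_MODEL", "base"),
        "language": env.get("WHISPER_LANGUAGE"),  # None = auto-detect
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "telegram_bot_token": env.get("TELEGRAM_BOT_TOKEN"),
    }
```

With an empty environment this yields the auto backend and the base model, matching the defaults in the table.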

How It Works

                         MCP Client (Claude)
                              |
                         [MCP stdio]
                              |
                    whisper-telegram-mcp
                         /         \
                        /           \
              transcribe_audio   transcribe_telegram_voice
                      |                    |
                      |            [Download via Bot API]
                      |                    |
                      +--------+-----------+
                               |
                         auto_transcribe()
                          /           \
                   LocalBackend    OpenAIBackend
                   (faster-whisper)  (Whisper API)

  1. Claude sends a tool call via MCP (stdio transport)
  2. For Telegram voice messages, the file is downloaded via Bot API
  3. auto_transcribe() picks the best available backend
  4. Transcription result is returned as structured JSON
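The fallback in step 3 can be sketched as follows; auto_transcribe and the backend names come from the diagram above, but the signatures and stubs are assumptions for illustration:

```python
def auto_transcribe(path, backends):
    """Try backends in order; return the first success ('auto' mode sketch)."""
    errors = []
    for name, backend in backends:
        try:
            result = backend(path)
            result["backend"] = name
            result["success"] = True
            return result
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    return {"success": False, "error": "; ".join(errors)}

# Stubs standing in for LocalBackend and OpenAIBackend:
def local_stub(path):
    raise RuntimeError("faster-whisper not installed")

def openai_stub(path):
    return {"text": "hello"}

result = auto_transcribe("voice.ogg", [("local", local_stub), ("openai", openai_stub)])
```

Here the local backend raises, so the call falls through to the OpenAI stub and the result reports which backend actually ran.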

Local vs OpenAI

          Local (faster-whisper)                     OpenAI API
Cost      Free                                       $0.006/min
Privacy   All data stays on device                   Audio sent to OpenAI
Speed     ~1-10s depending on model                  ~1-3s
Setup     Automatic (downloads model on first use)   Requires OPENAI_API_KEY
Accuracy  Excellent with base or larger              Excellent
Offline   Yes                                        No

Model Sizes

Model     Parameters  Speed     Accuracy  VRAM
tiny      39M         Fastest   Lowest    ~1GB
base      74M         Fast      Good      ~1GB
small     244M        Moderate  Better    ~2GB
medium    769M        Slow      High      ~5GB
large-v3  1550M       Slowest   Highest   ~10GB
turbo     ~800M       Fast      High      ~6GB

English-only variants (tiny.en, base.en, small.en, medium.en) are slightly more accurate for English.

Development

git clone https://github.com/abid-mahdi/whisper-telegram-mcp.git
cd whisper-telegram-mcp
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run unit tests
pytest tests/ -v -m "not integration"

# Run integration tests (downloads ~150MB model on first run)
pytest tests/ -m integration -v

# Run with coverage
pytest tests/ --cov=src/whisper_telegram_mcp --cov-report=term-missing

MCP Inspector

uvx mcp dev src/whisper_telegram_mcp/server.py

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Run tests (pytest tests/ -v -m "not integration")
  4. Commit with conventional commits (feat:, fix:, docs:, etc.)
  5. Open a pull request

License

MIT
