MCP Hub
Back to servers

video-toolkit-mcp

A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, and automatic subtitle generation using AI speech-to-text. Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp.

glama
Forks
1
Updated
Mar 14, 2026

Video Toolkit MCP Server

glama

A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, and automatic subtitle generation using AI speech-to-text. Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp.

Features

  • Multi-Platform Support: Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp
  • Video Transcripts: Extract existing transcripts/captions from videos
  • Video Downloads: Download videos to local storage in various formats and qualities
  • Auto Subtitle Generation: Generate subtitles using OpenAI Whisper API or local Whisper
  • Multiple URL Formats: Support for various URL formats from different platforms
  • Timestamp Support: Include or exclude timestamps in transcript output
  • Language Selection: Request transcripts or generate subtitles in specific languages

Tools

ToolDescription
get-transcriptRetrieve existing transcripts from video platforms
list-transcript-languagesList available transcript languages for a video
download-videoDownload videos to local storage
list-downloadsList downloaded video files
generate-subtitlesGenerate subtitles using AI speech-to-text

Prerequisites

  • Node.js >= 16.0.0
  • yt-dlp - Required for transcript fetching and video downloads
  • ffmpeg - Required for subtitle generation (audio extraction)

Installing Dependencies

yt-dlp (required):

# Using Homebrew (macOS)
brew install yt-dlp

# Using pip
pip install yt-dlp

ffmpeg (required for subtitle generation):

# Using Homebrew (macOS)
brew install ffmpeg

# Using apt (Ubuntu/Debian)
sudo apt install ffmpeg

Local Whisper (optional, for local subtitle generation):

pip install openai-whisper

Installation

From Source

git clone <repository-url>
cd video-toolkit-mcp
npm install
npm run build

Global Installation (after publishing)

npm install -g video-toolkit-mcp

Configuration

For Claude Desktop / Cursor

Add the MCP server to your configuration file:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "video-toolkit-mcp": {
      "command": "node",
      "args": ["/path/to/video-toolkit-mcp/dist/index.js"],
      "env": {
        "VIDEO_TOOLKIT_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "video-toolkit-mcp": {
      "command": "node",
      "args": ["/path/to/video-toolkit-mcp/dist/index.js"],
      "env": {
        "VIDEO_TOOLKIT_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}

Environment Variables

VariableDescriptionDefault
VIDEO_TOOLKIT_STORAGE_DIRDefault directory for downloaded videos~/.video-toolkit/downloads
OPENAI_API_KEYOpenAI API key for Whisper-based subtitle generationNone
VIDEO_TOOLKIT_WHISPER_ENGINEPreferred whisper engine: openai, local, or autoauto
WHISPER_BINARY_PATHPath to local whisper binarywhisper
WHISPER_MODEL_PATHPath to whisper model (for local whisper)Auto-download
YT_DLP_PATHPath to yt-dlp binaryyt-dlp
FFMPEG_PATHPath to ffmpeg binaryffmpeg
DEBUGEnable debug logging0

Usage

1. get-transcript

Retrieve existing transcripts from video platforms.

Parameters:

  • url (required): Video URL
  • lang (optional): Language code (e.g., 'en', 'es', 'zh')
  • include_timestamps (optional): Include timestamps (default: true)

Example:

Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID

2. list-transcript-languages

List available transcript languages for a video.

Parameters:

  • url (required): Video URL

Example:

What transcript languages are available for https://www.youtube.com/watch?v=VIDEO_ID?

3. download-video

Download a video to local storage.

Parameters:

  • url (required): Video URL to download
  • output_dir (optional): Custom output directory
  • filename (optional): Custom filename
  • format (optional): Video format - mp4, webm, mkv (default: mp4)
  • quality (optional): Quality - best, 1080p, 720p, 480p, 360p, audio (default: best)

Example:

Download this video: https://www.youtube.com/watch?v=VIDEO_ID

4. list-downloads

List all downloaded video files.

Parameters:

  • directory (optional): Directory to list (default: storage directory)

Example:

List my downloaded videos

5. generate-subtitles

Generate subtitles for a local video file using AI speech-to-text.

Parameters:

  • video_path (required): Absolute path to the video file
  • engine (optional): openai or local (default: auto-detect)
  • language (optional): Language code for transcription
  • output_format (optional): srt or vtt (default: srt)

Example:

Generate subtitles for /path/to/video.mp4

Subtitle Generation Engines

OpenAI Whisper API

  • Pros: High accuracy, no local setup needed, supports 50+ languages
  • Cons: Requires API key, costs per audio minute
  • Setup: Set OPENAI_API_KEY environment variable

Local Whisper

  • Pros: Free, runs locally, no API limits
  • Cons: Requires setup, uses local CPU/GPU
  • Setup: pip install openai-whisper

The tool auto-detects which engine to use:

  1. If OPENAI_API_KEY is set, uses OpenAI Whisper
  2. If local whisper is installed, uses local whisper
  3. Returns an error if neither is available

Example Workflows

Download and Generate Subtitles

1. Download this video: https://www.youtube.com/watch?v=VIDEO_ID
2. Generate subtitles for the downloaded file

Summarize a Video

Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID and summarize the key points

Create Captions for Videos Without Subtitles

1. Download the video: https://vimeo.com/123456789
2. Generate English subtitles for it

Supported Platforms

Any platform supported by yt-dlp, including:

  • YouTube
  • Bilibili
  • Vimeo
  • Twitter/X
  • TikTok
  • Twitch
  • And many more...

Full list: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md

Project Structure

video-toolkit-mcp/
├── src/
│   ├── index.ts              # Main MCP server entry point
│   ├── transcript-fetcher.ts # Transcript fetching using yt-dlp
│   ├── video-downloader.ts   # Video download functionality
│   ├── subtitle-generator.ts # AI-powered subtitle generation
│   ├── config.ts             # Configuration management
│   ├── url-detector.ts       # Platform detection from URLs
│   ├── parser.ts             # Transcript parsing (SRT, VTT, JSON)
│   └── errors.ts             # Custom error classes
├── test/
│   └── transcript.test.ts    # Unit tests
├── dist/                     # Compiled JavaScript (after build)
└── package.json

Development

# Build
npm run build

# Test
npm test

# Development mode
npm run dev

Troubleshooting

"yt-dlp is not installed"

brew install yt-dlp
# or
pip install yt-dlp

"ffmpeg is not installed"

brew install ffmpeg

"No Whisper engine available"

Either:

  • Set OPENAI_API_KEY environment variable, or
  • Install local whisper: pip install openai-whisper

Download issues

  • Check if the video is publicly accessible
  • Some platforms may have rate limits
  • Private/restricted videos cannot be downloaded

Subtitle generation is slow

  • OpenAI Whisper API is faster than local
  • Local whisper performance depends on your hardware
  • Consider using a smaller model for local whisper

License

MIT

Acknowledgments

Reviews

No reviews yet

Sign in to write a review