
# 🎬 YTPipe - AI-Native YouTube Processing Pipeline

Transform YouTube videos into LLM-ready knowledge bases with a production-ready MCP backend.
## ✨ Features
- 🤖 MCP Integration - 12 AI-callable tools for seamless agent integration
- 🎯 Smart Chunking - Semantic text chunking with timeline timestamps
- 🧠 Vector Embeddings - 384-dimensional embeddings for semantic search
- 🔍 Full-Text Search - Context-aware transcript search
- 📊 SEO Intelligence - AI-powered title, tag, and description optimization
- ⏱️ Timeline Analysis - Topic evolution and keyword density tracking
- 🏗️ Microservices - 11 independent, composable services
- 🔐 Type-Safe - Pydantic models throughout
- ⚡ Async-First - Non-blocking I/O operations
- 🗄️ Multi-Backend - ChromaDB, FAISS, Qdrant support
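The smart-chunking feature above can be sketched roughly as follows. This is a minimal illustration, not YTPipe's actual implementation; the `(start, end, text)` segment format mirrors Whisper-style transcript output, and `max_chars` is an illustrative parameter:

```python
# Minimal sketch of timestamp-aware chunking: group transcript segments
# into chunks of roughly `max_chars` characters, keeping each chunk's
# start/end times so results can be mapped back onto the video timeline.
def chunk_segments(segments, max_chars=200):
    chunks, buf, start = [], [], None
    for seg_start, seg_end, text in segments:
        if start is None:
            start = seg_start
        buf.append(text)
        if sum(len(t) for t in buf) >= max_chars:
            chunks.append({"start": start, "end": seg_end, "text": " ".join(buf)})
            buf, start = [], None
    if buf:  # flush any trailing partial chunk
        chunks.append({"start": start, "end": segments[-1][1], "text": " ".join(buf)})
    return chunks

segments = [
    (0.0, 4.2, "Welcome to the channel."),
    (4.2, 9.8, "Today we cover vector databases."),
    (9.8, 15.0, "First, what is an embedding?"),
]
chunks = chunk_segments(segments, max_chars=40)
```

Each chunk carries its own time range, which is what makes "jump to the part of the video that answers my question" queries possible.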
## 🚀 Quick Start

```bash
# Install
git clone https://github.com/leolech14/ytpipe.git
cd ytpipe
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Process a video
ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"
```

Result: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage
## 🎯 Usage Examples

### MCP Server (AI Agents)

```bash
python -m ytpipe.mcp.server
```

Then from Claude Code:

> "Process this video: https://youtube.com/watch?v=VIDEO_ID"
> "Search video dQw4w9WgXcQ for 'machine learning'"
> "Optimize SEO for video dQw4w9WgXcQ"

### CLI (Humans)

```bash
# Basic
ytpipe "https://youtube.com/watch?v=VIDEO_ID"

# Advanced
ytpipe URL --backend faiss --whisper-model large --verbose
```
### Python API (Developers)

```python
import asyncio
from ytpipe.core.pipeline import Pipeline

async def main():
    pipeline = Pipeline(output_dir="./output")
    result = await pipeline.process("https://youtube.com/watch?v=VIDEO_ID")
    print(f"✅ {result.metadata.title}")
    print(f"   Chunks: {len(result.chunks)}")
    print(f"   Time: {result.processing_time:.1f}s")

asyncio.run(main())
```
## 📋 MCP Tools

### Pipeline (4 tools)

- `ytpipe_process_video` - Full pipeline
- `ytpipe_download` - Download only
- `ytpipe_transcribe` - Transcribe audio
- `ytpipe_embed` - Generate embeddings

### Query (4 tools)

- `ytpipe_search` - Full-text search
- `ytpipe_find_similar` - Semantic search
- `ytpipe_get_chunk` - Get chunk by ID
- `ytpipe_get_metadata` - Get video info
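At its core, `ytpipe_find_similar`-style semantic search is a nearest-neighbour lookup over embedding vectors. A minimal cosine-similarity sketch (toy 4-dimensional vectors stand in for the real 384-dimensional ones; chunk IDs are illustrative):

```python
import math

# Cosine similarity: the standard relevance measure for embedding search.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "chunk embeddings" (real ones are 384-dimensional).
index = {
    "chunk-1": [1.0, 0.0, 0.0, 0.0],
    "chunk-2": [0.9, 0.1, 0.0, 0.0],
    "chunk-3": [0.0, 0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0, 0.0]

# Rank chunks by similarity to the query embedding, most similar first.
ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
```

In practice the vector backend (ChromaDB, FAISS, or Qdrant) performs this ranking with an approximate index instead of a brute-force scan.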
### Analytics (4 tools)

- `ytpipe_seo_optimize` - SEO recommendations
- `ytpipe_quality_report` - Quality metrics
- `ytpipe_topic_timeline` - Topic evolution
- `ytpipe_benchmark` - Performance analysis
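From a raw MCP client's perspective, each tool above is invoked with a standard `tools/call` JSON-RPC 2.0 request, as defined by the Model Context Protocol. A sketch of the payload shape (the `video_id`/`query` argument names are illustrative, not YTPipe's documented schema):

```python
import json

# JSON-RPC 2.0 request an MCP client would send to invoke a tool.
# Argument names inside "arguments" are illustrative placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "ytpipe_search",
        "arguments": {"video_id": "dQw4w9WgXcQ", "query": "machine learning"},
    },
}
payload = json.dumps(request)
```

AI agents never build this by hand; frameworks like Claude Code generate the request from the tool's declared schema.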
## 🏗️ Architecture

MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models

**Services:**

- Extractors (2): Download, Transcriber
- Processors (4): Chunker, Embedder, VectorStore, Docling
- Intelligence (4): Search, SEO, Timeline, Analyzer
- Exporters (1): Dashboard

**8 Processing Phases:**

1. Download → 2. Transcription → 3. Chunking → 4. Embeddings → 5. Export → 6. Dashboard → 7. Docling → 8. Vector Storage
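The eight phases run as a sequential async pipeline over a shared context. A minimal orchestration sketch (the phase names come from the list above; everything else is illustrative, not YTPipe's actual orchestrator):

```python
import asyncio

# Each phase is an async step that receives and enriches a shared context.
async def run_phase(name, ctx):
    ctx[name] = "done"  # stand-in for the real work of that phase
    return ctx

PHASES = ["download", "transcription", "chunking", "embeddings",
          "export", "dashboard", "docling", "vector_storage"]

async def run_pipeline(url):
    ctx = {"url": url}
    for phase in PHASES:  # phases run in order; each is awaited before the next
        ctx = await run_phase(phase, ctx)
    return ctx

result = asyncio.run(run_pipeline("https://youtube.com/watch?v=VIDEO_ID"))
```

Keeping phases as independent awaited steps is what lets the MCP tools expose them individually (`ytpipe_download`, `ytpipe_transcribe`, ...) as well as end-to-end (`ytpipe_process_video`).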
## 📊 Performance
| Metric | Value |
|---|---|
| Processing Speed | 4-13x real-time |
| Memory Usage | <2GB peak |
| Chunk Quality | 85%+ high quality |
| Embedding Dimension | 384 |
## 🔧 Requirements
- Python 3.8+
- FFmpeg (for audio extraction)
- 4GB+ RAM recommended
- GPU optional (CUDA for acceleration)
## 📖 Documentation
## 🤝 Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
## 📝 License
MIT License - see LICENSE for details.
## 🙏 Credits
Built with:
- FastMCP - MCP server framework
- OpenAI Whisper - Speech-to-text
- sentence-transformers - Text embeddings
- Model Context Protocol - AI tool standard
## 📧 Contact
Leonardo Lech
- Email: leonardo.lech@gmail.com
- GitHub: @leolech14
⭐ **Star this repo if you find it useful!**

*Transform YouTube → Knowledge Base in seconds*