Skill Seeker

Automatically convert documentation websites, GitHub repositories, and PDFs into Claude AI skills in minutes.

📋 View Development Roadmap & Tasks - 134 tasks across 10 categories, pick any to contribute!

What is Skill Seeker?

Skill Seeker is an automated tool that transforms documentation websites, GitHub repositories, and PDF files into production-ready Claude AI skills. Instead of manually reading and summarizing documentation, Skill Seeker:

Scrapes multiple sources (docs, GitHub repos, PDFs) automatically
Analyzes code repositories with deep AST parsing
Detects conflicts between documentation and code implementation
Organizes content into categorized reference files
Enhances with AI to extract best examples and key concepts
Packages everything into an uploadable .zip file for Claude

Result: Get comprehensive Claude skills for any framework, API, or tool in 20-40 minutes instead of hours of manual work.

Why Use This?

🎯 For Developers: Create skills from documentation + GitHub repos with conflict detection
🎮 For Game Devs: Generate skills for game engines (Godot docs + GitHub, Unity, etc.)
🔧 For Teams: Combine internal docs + code repositories into single source of truth
📚 For Learners: Build comprehensive skills from docs, code examples, and PDFs
🔍 For Open Source: Analyze repos to find documentation gaps and outdated examples

Key Features

🌐 Documentation Scraping

✅ llms.txt Support - Automatically detects and uses LLM-ready documentation files (10x faster)
✅ Universal Scraper - Works with ANY documentation website
✅ Smart Categorization - Automatically organizes content by topic
✅ Code Language Detection - Recognizes Python, JavaScript, C++, GDScript, etc.
✅ 8 Ready-to-Use Presets - Godot, React, Vue, Django, FastAPI, and more

📄 PDF Support (v1.2.0)

✅ Basic PDF Extraction - Extract text, code, and images from PDF files
✅ OCR for Scanned PDFs - Extract text from scanned documents
✅ Password-Protected PDFs - Handle encrypted PDFs
✅ Table Extraction - Extract complex tables from PDFs
✅ Parallel Processing - 3x faster for large PDFs
✅ Intelligent Caching - 50% faster on re-runs

🐙 GitHub Repository Scraping (v2.0.0)

✅ Deep Code Analysis - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
✅ API Extraction - Functions, classes, methods with parameters and types
✅ Repository Metadata - README, file tree, language breakdown, stars/forks
✅ GitHub Issues & PRs - Fetch open/closed issues with labels and milestones
✅ CHANGELOG & Releases - Automatically extract version history
✅ Conflict Detection - Compare documented APIs vs actual code implementation
✅ MCP Integration - Natural language: "Scrape GitHub repo facebook/react"

🔄 Unified Multi-Source Scraping (NEW - v2.0.0)

✅ Combine Multiple Sources - Mix documentation + GitHub + PDF in one skill
✅ Conflict Detection - Automatically finds discrepancies between docs and code
✅ Intelligent Merging - Rule-based or AI-powered conflict resolution
✅ Transparent Reporting - Side-by-side comparison with ⚠️ warnings
✅ Documentation Gap Analysis - Identifies outdated docs and undocumented features
✅ Single Source of Truth - One skill showing both intent (docs) and reality (code)
✅ Backward Compatible - Legacy single-source configs still work

🤖 Multi-LLM Platform Support (NEW - v2.5.0)

✅ 4 LLM Platforms - Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
✅ Universal Scraping - Same documentation works for all platforms
✅ Platform-Specific Packaging - Optimized formats for each LLM
✅ One-Command Export - --target flag selects platform
✅ Optional Dependencies - Install only what you need
✅ 100% Backward Compatible - Existing Claude workflows unchanged

Platform	Format	Upload	Enhancement	API Key
Claude AI	ZIP + YAML	✅ Auto	✅ Yes	ANTHROPIC_API_KEY
Google Gemini	tar.gz	✅ Auto	✅ Yes	GOOGLE_API_KEY
OpenAI ChatGPT	ZIP + Vector Store	✅ Auto	✅ Yes	OPENAI_API_KEY
Generic Markdown	ZIP	❌ Manual	❌ No	None

# Claude (default - no changes needed!)
skill-seekers package output/react/
skill-seekers upload react.zip

# Google Gemini
pip install skill-seekers[gemini]
skill-seekers package output/react/ --target gemini
skill-seekers upload react-gemini.tar.gz --target gemini

# OpenAI ChatGPT
pip install skill-seekers[openai]
skill-seekers package output/react/ --target openai
skill-seekers upload react-openai.zip --target openai

# Generic Markdown (universal export)
skill-seekers package output/react/ --target markdown
# Use the markdown files directly in any LLM

Installation:

# Install with Gemini support
pip install skill-seekers[gemini]

# Install with OpenAI support
pip install skill-seekers[openai]

# Install with all LLM platforms
pip install skill-seekers[all-llms]

🔐 Private Config Repositories (NEW - v2.2.0)

✅ Git-Based Config Sources - Fetch configs from private/team git repositories
✅ Multi-Source Management - Register unlimited GitHub, GitLab, Bitbucket repos
✅ Team Collaboration - Share custom configs across 3-5 person teams
✅ Enterprise Support - Scale to 500+ developers with priority-based resolution
✅ Secure Authentication - Environment variable tokens (GITHUB_TOKEN, GITLAB_TOKEN)
✅ Intelligent Caching - Clone once, pull updates automatically
✅ Offline Mode - Work with cached configs when offline
✅ Backward Compatible - Existing API-based configs still work

🤖 AI & Enhancement

✅ AI-Powered Enhancement - Transforms basic templates into comprehensive guides
✅ No API Costs - FREE local enhancement using Claude Code Max
✅ MCP Server for Claude Code - Use directly from Claude Code with natural language

⚡ Performance & Scale

✅ Async Mode - 2-3x faster scraping with async/await (use --async flag)
✅ Large Documentation Support - Handle 10K-40K+ page docs with intelligent splitting
✅ Router/Hub Skills - Intelligent routing to specialized sub-skills
✅ Parallel Scraping - Process multiple skills simultaneously
✅ Checkpoint/Resume - Never lose progress on long scrapes
✅ Caching System - Scrape once, rebuild instantly

✅ Quality Assurance

✅ Fully Tested - 391 tests with comprehensive coverage

📦 Now Available on PyPI!

Skill Seekers is now published on the Python Package Index! Install with a single command:

pip install skill-seekers

Get started in seconds. No cloning, no setup - just install and run. See installation options below.

Quick Start

Option 1: Install from PyPI (Recommended)

# Install from PyPI (easiest method!)
pip install skill-seekers

# Use the unified CLI
skill-seekers scrape --config configs/react.json
skill-seekers github --repo facebook/react
skill-seekers enhance output/react/
skill-seekers package output/react/

Time: ~25 minutes | Quality: Production-ready | Cost: Free

📖 New to Skill Seekers? Check out our Quick Start Guide or Bulletproof Guide

Option 2: Install via uv (Modern Python Tool)

# Install with uv (fast, modern alternative)
uv tool install skill-seekers

# Or run directly without installing
uv tool run --from skill-seekers skill-seekers scrape --config https://raw.githubusercontent.com/yusufkaraaslan/Skill_Seekers/main/configs/react.json

# Unified CLI - simple commands
skill-seekers scrape --config configs/react.json
skill-seekers github --repo facebook/react
skill-seekers package output/react/

Time: ~25 minutes | Quality: Production-ready | Cost: Free

Option 3: Development Install (From Source)

# Clone and install in editable mode
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -e .

# Use the unified CLI
skill-seekers scrape --config configs/react.json

Option 4: Use from Claude Code & 4 Other AI Agents (MCP Integration)

# One-time setup (5 minutes) - Auto-configures 5 AI agents!
./setup_mcp.sh

# Then in Claude Code, Cursor, Windsurf, VS Code + Cline, or IntelliJ IDEA, just ask:
"Generate a React skill from https://react.dev/"
"Scrape PDF at docs/manual.pdf and create skill"

Time: Automated | Quality: Production-ready | Cost: Free

NEW in v2.4.0: MCP server now supports 5 AI coding agents with automatic configuration!

Option 5: Legacy CLI (Backwards Compatible)

# Install dependencies
pip3 install requests beautifulsoup4

# Run scripts directly (old method)
python3 src/skill_seekers/cli/doc_scraper.py --config configs/react.json

# Upload output/react.zip to Claude - Done!

Time: ~25 minutes | Quality: Production-ready | Cost: Free

🚀 NEW! One-Command Install Workflow (v2.1.1)

The fastest way to go from config to uploaded skill - complete automation:

# Install React skill from official configs (auto-uploads to Claude)
skill-seekers install --config react

# Install from local config file
skill-seekers install --config configs/custom.json

# Install without uploading (package only)
skill-seekers install --config django --no-upload

# Unlimited scraping (no page limits)
skill-seekers install --config godot --unlimited

# Preview workflow without executing
skill-seekers install --config react --dry-run

Time: 20-45 minutes total | Quality: Production-ready (9/10) | Cost: Free

What it does automatically:

✅ Fetches config from API (if config name provided)
✅ Scrapes documentation (respects rate limits, handles pagination)
✅ AI Enhancement (MANDATORY) - 30-60 sec, quality boost from 3/10 → 9/10
✅ Packages skill to .zip file
✅ Uploads to Claude (if ANTHROPIC_API_KEY set)

Why use this?

Zero friction - One command instead of 5 separate steps
Quality guaranteed - Enhancement is mandatory, ensures professional output
Complete automation - From config name to uploaded skill in Claude
Time savings - Fully automated end-to-end workflow

Phases executed:

📥 PHASE 1: Fetch Config (if config name provided)
📖 PHASE 2: Scrape Documentation
✨ PHASE 3: AI Enhancement (MANDATORY - no skip option)
📦 PHASE 4: Package Skill
☁️  PHASE 5: Upload to Claude (optional, requires API key)

Requirements:

ANTHROPIC_API_KEY environment variable (for auto-upload)
Claude Code Max plan (for local AI enhancement)

Example:

# Set API key once
export ANTHROPIC_API_KEY=sk-ant-your-key-here

# Run one command - sit back and relax!
skill-seekers install --config react

# Result: React skill uploaded to Claude in 20-45 minutes

📊 Feature Matrix

Skill Seekers supports 4 platforms and 5 skill modes with full feature parity.

Platforms: Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown Skill Modes: Documentation, GitHub, PDF, Unified Multi-Source, Local Repository

See Complete Feature Matrix for detailed platform and feature support.

Quick Platform Comparison

Feature	Claude	Gemini	OpenAI	Markdown
Format	ZIP + YAML	tar.gz	ZIP + Vector	ZIP
Upload	✅ API	✅ API	✅ API	❌ Manual
Enhancement	✅ Sonnet 4	✅ 2.0 Flash	✅ GPT-4o	❌ None
All Skill Modes	✅	✅	✅	✅

Examples:

# Package for all platforms (same skill)
skill-seekers package output/react/ --target claude
skill-seekers package output/react/ --target gemini
skill-seekers package output/react/ --target openai
skill-seekers package output/react/ --target markdown

# Install for specific platform
skill-seekers install --config django --target gemini
skill-seekers install --config fastapi --target openai

Usage Examples

Documentation Scraping

# Scrape documentation website
skill-seekers scrape --config configs/react.json

# Quick scrape without config
skill-seekers scrape --url https://react.dev --name react

# With async mode (3x faster)
skill-seekers scrape --config configs/godot.json --async --workers 8

PDF Extraction

# Basic PDF extraction
skill-seekers pdf --pdf docs/manual.pdf --name myskill

# Advanced features
skill-seekers pdf --pdf docs/manual.pdf --name myskill \
    --extract-tables \        # Extract tables
    --parallel \              # Fast parallel processing
    --workers 8               # Use 8 CPU cores

# Scanned PDFs (requires: pip install pytesseract Pillow)
skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr

# Password-protected PDFs
skill-seekers pdf --pdf docs/encrypted.pdf --name myskill --password mypassword

Time: ~5-15 minutes (or 2-5 minutes with parallel) | Quality: Production-ready | Cost: Free

GitHub Repository Scraping

# Basic repository scraping
skill-seekers github --repo facebook/react

# Using a config file
skill-seekers github --config configs/react_github.json

# With authentication (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here
skill-seekers github --repo facebook/react

# Customize what to include
skill-seekers github --repo django/django \
    --include-issues \        # Extract GitHub Issues
    --max-issues 100 \        # Limit issue count
    --include-changelog \     # Extract CHANGELOG.md
    --include-releases        # Extract GitHub Releases

Time: ~5-10 minutes | Quality: Production-ready | Cost: Free

Unified Multi-Source Scraping (NEW - v2.0.0)

The Problem: Documentation and code often drift apart. Docs might be outdated, missing features that exist in code, or documenting features that were removed.

The Solution: Combine documentation + GitHub + PDF into one unified skill that shows BOTH what's documented AND what actually exists, with clear warnings about discrepancies.

# Use existing unified configs
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json

# Or create unified config (mix documentation + GitHub)
cat > configs/myframework_unified.json << 'EOF'
{
  "name": "myframework",
  "description": "Complete framework knowledge from docs + code",
  "merge_mode": "rule-based",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.myframework.com/",
      "extract_api": true,
      "max_pages": 200
    },
    {
      "type": "github",
      "repo": "owner/myframework",
      "include_code": true,
      "code_analysis_depth": "surface"
    }
  ]
}
EOF

# Run unified scraper
skill-seekers unified --config configs/myframework_unified.json

# Package and upload
skill-seekers package output/myframework/
# Upload output/myframework.zip to Claude - Done!

Time: ~30-45 minutes | Quality: Production-ready with conflict detection | Cost: Free

What Makes It Special:

✅ Conflict Detection - Automatically finds 4 types of discrepancies:

🔴 Missing in code (high): Documented but not implemented
🟡 Missing in docs (medium): Implemented but not documented
⚠️ Signature mismatch: Different parameters/types
ℹ️ Description mismatch: Different explanations

✅ Transparent Reporting - Shows both versions side-by-side:

#### `move_local_x(delta: float)`

⚠️ **Conflict**: Documentation signature differs from implementation

**Documentation says:**

def move_local_x(delta: float)


**Code implementation:**
```python
def move_local_x(delta: float, snap: bool = False) -> None


✅ **Advantages:**
- **Identifies documentation gaps** - Find outdated or missing docs automatically
- **Catches code changes** - Know when APIs change without docs being updated
- **Single source of truth** - One skill showing intent (docs) AND reality (code)
- **Actionable insights** - Get suggestions for fixing each conflict
- **Development aid** - See what's actually in the codebase vs what's documented

**Example Unified Configs:**
- `configs/react_unified.json` - React docs + GitHub repo
- `configs/django_unified.json` - Django docs + GitHub repo
- `configs/fastapi_unified.json` - FastAPI docs + GitHub repo

**Full Guide:** See [docs/UNIFIED_SCRAPING.md](https://github.com/yusufkaraaslan/Skill_Seekers/blob/development/docs/UNIFIED_SCRAPING.md) for complete documentation.

### Private Config Repositories (**NEW - v2.2.0**)

**The Problem:** Teams need to share custom configs for internal documentation, but don't want to publish them publicly.

**The Solution:** Register private git repositories as config sources. Fetch configs from team repos just like the public API, with full authentication support.

```bash
# Setup: Set your GitHub token (one-time)
export GITHUB_TOKEN=ghp_your_token_here

# Option 1: Using MCP tools (recommended)
# Register your team's private repo
add_config_source(
    name="team",
    git_url="https://github.com/mycompany/skill-configs.git",
    token_env="GITHUB_TOKEN"
)

# Fetch config from team repo
fetch_config(source="team", config_name="internal-api")

# List all registered sources
list_config_sources()

# Remove source when no longer needed
remove_config_source(name="team")

Direct Git URL mode (no registration):

# Fetch directly from git URL
fetch_config(
    git_url="https://github.com/mycompany/configs.git",
    config_name="react-custom",
    token="ghp_your_token_here"
)

Supported Platforms:

GitHub (token env: GITHUB_TOKEN)
GitLab (token env: GITLAB_TOKEN)
Gitea (token env: GITEA_TOKEN)
Bitbucket (token env: BITBUCKET_TOKEN)
Any git server (token env: GIT_TOKEN)

Use Cases:

📋 Small Teams (3-5 people)

# Team lead creates repo
gh repo create myteam/skill-configs --private

# Add configs to repo
cd myteam-skill-configs
cp ../Skill_Seekers/configs/react.json ./react-custom.json
# Edit selectors, categories for your internal docs...
git add . && git commit -m "Add custom React config" && git push

# Team members register (one-time)
add_config_source(name="team", git_url="https://github.com/myteam/skill-configs.git")

# Everyone can now fetch
fetch_config(source="team", config_name="react-custom")

🏢 Enterprise (500+ developers)

# IT pre-configures sources for everyone
add_config_source(name="platform", git_url="gitlab.company.com/platform/configs", priority=1)
add_config_source(name="mobile", git_url="gitlab.company.com/mobile/configs", priority=2)
add_config_source(name="official", git_url="api.skillseekersweb.com", priority=3)

# Developers use transparently
fetch_config(config_name="internal-platform")  # Finds in platform source
fetch_config(config_name="react")              # Falls back to official API

Storage Locations:

Registry: ~/.skill-seekers/sources.json
Cache: $SKILL_SEEKERS_CACHE_DIR (default: ~/.skill-seekers/cache/)

Features:

✅ Shallow clone - 10-50x faster, minimal disk space
✅ Auto-pull - Fetches latest changes automatically
✅ Offline mode - Works with cached repos when offline
✅ Priority resolution - Multiple sources with conflict resolution
✅ Secure - Tokens via environment variables only

Example Team Repository:

Try the included example:

# Test with file:// URL (no auth needed)
cd /path/to/Skill_Seekers

# Run the E2E test
python3 configs/example-team/test_e2e.py

# Or test manually
add_config_source(
    name="example",
    git_url="file://$(pwd)/configs/example-team",
    branch="master"
)

fetch_config(source="example", config_name="react-custom")

Full Guide: See docs/GIT_CONFIG_SOURCES.md for complete documentation.

How It Works

graph LR
    A[Documentation Website] --> B[Skill Seeker]
    B --> C[Scraper]
    B --> D[AI Enhancement]
    B --> E[Packager]
    C --> F[Organized References]
    D --> F
    F --> E
    E --> G[Claude Skill .zip]
    G --> H[Upload to Claude AI]

Detect llms.txt - Checks for llms-full.txt, llms.txt, llms-small.txt first
Scrape: Extracts all pages from documentation
Categorize: Organizes content into topics (API, guides, tutorials, etc.)
Enhance: AI analyzes docs and creates comprehensive SKILL.md with examples
Package: Bundles everything into a Claude-ready .zip file

📋 Prerequisites

Before you start, make sure you have:

Python 3.10 or higher - Download | Check: python3 --version
Git - Download | Check: git --version
15-30 minutes for first-time setup

First time user? → Start Here: Bulletproof Quick Start Guide 🎯

This guide walks you through EVERYTHING step-by-step (Python install, git clone, first skill creation).

🚀 Quick Start

Method 1: MCP Server for 5 AI Agents (Easiest - NEW v2.4.0!)

Use Skill Seeker directly from Claude Code, Cursor, Windsurf, VS Code + Cline, or IntelliJ IDEA with natural language!

# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers

# One-time setup (5 minutes) - Auto-configures ALL 5 agents!
./setup_mcp.sh

# Restart your AI agent, then just ask:

In Claude Code, Cursor, Windsurf, VS Code + Cline, or IntelliJ IDEA:

List all available configs
Generate config for Tailwind at https://tailwindcss.com/docs
Scrape docs using configs/react.json
Package skill at output/react/

Benefits:

✅ No manual CLI commands
✅ Natural language interface
✅ Integrated with your workflow
✅ 17 tools available instantly (up from 9!)
✅ 5 AI agents supported - auto-configured with one command
✅ Tested and working in production

NEW in v2.4.0:

✅ Upgraded to MCP SDK v1.25.0 - Latest features and performance
✅ FastMCP Framework - Modern, maintainable MCP implementation
✅ HTTP + stdio transport - Works with more AI agents
✅ 17 tools (up from 9) - More capabilities
✅ Multi-agent auto-configuration - Setup all agents with one command

Full guides:

📘 MCP Setup Guide - Complete installation instructions
🧪 MCP Testing Guide - Test all 17 tools
📦 Large Documentation Guide - Handle 10K-40K+ pages
📤 Upload Guide - How to upload skills to Claude

Method 2: CLI (Traditional)

One-Time Setup: Create Virtual Environment

# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate  # macOS/Linux
# OR on Windows: venv\Scripts\activate

# Install dependencies
pip install requests beautifulsoup4 pytest

# Save dependencies
pip freeze > requirements.txt

# Optional: Install anthropic for API-based enhancement (not needed for LOCAL enhancement)
# pip install anthropic

Always activate the virtual environment before using Skill Seeker:

source venv/bin/activate  # Run this each time you start a new terminal session

Easiest: Use a Preset

# Make sure venv is activated (you should see (venv) in your prompt)
source venv/bin/activate

# Optional: Estimate pages first (fast, 1-2 minutes)
skill-seekers estimate configs/godot.json

# Use Godot preset
skill-seekers scrape --config configs/godot.json

# Use React preset
skill-seekers scrape --config configs/react.json

# See all presets
ls configs/

Interactive Mode

skill-seekers scrape --interactive

Quick Mode

skill-seekers scrape \
  --name react \
  --url https://react.dev/ \
  --description "React framework for UIs"

📤 Uploading Skills to Claude

Once your skill is packaged, you need to upload it to Claude:

Option 1: Automatic Upload (API-based)

# Set your API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...

# Package and upload automatically
skill-seekers package output/react/ --upload

# OR upload existing .zip
skill-seekers upload output/react.zip

Benefits:

✅ Fully automatic
✅ No manual steps
✅ Works from command line

Requirements:

Anthropic API key (get from https://console.anthropic.com/)

Option 2: Manual Upload (No API Key)

# Package skill
skill-seekers package output/react/

# This will:
# 1. Create output/react.zip
# 2. Open the output/ folder automatically
# 3. Show upload instructions

# Then manually upload:
# - Go to https://claude.ai/skills
# - Click "Upload Skill"
# - Select output/react.zip
# - Done!

Benefits:

✅ No API key needed
✅ Works for everyone
✅ Folder opens automatically

Option 3: Claude Code (MCP) - Smart & Automatic

In Claude Code, just ask:
"Package and upload the React skill"

# With API key set:
# - Packages the skill
# - Uploads to Claude automatically
# - Done! ✅

# Without API key:
# - Packages the skill
# - Shows where to find the .zip
# - Provides manual upload instructions

Benefits:

✅ Natural language
✅ Smart auto-detection (uploads if API key available)
✅ Works with or without API key
✅ No errors or failures

🤖 Installing to AI Agents

Skill Seekers can automatically install skills to 10+ AI coding agents.

Quick Start

# Install to specific agent
skill-seekers install-agent output/react/ --agent cursor

# Install to all agents at once
skill-seekers install-agent output/react/ --agent all

# Overwrite existing installation
skill-seekers install-agent output/react/ --agent claude --force

# Preview without installing
skill-seekers install-agent output/react/ --agent cursor --dry-run

Supported Agents

Agent	Path	Type
Claude Code	`~/.claude/skills/`	Global
Cursor	`.cursor/skills/`	Project
VS Code / Copilot	`.github/skills/`	Project
Amp	`~/.amp/skills/`	Global
Goose	`~/.config/goose/skills/`	Global
OpenCode	`~/.opencode/skills/`	Global
Letta	`~/.letta/skills/`	Global
Aide	`~/.aide/skills/`	Global
Windsurf	`~/.windsurf/skills/`	Global
Neovate Code	`~/.neovate/skills/`	Global

Global paths install to user's home directory (~/). Project paths install to current project's root directory.

Complete Workflow

# 1. Scrape documentation
skill-seekers scrape --config configs/react.json --enhance-local

# 2. Package skill
skill-seekers package output/react/

# 3. Install to your agent
skill-seekers install-agent output/react/ --agent cursor

# 4. Restart Cursor to load the skill

🤖 Multi-Agent MCP Support (NEW in v2.4.0)

Skill Seekers MCP server now works with 5 leading AI coding agents!

Supported AI Agents

Agent	Transport	Setup Difficulty	Auto-Configured
Claude Code	stdio	Easy	✅ Yes
VS Code + Cline	stdio	Easy	✅ Yes
Cursor	HTTP	Medium	✅ Yes
Windsurf	HTTP	Medium	✅ Yes
IntelliJ IDEA	HTTP	Medium	✅ Yes

Quick Setup - All Agents at Once

# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers

# Run one command - auto-configures ALL 5 agents!
./setup_mcp.sh

# Restart your AI agent and start using natural language:
"List all available configs"
"Generate a React skill from https://react.dev/"
"Package the skill at output/react/"

What setup_mcp.sh does:

✅ Installs MCP server dependencies
✅ Configures Claude Code (stdio transport)
✅ Configures VS Code + Cline (stdio transport)
✅ Configures Cursor (HTTP transport)
✅ Configures Windsurf (HTTP transport)
✅ Configures IntelliJ IDEA (HTTP transport)
✅ Shows next steps for each agent

Time: 5 minutes | Result: All agents configured and ready to use

Transport Modes

Skill Seekers MCP server supports 2 transport modes:

stdio Transport (Claude Code, VS Code + Cline)

How it works: Agent launches MCP server as subprocess and communicates via stdin/stdout

Benefits:

✅ More secure (no network ports)
✅ Automatic lifecycle management
✅ Simpler configuration
✅ Better for single-user development

Configuration example (Claude Code):

{
  "mcpServers": {
    "skill-seeker": {
      "command": "python3",
      "args": ["-m", "skill_seekers.mcp.server"],
      "cwd": "/path/to/Skill_Seekers"
    }
  }
}

HTTP Transport (Cursor, Windsurf, IntelliJ IDEA)

How it works: MCP server runs as HTTP service, agents connect as clients

Benefits:

✅ Multi-agent support (one server, multiple clients)
✅ Server can run independently
✅ Better for team collaboration
✅ Easier debugging and monitoring

Configuration example (Cursor):

{
  "mcpServers": {
    "skill-seeker": {
      "url": "http://localhost:8765/sse"
    }
  }
}

Starting HTTP server:

# Start server manually (runs in background)
cd /path/to/Skill_Seekers
python3 -m skill_seekers.mcp.server --transport http --port 8765

# Or use auto-start script
./scripts/start_mcp_server.sh

Agent-Specific Instructions

Claude Code (stdio)

# Already configured by setup_mcp.sh!
# Just restart Claude Code

# Config location: ~/.claude/claude_code_config.json

Usage:

In Claude Code:
"List all available configs"
"Scrape React docs at https://react.dev/"

VS Code + Cline Extension (stdio)

# Already configured by setup_mcp.sh!
# Just restart VS Code

# Config location: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json

Usage:

In Cline:
"Generate config for Tailwind"
"Package skill at output/tailwind/"

Cursor (HTTP)

# 1. Setup already configured HTTP settings
# Config location: ~/.cursor/mcp_settings.json

# 2. Start HTTP server (one-time per session)
./scripts/start_mcp_server.sh

# 3. Restart Cursor

Usage:

In Cursor:
"Show me all skill-seeker configs"
"Create Django skill from docs"

Windsurf (HTTP)

# 1. Setup already configured HTTP settings
# Config location: ~/.windsurf/mcp_settings.json

# 2. Start HTTP server (one-time per session)
./scripts/start_mcp_server.sh

# 3. Restart Windsurf

Usage:

In Windsurf:
"Estimate pages for Godot config"
"Build unified skill for FastAPI"

IntelliJ IDEA (HTTP)

# 1. Setup already configured HTTP settings
# Config location: ~/.intellij/mcp_settings.json

# 2. Start HTTP server (one-time per session)
./scripts/start_mcp_server.sh

# 3. Restart IntelliJ IDEA

Usage:

In IntelliJ IDEA:
"Validate my config file"
"Split large Godot config"

Available MCP Tools (17 Total)

All agents have access to these 17 tools:

Core Tools (9):

list_configs - List all available preset configurations
generate_config - Generate new config for any docs site
validate_config - Validate config structure
estimate_pages - Estimate page count before scraping
scrape_docs - Scrape and build skill
package_skill - Package skill into .zip
upload_skill - Upload .zip to Claude
split_config - Split large documentation configs
generate_router - Generate router/hub skills

Extended Tools (8 - NEW!): 10. scrape_github - Scrape GitHub repositories 11. scrape_pdf - Extract content from PDFs 12. unified_scrape - Combine multiple sources 13. merge_sources - Merge documentation + code 14. detect_conflicts - Find doc/code discrepancies 15. add_config_source - Register private git repos 16. fetch_config - Fetch configs from git 17. list_config_sources - List registered sources

What's New in v2.4.0

MCP Infrastructure:

✅ Upgraded to MCP SDK v1.25.0 - Latest stable version
✅ FastMCP Framework - Modern, maintainable implementation
✅ Dual Transport - stdio + HTTP support
✅ 17 Tools - Up from 9 (almost 2x!)
✅ Auto-Configuration - One script configures all agents

Agent Support:

✅ 5 Agents Supported - Claude Code, VS Code + Cline, Cursor, Windsurf, IntelliJ IDEA
✅ Automatic Setup - ./setup_mcp.sh configures everything
✅ Transport Detection - Auto-selects stdio vs HTTP per agent
✅ Config Management - Handles all agent-specific config formats

Developer Experience:

✅ One Setup Command - Works for all agents
✅ Natural Language - Use plain English in any agent
✅ No CLI Required - All features via MCP tools
✅ Full Testing - All 17 tools tested and working

Troubleshooting Multi-Agent Setup

HTTP server not starting?

# Check if port 8765 is in use
lsof -i :8765

# Use different port
python3 -m skill_seekers.mcp.server --transport http --port 9000

# Update agent config with new port

Agent not finding MCP server?

# Verify config file exists
cat ~/.claude/claude_code_config.json
cat ~/.cursor/mcp_settings.json

# Re-run setup
./setup_mcp.sh

# Check server logs
tail -f logs/mcp_server.log

Tools not appearing in agent?

# Restart agent completely (quit and relaunch)
# For HTTP transport, ensure server is running:
ps aux | grep "skill_seekers.mcp.server"

# Test server directly
curl http://localhost:8765/health

Complete Multi-Agent Workflow

# 1. One-time setup (5 minutes)
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
./setup_mcp.sh

# 2. For HTTP agents (Cursor/Windsurf/IntelliJ), start server
./scripts/start_mcp_server.sh

# 3. Restart your AI agent

# 4. Use natural language in ANY agent:
"List all available configs"
"Generate React skill from https://react.dev/"
"Estimate pages for Godot config"
"Package and upload skill at output/react/"

# 5. Result: Skills created without touching CLI!

Full Guide: See docs/MCP_SETUP.md for detailed multi-agent setup instructions.

📁 Simple Structure

doc-to-skill/
├── cli/
│   ├── doc_scraper.py      # Main scraping tool
│   ├── package_skill.py    # Package to .zip
│   ├── upload_skill.py     # Auto-upload (API)
│   └── enhance_skill.py    # AI enhancement
├── mcp/                    # MCP server for 5 AI agents
│   └── server.py           # 17 MCP tools (v2.4.0)
├── configs/                # Preset configurations
│   ├── godot.json         # Godot Engine
│   ├── react.json         # React
│   ├── vue.json           # Vue.js
│   ├── django.json        # Django
│   └── fastapi.json       # FastAPI
└── output/                 # All output (auto-created)
    ├── godot_data/        # Scraped data
    ├── godot/             # Built skill
    └── godot.zip          # Packaged skill

✨ Features

1. Fast Page Estimation (NEW!)

skill-seekers estimate configs/react.json

# Output:
📊 ESTIMATION RESULTS
✅ Pages Discovered: 180
📈 Estimated Total: 230
⏱️  Time Elapsed: 1.2 minutes
💡 Recommended max_pages: 280

Benefits:

Know page count BEFORE scraping (saves time)
Validates URL patterns work correctly
Estimates total scraping time
Recommends optimal max_pages setting
Fast (1-2 minutes vs 20-40 minutes full scrape)

2. Auto-Detect Existing Data

skill-seekers scrape --config configs/godot.json

# If data exists:
✓ Found existing data: 245 pages
Use existing data? (y/n): y
⏭️  Skipping scrape, using existing data

3. Knowledge Generation

Automatic pattern extraction:

Extracts common code patterns from docs
Detects programming language
Creates quick reference with real examples
Smarter categorization with scoring

Enhanced SKILL.md:

Real code examples from documentation
Language-annotated code blocks
Common patterns section
Quick reference from actual usage examples

4. Smart Categorization

Automatically infers categories from:

URL structure
Page titles
Content keywords
With scoring for better accuracy

5. Code Language Detection

# Automatically detects:
- Python (def, import, from)
- JavaScript (const, let, =>)
- GDScript (func, var, extends)
- C++ (#include, int main)
- And more...

5. Skip Scraping

# Scrape once
skill-seekers scrape --config configs/react.json

# Later, just rebuild (instant)
skill-seekers scrape --config configs/react.json --skip-scrape

6. Async Mode for Faster Scraping (2-3x Speed!)

# Enable async mode with 8 workers (recommended for large docs)
skill-seekers scrape --config configs/react.json --async --workers 8

# Small docs (~100-500 pages)
skill-seekers scrape --config configs/mydocs.json --async --workers 4

# Large docs (2000+ pages) with no rate limiting
skill-seekers scrape --config configs/largedocs.json --async --workers 8 --no-rate-limit

Performance Comparison:

Sync mode (threads): ~18 pages/sec, 120 MB memory
Async mode: ~55 pages/sec, 40 MB memory
Result: 3x faster, 66% less memory!

When to use:

✅ Large documentation (500+ pages)
✅ Network latency is high
✅ Memory is constrained
❌ Small docs (< 100 pages) - overhead not worth it

See full guide: ASYNC_SUPPORT.md

7. AI-Powered SKILL.md Enhancement

# Option 1: During scraping (API-based, requires API key)
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers scrape --config configs/react.json --enhance

# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
skill-seekers scrape --config configs/react.json --enhance-local

# Option 3: After scraping (API-based, standalone)
skill-seekers enhance output/react/

# Option 4: After scraping (LOCAL, no API key, standalone)
skill-seekers enhance output/react/

What it does:

Reads your reference documentation
Uses Claude to generate an excellent SKILL.md
Extracts best code examples (5-10 practical examples)
Creates comprehensive quick reference
Adds domain-specific key concepts
Provides navigation guidance for different skill levels
Automatically backs up original
Quality: Transforms 75-line templates into 500+ line comprehensive guides

LOCAL Enhancement (Recommended):

Uses your Claude Code Max plan (no API costs)
Opens new terminal with Claude Code
Analyzes reference files automatically
Takes 30-60 seconds
Quality: 9/10 (comparable to API version)

7. Large Documentation Support (10K-40K+ Pages)

For massive documentation sites like Godot (40K pages), AWS, or Microsoft Docs:

# 1. Estimate first (discover page count)
skill-seekers estimate configs/godot.json

# 2. Auto-split into focused sub-skills
python3 -m skill_seekers.cli.split_config configs/godot.json --strategy router

# Creates:
# - godot-scripting.json (5K pages)
# - godot-2d.json (8K pages)
# - godot-3d.json (10K pages)
# - godot-physics.json (6K pages)
# - godot-shaders.json (11K pages)

# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
for config in configs/godot-*.json; do
  skill-seekers scrape --config $config &
done
wait

# 4. Generate intelligent router/hub skill
python3 -m skill_seekers.cli.generate_router configs/godot-*.json

# 5. Package all skills
python3 -m skill_seekers.cli.package_multi output/godot*/

# 6. Upload all .zip files to Claude
# Users just ask questions naturally!
# Router automatically directs to the right sub-skill!

Split Strategies:

auto - Intelligently detects best strategy based on page count
category - Split by documentation categories (scripting, 2d, 3d, etc.)
router - Create hub skill + specialized sub-skills (RECOMMENDED)
size - Split every N pages (for docs without clear categories)

Benefits:

✅ Faster scraping (parallel execution)
✅ More focused skills (better Claude performance)
✅ Easier maintenance (update one topic at a time)
✅ Natural user experience (router handles routing)
✅ Avoids context window limits

Configuration:

{
  "name": "godot",
  "max_pages": 40000,
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "split_by_categories": ["scripting", "2d", "3d", "physics"]
  }
}

Full Guide: Large Documentation Guide

8. Checkpoint/Resume for Long Scrapes

Never lose progress on long-running scrapes:

# Enable in config
{
  "checkpoint": {
    "enabled": true,
    "interval": 1000  // Save every 1000 pages
  }
}

# If scrape is interrupted (Ctrl+C or crash)
skill-seekers scrape --config configs/godot.json --resume

# Resume from last checkpoint
✅ Resuming from checkpoint (12,450 pages scraped)
⏭️  Skipping 12,450 already-scraped pages
🔄 Continuing from where we left off...

# Start fresh (clear checkpoint)
skill-seekers scrape --config configs/godot.json --fresh

Benefits:

✅ Auto-saves every 1000 pages (configurable)
✅ Saves on interruption (Ctrl+C)
✅ Resume with --resume flag
✅ Never lose hours of scraping progress

🎯 Complete Workflows

First Time (With Scraping + Enhancement)

# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local

# 2. Wait for new terminal to close (enhancement completes)
# Check the enhanced SKILL.md:
cat output/godot/SKILL.md

# 3. Package
skill-seekers package output/godot/

# 4. Done! You have godot.zip with excellent SKILL.md

Time: 20-40 minutes (scraping) + 60 seconds (enhancement) = ~21-41 minutes

Using Existing Data (Fast!)

# 1. Use cached data + Local Enhancement
skill-seekers scrape --config configs/godot.json --skip-scrape
skill-seekers enhance output/godot/

# 2. Package
skill-seekers package output/godot/

# 3. Done!

Time: 1-3 minutes (build) + 60 seconds (enhancement) = ~2-4 minutes total

Without Enhancement (Basic)

# 1. Scrape + Build (no enhancement)
skill-seekers scrape --config configs/godot.json

# 2. Package
skill-seekers package output/godot/

# 3. Done! (SKILL.md will be basic template)

Time: 20-40 minutes Note: SKILL.md will be generic - enhancement strongly recommended!

📋 Available Presets

Config	Framework	Description
`godot.json`	Godot Engine	Game development
`react.json`	React	UI framework
`vue.json`	Vue.js	Progressive framework
`django.json`	Django	Python web framework
`fastapi.json`	FastAPI	Modern Python API
`ansible-core.json`	Ansible Core 2.19	Automation & configuration

Using Presets

# Godot
skill-seekers scrape --config configs/godot.json

# React
skill-seekers scrape --config configs/react.json

# Vue
skill-seekers scrape --config configs/vue.json

# Django
skill-seekers scrape --config configs/django.json

# FastAPI
skill-seekers scrape --config configs/fastapi.json

# Ansible
skill-seekers scrape --config configs/ansible-core.json

🎨 Creating Your Own Config

Option 1: Interactive

skill-seekers scrape --interactive
# Follow prompts, it will create the config for you

Option 2: Copy and Edit

# Copy a preset
cp configs/react.json configs/myframework.json

# Edit it
nano configs/myframework.json

# Use it
skill-seekers scrape --config configs/myframework.json

Config Structure

{
  "name": "myframework",
  "description": "When to use this skill",
  "base_url": "https://docs.myframework.com/",
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/docs", "/guide"],
    "exclude": ["/blog", "/about"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}

📊 What Gets Created

output/
├── godot_data/              # Scraped raw data
│   ├── pages/              # JSON files (one per page)
│   └── summary.json        # Overview
│
└── godot/                   # The skill
    ├── SKILL.md            # Enhanced with real examples
    ├── references/         # Categorized docs
    │   ├── index.md
    │   ├── getting_started.md
    │   ├── scripting.md
    │   └── ...
    ├── scripts/            # Empty (add your own)
    └── assets/             # Empty (add your own)

🎯 Command Line Options

# Interactive mode
skill-seekers scrape --interactive

# Use config file
skill-seekers scrape --config configs/godot.json

# Quick mode
skill-seekers scrape --name react --url https://react.dev/

# Skip scraping (use existing data)
skill-seekers scrape --config configs/godot.json --skip-scrape

# With description
skill-seekers scrape \
  --name react \
  --url https://react.dev/ \
  --description "React framework for building UIs"

💡 Tips

1. Test Small First

Edit max_pages in config to test:

{
  "max_pages": 20  // Test with just 20 pages
}

2. Reuse Scraped Data

# Scrape once
skill-seekers scrape --config configs/react.json

# Rebuild multiple times (instant)
skill-seekers scrape --config configs/react.json --skip-scrape
skill-seekers scrape --config configs/react.json --skip-scrape

3. Finding Selectors

# Test in Python
from bs4 import BeautifulSoup
import requests

url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))

4. Check Output Quality

# After building, check:
cat output/godot/SKILL.md  # Should have real examples
cat output/godot/references/index.md  # Categories

🐛 Troubleshooting

No Content Extracted?

Check your main_content selector
Try: article, main, div[role="main"]

Data Exists But Won't Use It?

# Force re-scrape
rm -rf output/myframework_data/
skill-seekers scrape --config configs/myframework.json

Categories Not Good?

Edit the config categories section with better keywords.

Want to Update Docs?

# Delete old data
rm -rf output/godot_data/

# Re-scrape
skill-seekers scrape --config configs/godot.json

📈 Performance

Task	Time	Notes
Scraping (sync)	15-45 min	First time only, thread-based
Scraping (async)	5-15 min	2-3x faster with --async flag
Building	1-3 min	Fast!
Re-building	<1 min	With --skip-scrape
Packaging	5-10 sec	Final zip

✅ Summary

One tool does everything:

✅ Scrapes documentation
✅ Auto-detects existing data
✅ Generates better knowledge
✅ Creates enhanced skills
✅ Works with presets or custom configs
✅ Supports skip-scraping for fast iteration

Simple structure:

doc_scraper.py - The tool
configs/ - Presets
output/ - Everything else

Better output:

Real code examples with language detection
Common patterns extracted from docs
Smart categorization
Enhanced SKILL.md with actual examples

📚 Documentation

Getting Started

BULLETPROOF_QUICKSTART.md - 🎯 START HERE if you're new!
QUICKSTART.md - Quick start for experienced users
TROUBLESHOOTING.md - Common issues and solutions

Guides

docs/LARGE_DOCUMENTATION.md - Handle 10K-40K+ page docs
ASYNC_SUPPORT.md - Async mode guide (2-3x faster scraping)
docs/ENHANCEMENT.md - AI enhancement guide
docs/TERMINAL_SELECTION.md - Configure terminal app for local enhancement
docs/UPLOAD_GUIDE.md - How to upload skills to Claude
docs/MCP_SETUP.md - MCP integration setup

Technical

docs/CLAUDE.md - Technical architecture
STRUCTURE.md - Repository structure

🎮 Ready?

# Try Godot
skill-seekers scrape --config configs/godot.json

# Try React
skill-seekers scrape --config configs/react.json

# Or go interactive
skill-seekers scrape --interactive

📝 License

MIT License - see LICENSE file for details

Happy skill building! 🚀