📚 llm-wiki-kit

Stop re-explaining your research to your AI agent every session.

llm-wiki-kit gives your AI agent a persistent, structured memory that compounds over time. Drop PDFs, URLs, YouTube videos — your agent builds a wiki, connects the dots, and remembers everything across sessions.

Based on Karpathy's LLM Wiki pattern. Works with Claude, Codex, Cursor, Windsurf, and any MCP-compatible agent.

The Problem

Every time you start a new chat:

You: "Remember that paper on speculative decoding I shared last week?"
Agent: "I don't have access to previous conversations..."
You: *sighs, re-uploads PDF, re-explains context*

You're constantly re-teaching your agent things it should already know.

The Solution

With llm-wiki-kit, your agent maintains its own knowledge base:

You: "What did we learn about speculative decoding?"
Agent: *searches wiki* "Based on the 3 papers you've shared, the Eagle 
       architecture shows the best efficiency tradeoffs because..."

The wiki persists. Cross-references build up. Your agent gets smarter with every source you add.

⚡ Quickstart (2 minutes)

1. Install

pip install "llm-wiki-kit[all] @ git+https://github.com/iamsashank09/llm-wiki-kit.git"

2. Initialize a wiki

mkdir my-research && cd my-research
llm-wiki-kit init --agent claude

3. Connect your agent

Add to Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "llm-wiki-kit": {
      "command": "llm-wiki-kit",
      "args": ["serve", "--root", "/path/to/my-research"]
    }
  }
}

Other agents (Codex, Cursor, Windsurf)

OpenAI Codex

codex mcp add llm-wiki-kit -- llm-wiki-kit serve --root /path/to/my-research

Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "llm-wiki-kit": {
      "command": "llm-wiki-kit",
      "args": ["serve", "--root", "/path/to/my-research"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "llm-wiki-kit": {
      "command": "llm-wiki-kit",
      "args": ["serve", "--root", "/path/to/my-research"]
    }
  }
}

4. Use it

You: "Ingest this paper: raw/attention-is-all-you-need.pdf"
Agent: *creates wiki pages, cross-references concepts, updates index*

You: "Now ingest https://youtube.com/watch?v=kCc8FmEb1nY"
Agent: *extracts transcript, links to existing transformer concepts*

You: "How does the attention mechanism in the paper relate to Karpathy's explanation?"
Agent: *searches wiki, synthesizes answer from both sources*

Your agent now has persistent memory that survives across sessions.

🔥 What Makes This Different

Feature	Why It Matters
Multi-format ingest	PDFs, URLs, YouTube, markdown — just drop it in
Auto cross-referencing	Agent builds `[[wiki links]]` between related concepts
Persistent across sessions	Start fresh chats without losing context
Full-text search	Agent finds relevant pages instantly (SQLite FTS5)
Health checks	`wiki_lint` catches broken links, orphan pages, contradictions
Zero lock-in	It's just markdown files in a folder — view in Obsidian, VS Code, anywhere
Works with any MCP agent	Claude, Codex, Cursor, Windsurf, and more

📥 Supported Sources

Your agent can ingest anything:

Drop this...	Get this...
`raw/paper.pdf`	Extracted text, page markers, metadata
`https://arxiv.org/abs/...`	Clean article content, auto-saved to `raw/`
`https://youtube.com/watch?v=...`	Full transcript with timestamps
`raw/notes.md`	Direct markdown ingestion

Install what you need:

pip install "llm-wiki-kit[pdf]"      # PDF support
pip install "llm-wiki-kit[web]"      # URL extraction  
pip install "llm-wiki-kit[youtube]"  # YouTube transcripts
pip install "llm-wiki-kit[all]"      # Everything

🧠 How It Works

┌─────────────────────────────────────────────────────────┐
│  YOU                                                    │
│  "Ingest this paper. How does it relate to X?"         │
└───────────────────────┬─────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────┐
│  WIKI (agent-maintained)                                │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ concepts/    │  │ sources/     │  │ synthesis/   │  │
│  │ attention.md │◄─┤ paper-1.md   │──► cache.md     │  │
│  │ [[linked]]   │  │ [[linked]]   │  │ [[linked]]   │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
│                                                         │
│  + index.md (table of contents)                        │
│  + log.md (what happened when)                         │
└───────────────────────┬─────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────┐
│  RAW SOURCES (immutable)                                │
│  paper.pdf, article.html, transcript.md                 │
└─────────────────────────────────────────────────────────┘

The agent reads raw sources, writes wiki pages, and maintains the connections. You never touch the wiki directly — the agent does all the work.

🛠 Available Tools

Your agent gets these MCP tools:

Tool	What it does
`wiki_ingest`	Process any source (file, URL, YouTube)
`wiki_write_page`	Create or update a wiki page
`wiki_read_page`	Read a specific page
`wiki_search`	Full-text search across all pages
`wiki_lint`	Find broken links, orphans, empty pages
`wiki_status`	Overview: page count, sources, recent activity
`wiki_log`	Append to the operation log

💡 Use Cases

Research: Feed papers into your wiki over weeks. Ask synthesis questions that span all your reading.

Technical onboarding: Ingest a codebase's docs. Your agent answers architecture questions from accumulated context.

Competitive intel: Add market reports, earnings calls, news. Agent maintains a living landscape that updates as you add more.

Learning: Watch YouTube tutorials, read blog posts. Agent builds a personalized wiki of everything you've studied.

Book notes: Ingest chapters as you read. Agent tracks characters, themes, plot threads, and connections.

🔍 Pro Tips

Use Obsidian to visualize your wiki's graph — it's just a folder of markdown files
Git init your wiki directory — get version history for free
Let the agent link aggressively — the value compounds in the connections
Run lint periodically — catches contradictions and gaps in your knowledge base
Start small — even 5-10 sources produce a surprisingly useful wiki

📦 Development

git clone https://github.com/iamsashank09/llm-wiki-kit
cd llm-wiki-kit
uv venv && source .venv/bin/activate
uv pip install -e ".[all]"

🙏 Credits

Based on the LLM Wiki idea by Andrej Karpathy.

📄 License

MIT — do whatever you want with it.

LLM Wiki Kit