Paper Intelligence
A local MCP server for intelligent paper/PDF management. Convert PDFs to markdown, then search them with hybrid grep + semantic search. Designed for token efficiency: search first, read only what you need.
🚀 Quick Start
1. Install UV (one-time setup)
curl -LsSf https://astral.sh/uv/install.sh | sh
2. Add to Your MCP Client
Claude Code CLI:
claude mcp add paper-intelligence -- uvx paper-intelligence@latest
VS Code:
code --add-mcp '{"name":"paper-intelligence","command":"uvx","args":["paper-intelligence@latest"]}'
That's it! uvx handles everything automatically. Using @latest ensures you always get the newest version.
🔌 MCP Client Integration
Claude Desktop
Add to your Claude Desktop config:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
{
"mcpServers": {
"paper-intelligence": {
"command": "uvx",
"args": ["paper-intelligence@latest"]
}
}
}
Cursor
- Go to Settings → MCP → Add new MCP Server
- Select `command` type
- Enter: `uvx paper-intelligence@latest`
Or add to ~/.cursor/mcp.json:
{
"mcpServers": {
"paper-intelligence": {
"command": "uvx",
"args": ["paper-intelligence@latest"]
}
}
}
Windsurf / Other MCP Clients
Any MCP-compatible client can use paper-intelligence:
{
"mcpServers": {
"paper-intelligence": {
"command": "uvx",
"args": ["paper-intelligence@latest"]
}
}
}
✨ Features
- PDF to Markdown — High-accuracy conversion using Marker
- Hybrid Search — Combined grep (exact/regex) + semantic RAG search
- Token Efficient — Search papers instead of reading entire documents
- GPU Acceleration — MPS (Apple Silicon) and CUDA support
- Self-Contained — Each paper gets its own directory with all data
- Header Context — Search results show document structure (e.g., "Methods > Data Collection")
📖 MCP Tools
process_paper
Full pipeline: Convert PDF → Index headers → Create embeddings
Parameters:
- `pdf_path` (string): Path to PDF file
- `use_llm` (boolean, optional): Enhanced accuracy mode (default: false)
- `chunk_size` (integer, optional): Text chunk size for embedding (default: 512)
- `chunk_overlap` (integer, optional): Overlap between chunks (default: 50)
Example:
Process the paper at ~/Downloads/attention-is-all-you-need.pdf
Output Structure:
attention-is-all-you-need/
├── attention-is-all-you-need.md # Converted markdown
├── metadata.json # Processing version info
├── index.json # Header hierarchy
├── chroma/ # Embeddings database
└── images/ # Extracted figures
search
Unified search with grep and/or semantic RAG
Parameters:
- `query` (string): Search query (text, regex, or semantic)
- `paper_dirs` (array): List of paper directories to search
- `mode` (string, optional): `"grep"`, `"rag"`, or `"hybrid"` (default: hybrid)
- `top_k` (integer, optional): Number of results (default: 5)
- `regex` (boolean, optional): Treat query as regex (default: false)
Example Queries:
# Semantic search across papers
Search for "attention mechanism implementation" in my processed papers
# Exact text search
Search for "transformer" using grep mode
# Regex search
Search for "BERT|GPT|T5" with regex enabled
Returns: Results with line numbers, surrounding context, and header location.
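Conceptually, hybrid mode merges two ranked lists: exact grep hits and semantically scored lines. A minimal sketch of that merge (illustrative only, not the server's implementation; simple word overlap stands in for embedding similarity, and the helper names are hypothetical):

```python
import re

def grep_search(lines, query, use_regex=False):
    """Exact/regex pass: return (line_number, line) pairs that match."""
    hits = []
    for i, line in enumerate(lines, start=1):
        if use_regex:
            matched = re.search(query, line) is not None
        else:
            matched = query.lower() in line.lower()
        if matched:
            hits.append((i, line))
    return hits

def semantic_score(query, line):
    """Stand-in for embedding similarity: fraction of query words present in the line."""
    q = set(query.lower().split())
    words = set(line.lower().split())
    return len(q & words) / len(q) if q else 0.0

def hybrid_search(lines, query, top_k=5):
    """Score every line semantically, then boost exact grep matches above purely semantic hits."""
    grep_hits = {i for i, _ in grep_search(lines, query)}
    scored = []
    for i, line in enumerate(lines, start=1):
        score = semantic_score(query, line)
        if i in grep_hits:
            score += 1.0  # exact matches outrank semantic-only matches
        if score > 0:
            scored.append((score, i, line))
    scored.sort(reverse=True)
    return scored[:top_k]
```

The real server returns richer results (header paths, context windows), but the ranking idea is the same: exact matches first, semantic neighbors close behind.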
convert_pdf
Convert PDF to Markdown (without embeddings)
Parameters:
- `pdf_path` (string): Path to PDF file
- `output_dir` (string, optional): Custom output directory
- `use_llm` (boolean, optional): Enhanced accuracy mode
Returns: `markdown_path`, `images_dir`, `image_count`
get_paper_info
Check processing status of a paper
Parameters:
- `paper_dir` (string): Path to paper directory
Example Response:
{
"has_markdown": true,
"has_index": true,
"has_embeddings": true,
"has_images": true,
"image_count": 12,
"version": "0.2.0",
"processed_at": "2025-01-15T10:30:00Z"
}
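A client can use this status payload to decide which pipeline steps still need to run. A small sketch (the `missing_steps` helper is hypothetical, not part of the server API):

```python
def missing_steps(info: dict) -> list[str]:
    """Map absent artifacts in a get_paper_info response to the tools that produce them."""
    steps = []
    if not info.get("has_markdown"):
        steps.append("convert_pdf")
    if not info.get("has_index"):
        steps.append("index_markdown")
    if not info.get("has_embeddings"):
        steps.append("embed_document")
    return steps
```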
index_markdown / embed_document
Individual pipeline steps (for advanced use)
index_markdown — Extract header hierarchy into searchable JSON
index_markdown(markdown_path="~/Downloads/paper/paper.md")
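The header index can be pictured as a walk over markdown headings that records the path down to each one, which is how search results get a `header_path` like "Methods > Data Collection". A simplified sketch (not the server's actual parser; the output shape is assumed):

```python
import re

def index_headers(markdown_text: str) -> list[dict]:
    """Extract headings and the 'Parent > Child' path to each one."""
    stack = []    # (level, title) of currently open headings
    entries = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        # Close any headings at the same or deeper level before nesting.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        entries.append({
            "line": lineno,
            "header_path": " > ".join(t for _, t in stack),
        })
    return entries
```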
embed_document — Create embeddings for semantic search
embed_document(
markdown_path="~/Downloads/paper/paper.md",
chunk_size=512,
chunk_overlap=50
)
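`chunk_size` and `chunk_overlap` define a sliding window: each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so neighboring chunks share context across their boundary. A character-based sketch of that windowing (the real chunker may split on tokens or sentence boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping windows; adjacent chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.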
📊 Example Output
Search Result
{
"source": "attention-is-all-you-need.md",
"line_number": 142,
"header_path": "Model Architecture > Attention",
"content": "An attention function can be described as mapping a query and a set of key-value pairs to an output...",
"score": 0.89
}
🎯 Typical Workflow
1. Process a paper:

   Process the PDF at ~/Downloads/transformer-paper.pdf

2. Search across papers:

   Search for "positional encoding" in my papers

3. Read specific sections:

   Show me the Methods section from the transformer paper
The agent reads search results (a few hundred tokens) instead of entire papers (tens of thousands of tokens).
🛠️ Installation Options
Install from PyPI
# Install with pip
pip install paper-intelligence
# Or run directly with uvx (no install needed)
uvx paper-intelligence@latest
Install from GitHub
pip install "paper-intelligence @ git+https://github.com/Strand-AI/paper-intelligence.git"
Local Development
git clone https://github.com/Strand-AI/paper-intelligence.git
cd paper-intelligence
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install in development mode
pip install -e ".[dev]"
# Run the server
python -m paper_intelligence.server
Development MCP config:
{
"mcpServers": {
"paper-intelligence": {
"command": "python",
"args": ["-m", "paper_intelligence.server"],
"cwd": "/path/to/paper-intelligence"
}
}
}
Run tests:
# Unit tests (fast)
pytest tests/test_markdown_parser.py
# Integration tests (slow, requires ML models)
pytest tests/test_integration.py -v
🔧 Debugging
Use the MCP Inspector to debug the server:
npx @modelcontextprotocol/inspector uvx paper-intelligence@latest
🆘 Troubleshooting
Server not starting?
- Ensure Python 3.11+ is installed
- Run `uvx paper-intelligence@latest` directly to see error messages
- Check that all dependencies installed correctly
Windows encoding issues?
Add an `env` block to the server entry in your MCP config:
{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence@latest"],
      "env": {
        "PYTHONIOENCODING": "utf-8"
      }
    }
  }
}
Claude Desktop not detecting changes?
Claude Desktop only reads configuration on startup. Fully restart the app after config changes.
🏗️ Technical Stack
| Component | Technology |
|---|---|
| MCP Server | Official Python SDK with FastMCP |
| PDF Conversion | marker-pdf |
| Embeddings | LlamaIndex + HuggingFace (BAAI/bge-small-en-v1.5) |
| Vector Store | ChromaDB (persistent, local per-paper) |
| GPU Support | PyTorch with MPS (Apple) or CUDA |
🙏 Acknowledgments
- Marker for excellent PDF conversion
- LlamaIndex for the RAG framework
- ChromaDB for the vector database
- FastMCP for the MCP server framework
📄 License
MIT — see LICENSE for details.