scholar-mcp
A FastMCP server providing structured academic literature access via Semantic Scholar, with OpenAlex enrichment and optional docling-serve PDF conversion.
Features
- Search & retrieval -- full-text paper search with year, venue, field-of-study, and citation-count filters; single-paper lookup by DOI, S2 ID, arXiv ID, and more; author profile and name search
- Citation graph -- forward citations, backward references, BFS graph traversal up to configurable depth, and shortest-path bridge paper discovery
- Recommendations -- paper recommendations from positive (and optional negative) examples via the S2 recommendation API
- OpenAlex enrichment -- augment paper metadata with open-access URLs, affiliations, funders, concepts, and OA status
- PDF conversion -- download open-access PDFs and convert to Markdown via docling-serve, with optional VLM enrichment for formulas and figures
- Intelligent caching -- SQLite-backed cache with per-table TTLs (30 days for papers/authors, 7 days for citations/references) and identifier aliasing
- Authentication -- bearer token, OIDC (OAuth 2.1), or both simultaneously (multi-auth)
- Multi-transport -- stdio (Claude Desktop), HTTP (streamable-http), and SSE transports
- Linux packages -- .deb and .rpm packages with systemd service and security hardening
Installation
With uvx (recommended)
uvx --from pvliesdonk-scholar-mcp scholar-mcp serve
With pip
pip install 'pvliesdonk-scholar-mcp[mcp]'
scholar-mcp serve
With Docker
docker run -v scholar-mcp-data:/data/scholar-mcp \
  ghcr.io/pvliesdonk/scholar-mcp:latest
Linux packages
Download .deb or .rpm from the latest release:
# Debian/Ubuntu
sudo dpkg -i scholar-mcp_*.deb
# RHEL/Fedora
sudo rpm -i scholar-mcp-*.rpm
Note: The PyPI package is pvliesdonk-scholar-mcp. The CLI command installed is scholar-mcp.
Quick Start
stdio transport (Claude Desktop / MCP clients)
uvx --from pvliesdonk-scholar-mcp scholar-mcp serve
API key optional but recommended: The server works without a Semantic Scholar API key, but unauthenticated requests are limited to ~1 req/s and quickly run into HTTP 429 (Too Many Requests) responses during multi-step operations such as citation graph traversal. Request a free key to get ~10 req/s.
Claude Desktop configuration (claude_desktop_config.json):
{
  "mcpServers": {
    "scholar": {
      "command": "uvx",
      "args": ["--from", "pvliesdonk-scholar-mcp", "scholar-mcp", "serve"],
      "env": {
        "SCHOLAR_MCP_S2_API_KEY": "your-key"
      }
    }
  }
}
HTTP transport
uvx --from pvliesdonk-scholar-mcp scholar-mcp serve --transport http --port 8000
Configuration
All settings are controlled via environment variables with the SCHOLAR_MCP_ prefix.
Core
| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_S2_API_KEY | -- | Semantic Scholar API key (request one); optional but recommended for higher rate limits |
| SCHOLAR_MCP_READ_ONLY | true | If true, write-tagged tools (fetch_paper_pdf, convert_pdf_to_markdown, fetch_and_convert) are hidden |
| SCHOLAR_MCP_CACHE_DIR | /data/scholar-mcp | Directory for the SQLite cache database and downloaded PDFs |
| SCHOLAR_MCP_CONTACT_EMAIL | -- | Included in the OpenAlex User-Agent for polite pool access (faster rate limits) |
| SCHOLAR_MCP_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
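To make the prefix convention concrete, here is a minimal sketch of how the core variables above could be read in Python. It is not the server's actual loader: `CoreSettings`, `load_core_settings`, and the accepted boolean spellings are assumptions made for illustration; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "SCHOLAR_MCP_"


@dataclass(frozen=True)
class CoreSettings:
    """Hypothetical container for the core settings listed in the table."""

    s2_api_key: Optional[str]
    read_only: bool
    cache_dir: str
    contact_email: Optional[str]
    log_level: str


def _as_bool(value: str) -> bool:
    # Assumption: these truthy spellings are accepted; the real parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_core_settings(env: Mapping[str, str] = os.environ) -> CoreSettings:
    """Read the SCHOLAR_MCP_-prefixed core variables, applying the documented defaults."""

    def get(name: str, default: Optional[str] = None) -> Optional[str]:
        return env.get(PREFIX + name, default)

    return CoreSettings(
        s2_api_key=get("S2_API_KEY"),
        read_only=_as_bool(get("READ_ONLY", "true")),
        cache_dir=get("CACHE_DIR", "/data/scholar-mcp"),
        contact_email=get("CONTACT_EMAIL"),
        log_level=get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to inspect without touching the real environment.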
PDF Conversion (optional)
| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_DOCLING_URL | -- | Base URL of a running docling-serve instance (e.g. http://localhost:5001) |
| SCHOLAR_MCP_VLM_API_URL | -- | OpenAI-compatible VLM endpoint for formula/figure-enriched PDF conversion |
| SCHOLAR_MCP_VLM_API_KEY | -- | API key for the VLM endpoint |
| SCHOLAR_MCP_VLM_MODEL | gpt-4o | Model name for VLM-enriched conversion |
Authentication (optional)
| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_BEARER_TOKEN | -- | Static bearer token for HTTP transport authentication |
| SCHOLAR_MCP_BASE_URL | -- | Public base URL, required for OIDC (e.g. https://mcp.example.com) |
| SCHOLAR_MCP_OIDC_CONFIG_URL | -- | OIDC discovery endpoint URL |
| SCHOLAR_MCP_OIDC_CLIENT_ID | -- | OIDC client ID |
| SCHOLAR_MCP_OIDC_CLIENT_SECRET | -- | OIDC client secret |
| SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY | -- | JWT signing key; required on Linux/Docker to survive restarts (openssl rand -hex 32) |
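If openssl is not available, a key of the same shape as `openssl rand -hex 32` (32 random bytes, hex-encoded) can be produced with Python's standard library. This is a small sketch; the helper name `generate_signing_key` is hypothetical and not part of scholar-mcp.

```python
import secrets


def generate_signing_key(num_bytes: int = 32) -> str:
    """Return num_bytes cryptographically random bytes as a hex string."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Prints a line suitable for an env file; keep the value secret.
    print(f"SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY={generate_signing_key()}")
```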
MCP Tools
Search & Retrieval
| Tool | Description |
|---|---|
| search_papers | Full-text search with year, venue, field-of-study, and citation-count filters. Returns up to 100 results with pagination. |
| get_paper | Fetch full metadata for a single paper by DOI, S2 ID, arXiv ID, ACM ID, or PubMed ID. |
| get_author | Fetch author profile with publications, or search by name. |
Citation Graph
| Tool | Description |
|---|---|
| get_citations | Forward citations (papers that cite a given paper) with optional filters. |
| get_references | Backward references (papers cited by a given paper). |
| get_citation_graph | BFS traversal from seed papers, returning nodes + edges up to configurable depth. |
| find_bridge_papers | Shortest citation path between two papers. |
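The two graph tools rest on breadth-first search. The sketch below illustrates the idea on an in-memory toy graph; it is not the server's implementation. The function names, the adjacency-dict representation, and the choice to follow edges in one direction only are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Assumed toy representation: paper ID -> IDs of papers it links to.
Graph = Dict[str, List[str]]


def bfs_graph(
    graph: Graph, seeds: List[str], max_depth: int
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect nodes and edges reachable from the seeds within max_depth hops."""
    nodes: Set[str] = set(seeds)
    edges: Set[Tuple[str, str]] = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand papers sitting at the depth limit
        for neighbor in graph.get(paper, []):
            edges.add((paper, neighbor))
            if neighbor not in nodes:
                nodes.add(neighbor)
                queue.append((neighbor, depth + 1))
    return nodes, edges


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Return one shortest path from source to target, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        paper = queue.popleft()
        if paper == target:
            path: List[str] = []
            current: Optional[str] = paper
            while current is not None:  # walk parent links back to the source
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in graph.get(paper, []):
            if neighbor not in parents:
                parents[neighbor] = paper
                queue.append(neighbor)
    return None


toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_graph(toy, ["A"], max_depth=1))
print(shortest_path(toy, "A", "E"))
```

In this picture, the intermediate papers on the returned path play the role of "bridge" papers between the two endpoints.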
Recommendations
| Tool | Description |
|---|---|
| recommend_papers | Paper recommendations from 1--5 positive examples and optional negative examples. |
Utility
| Tool | Description |
|---|---|
| batch_resolve | Resolve up to 100 identifiers to full metadata in one call, with OpenAlex fallback. |
| enrich_paper | Augment S2 metadata with OpenAlex fields (affiliations, funders, OA status, concepts). |
PDF Conversion (requires docling-serve)
| Tool | Description |
|---|---|
| fetch_paper_pdf | Download open-access PDF for a paper. |
| convert_pdf_to_markdown | Convert a local PDF to Markdown via docling-serve. |
| fetch_and_convert | Full pipeline: fetch OA PDF, convert to Markdown, return both. |
PDF tools are write-tagged and hidden when SCHOLAR_MCP_READ_ONLY=true (the default).
Task Polling
| Tool | Description |
|---|---|
| get_task_result | Poll for the result of a background task by ID. |
| list_tasks | List all active background tasks. |
Long-running operations (PDF download/conversion) and rate-limited S2 requests return {"queued": true, "task_id": "..."} immediately. Use get_task_result to poll for the result.
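A client-side polling loop for this pattern might look like the sketch below. Several details are assumptions rather than documented behavior: `call_tool` is a hypothetical callable standing in for whatever your MCP client uses to invoke a tool, the `task_id` argument name is inferred from the queued response, and the loop assumes get_task_result keeps returning a queued-shaped response while the task is still pending.

```python
import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def _is_queued(response: Any) -> bool:
    """True if the response has the queued shape described above."""
    return isinstance(response, dict) and bool(response.get("queued"))


def wait_for_result(
    call_tool: ToolCaller,
    response: Any,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Return response as-is, or poll get_task_result until it is no longer queued."""
    if not _is_queued(response):
        return response  # the tool answered directly
    task_id = response["task_id"]
    deadline = clock() + timeout
    while True:
        # Assumption: the polling tool takes the ID under the key "task_id".
        result = call_tool("get_task_result", {"task_id": task_id})
        if not _is_queued(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still pending after {timeout}s")
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.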
Docker Compose
services:
  scholar-mcp:
    image: ghcr.io/pvliesdonk/scholar-mcp:latest
    restart: unless-stopped
    environment:
      SCHOLAR_MCP_S2_API_KEY: "${SCHOLAR_MCP_S2_API_KEY}"
      SCHOLAR_MCP_DOCLING_URL: "http://docling-serve:5001"
      SCHOLAR_MCP_VLM_API_URL: "${VLM_API_URL:-}"
      SCHOLAR_MCP_VLM_API_KEY: "${VLM_API_KEY:-}"
      SCHOLAR_MCP_CACHE_DIR: "/data/scholar-mcp"
      SCHOLAR_MCP_READ_ONLY: "false"
    volumes:
      - scholar-mcp-data:/data/scholar-mcp
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scholar-mcp.rule=Host(`scholar-mcp.yourdomain.com`)"
  docling-serve:
    image: ghcr.io/ds4sd/docling-serve:latest
    restart: unless-stopped
volumes:
  scholar-mcp-data:
Cache Management
# Show cache statistics (row counts, database size)
scholar-mcp cache stats
# Clear all cached data (preserves identifier aliases)
scholar-mcp cache clear
# Remove entries older than 30 days
scholar-mcp cache clear --older-than 30
# Override cache directory
scholar-mcp cache stats --cache-dir /path/to/cache
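For intuition about how a TTL-based SQLite cache and an age-based clear can work, here is a minimal sketch. It does not reflect the server's real schema: the single `cache` table, its column names, and all function names are hypothetical; only the TTL values (30 days for papers/authors, 7 days for citations/references) come from the feature list.

```python
import sqlite3
import time
from typing import Optional

DAY = 24 * 60 * 60
# TTLs from the feature list; the "kind" labels are hypothetical.
TTL_SECONDS = {
    "papers": 30 * DAY,
    "authors": 30 * DAY,
    "citations": 7 * DAY,
    "references": 7 * DAY,
}


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with one illustrative table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "kind TEXT NOT NULL, cache_key TEXT NOT NULL, "
        "payload TEXT NOT NULL, stored_at REAL NOT NULL, "
        "PRIMARY KEY (kind, cache_key))"
    )
    return conn


def put(conn: sqlite3.Connection, kind: str, cache_key: str, payload: str,
        now: Optional[float] = None) -> None:
    """Store or overwrite an entry, stamping it with the current time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
        (kind, cache_key, payload, now),
    )
    conn.commit()


def get(conn: sqlite3.Connection, kind: str, cache_key: str,
        now: Optional[float] = None) -> Optional[str]:
    """Return the payload if present and younger than its kind's TTL, else None."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, stored_at FROM cache WHERE kind = ? AND cache_key = ?",
        (kind, cache_key),
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS[kind]:
        return None
    return row[0]


def clear_older_than(conn: sqlite3.Connection, days: float,
                     now: Optional[float] = None) -> int:
    """Delete entries stored more than `days` days ago; return the number removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM cache WHERE stored_at < ?", (now - days * DAY,))
    conn.commit()
    return cur.rowcount
```

Treating expiry as a read-time check keeps stale rows on disk until an explicit clear, which is one plausible reason to offer an age-based clear command alongside TTLs.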
Development
# Install with dev and MCP dependencies
uv sync --extra dev --extra mcp
# Run tests
uv run pytest
# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/
# Type check
uv run mypy src/
# Build docs locally
uv sync --extra docs
uv run mkdocs serve
License
MIT