spa-reader-mcp

MCP server that renders JavaScript SPA pages and extracts LLM-ready Markdown.

Traditional web scrapers often fail on Single Page Applications (SPAs) because the content is rendered by JavaScript after the initial page load. spa-reader-mcp handles this by launching a headless Chromium browser via Playwright, waiting for the network to go idle (and, optionally, for a CSS selector to appear), then extracting clean, LLM-ready Markdown with Mozilla's Readability and Turndown.

Features

spa_read — Render any SPA page and extract article content as clean Markdown with optional YAML frontmatter
spa_screenshot — Capture full or viewport-sized PNG screenshots of rendered pages
Singleton browser — Reuses a single Chromium instance across requests for fast, low-overhead rendering
SSRF protection — Blocks private/loopback IP ranges and restricts URL schemes to http/https
Selector injection prevention — Rejects Playwright-specific selector syntax (>>, nth=, text=, has-text, :has)
Content truncation — Caps output at 100KB with clean line-boundary truncation

Requirements

Node.js >= 20
Chromium browser for Playwright:
```
npx playwright install chromium
```

Installation

npx (recommended, zero install)

No global install needed — configure directly in your MCP client (see MCP Configuration).

Global install

npm install -g spa-reader-mcp
npx playwright install chromium

From source

git clone https://github.com/XXO47OXX/spa-reader-mcp.git
cd spa-reader-mcp
pnpm install
pnpm build
npx playwright install chromium

MCP Configuration

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "spa-reader": {
      "command": "npx",
      "args": ["-y", "spa-reader-mcp"]
    }
  }
}

Claude Code

claude mcp add spa-reader -- npx -y spa-reader-mcp

Tools

`spa_read`

Render a JavaScript SPA page and extract its content as LLM-ready Markdown.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | — | The URL of the SPA page to read |
| `waitForSelector` | string | No | — | CSS selector to wait for before extraction |
| `waitTimeout` | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| `includeMetadata` | boolean | No | true | Include title/author/excerpt as YAML frontmatter |

Example output:

---
title: "Understanding React Server Components"
author: "Dan Abramov"
excerpt: "A deep dive into how RSC works under the hood"
source: "https://example.com/blog/rsc"
---

## Introduction

React Server Components allow you to...
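
For readers calling the tool by hand, the sketch below builds an MCP `tools/call` request for `spa_read` from the parameters in the table above. Most MCP clients construct this message for you, so treat it as an illustration only. `buildSpaReadCall` and `SpaReadArgs` are hypothetical names, and rejecting an out-of-range `waitTimeout` on the client side is an assumption, not documented server behaviour.

```typescript
// Hypothetical argument shape, mirroring the spa_read parameter table.
type SpaReadArgs = {
  url: string;
  waitForSelector?: string;
  waitTimeout?: number;
  includeMetadata?: boolean;
};

// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: SpaReadArgs };
}

function buildSpaReadCall(id: number, args: SpaReadArgs): ToolCallRequest {
  // Apply the documented default, then enforce the documented range.
  const waitTimeout = args.waitTimeout ?? 30000;
  if (waitTimeout < 1000 || waitTimeout > 120000) {
    throw new RangeError("waitTimeout must be between 1000 and 120000 ms");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "spa_read", arguments: { ...args, waitTimeout } },
  };
}
```

Serialising the result with `JSON.stringify(buildSpaReadCall(1, { url: "https://example.com/blog/rsc" }))` gives a message that could be sent over the server's transport.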

`spa_screenshot`

Take a PNG screenshot of a JavaScript SPA page after rendering.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | — | The URL to screenshot |
| `waitForSelector` | string | No | — | CSS selector to wait for before capturing |
| `waitTimeout` | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| `width` | number | No | 1280 | Viewport width in pixels (320–3840) |
| `height` | number | No | 720 | Viewport height in pixels (240–2160) |
| `fullPage` | boolean | No | false | Capture full scrollable page |

Returns the screenshot as a base64-encoded PNG image.
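
Because the image arrives as base64 text, a client usually has to decode it before saving or displaying it. The sketch below shows one way to do that and to sanity-check the result against the fixed 8-byte PNG signature. `decodeBase64Png` is a hypothetical helper, and how your client exposes the base64 string (for example as the `data` field of an MCP image content block) is an assumption about your setup.

```typescript
// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function decodeBase64Png(data: string): Uint8Array {
  // atob yields a "binary string": one character per decoded byte.
  const binary = atob(data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Reject payloads that do not start with the PNG signature.
  const looksLikePng = PNG_SIGNATURE.every((byte, i) => bytes[i] === byte);
  if (!looksLikePng) {
    throw new Error("decoded data does not start with the PNG signature");
  }
  return bytes;
}
```

In Node.js the returned bytes can be passed straight to `fs.writeFile` to store the screenshot on disk.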

Architecture

URL
 → Playwright Chromium (headless, singleton)
   → Per-request BrowserContext (isolated cookies/storage)
     → Page navigation + networkidle + optional selector wait
       → Raw HTML
         → Mozilla Readability (article extraction)
           → Turndown (HTML → Markdown)
             → YAML frontmatter + truncation
               → LLM-ready Markdown
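
The last two stages of the pipeline (YAML frontmatter and truncation) can be sketched as below. This is a minimal illustration, not the project's code: `toFrontmatter`, `truncateAtLineBoundary`, and `Metadata` are hypothetical names, and treating the 100KB cap as a UTF-8 byte count is an assumption.

```typescript
// Assumption: the documented 100KB cap is measured in UTF-8 bytes.
const MAX_BYTES = 100 * 1024;

// Hypothetical metadata shape, based on the example output.
interface Metadata {
  title?: string;
  author?: string;
  excerpt?: string;
  source: string;
}

function toFrontmatter(meta: Metadata): string {
  const keys: (keyof Metadata)[] = ["title", "author", "excerpt", "source"];
  const lines: string[] = [];
  for (const key of keys) {
    const value = meta[key];
    // JSON.stringify produces a double-quoted, escaped string,
    // which is also a valid YAML double-quoted scalar.
    if (value) lines.push(`${key}: ${JSON.stringify(value)}`);
  }
  return ["---", ...lines, "---", ""].join("\n");
}

function truncateAtLineBoundary(text: string, maxBytes: number = MAX_BYTES): string {
  const encoder = new TextEncoder();
  if (encoder.encode(text).length <= maxBytes) return text;
  const kept: string[] = [];
  let used = 0;
  for (const line of text.split("\n")) {
    const cost = encoder.encode(line).length + 1; // +1 for the newline
    if (used + cost > maxBytes) break; // stop before a line that would not fit whole
    kept.push(line);
    used += cost;
  }
  return kept.join("\n");
}
```

Cutting on whole lines means the output never ends in the middle of a Markdown link or a multi-byte character, at the cost of dropping up to one line's worth of text below the cap.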

Key design decisions:

Singleton browser: A single Chromium instance is launched on first request and reused. This avoids the ~2s cold-start penalty on subsequent calls.
Per-request BrowserContext: Each request gets an isolated BrowserContext with its own cookies and storage, preventing cross-request data leakage.
Readability fallback: If Mozilla Readability determines the page isn't article-like, the extractor falls back to converting the full <body> HTML.
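
The singleton decision above can be illustrated with a small lazy-initialisation helper. This is a sketch under stated assumptions, not the server's actual code: `createSingleton` is a hypothetical name, and the `launch` callback stands in for something like Playwright's `chromium.launch()`, with each request then calling `browser.newContext()` on the shared instance.

```typescript
// Hypothetical helper: wraps an expensive async factory so it runs at most once.
function createSingleton<T>(launch: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      // Cache the promise rather than the resolved value, so concurrent
      // first requests share one launch instead of each starting their own.
      pending = launch().catch((err) => {
        pending = undefined; // allow a retry after a failed launch
        throw err;
      });
    }
    return pending;
  };
}
```

Caching the promise is the design point here: two requests that arrive before the first launch finishes both await the same in-flight launch.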

Security

| Protection | Details |
|---|---|
| SSRF prevention | Blocks `localhost`, `127.x.x.x`, `10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`, `::1`, `fe80:`, `169.254.x.x`, `0.0.0.0` |
| Scheme whitelist | Only `http:` and `https:` URLs are allowed |
| Selector injection | Rejects Playwright engine syntax: `>>`, `nth=`, `text=`, `has-text`, `:has()` |
| Content truncation | Output capped at 100KB with clean line-boundary cut |
| Test bypass | Set `SPA_READER_ALLOW_PRIVATE=1` to allow private IPs (for local development/testing only) |
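
To make the URL and selector rules in the table concrete, here is a minimal sketch of how such checks could look. It is not the project's implementation: `isUrlAllowed`, `isSelectorAllowed`, and the pattern lists are hypothetical, and the sketch only inspects the literal hostname, so it does not cover DNS names that resolve to private addresses.

```typescript
// Hypothetical patterns mirroring the blocked ranges listed in the table.
const BLOCKED_HOST_PATTERNS: RegExp[] = [
  /^localhost$/i,
  /^127\.\d+\.\d+\.\d+$/, // IPv4 loopback
  /^10\.\d+\.\d+\.\d+$/, // private 10.x.x.x
  /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/, // private 172.16-31.x.x
  /^192\.168\.\d+\.\d+$/, // private 192.168.x.x
  /^169\.254\.\d+\.\d+$/, // IPv4 link-local
  /^0\.0\.0\.0$/,
  /^\[?::1\]?$/, // IPv6 loopback (URL hostnames keep the brackets)
  /^\[?fe80:/i, // IPv6 link-local
];

// Playwright selector-engine tokens listed in the table.
const BLOCKED_SELECTOR_TOKENS = [">>", "nth=", "text=", "has-text", ":has("];

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null; // not a parseable absolute URL
  }
}

function isUrlAllowed(raw: string): boolean {
  const url = parseUrl(raw);
  if (url === null) return false;
  // Scheme whitelist: only http: and https:.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname;
  return !BLOCKED_HOST_PATTERNS.some((pattern) => pattern.test(host));
}

function isSelectorAllowed(selector: string): boolean {
  return !BLOCKED_SELECTOR_TOKENS.some((token) => selector.indexOf(token) !== -1);
}
```

Parsing with `URL` before matching matters because the parser normalises the hostname (for example lower-casing it), so the patterns see a canonical form rather than the raw input.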

Development

# Install dependencies
pnpm install

# Build
pnpm build

# Run tests
pnpm test

# Type check
pnpm lint
