
Web Scraper

A robust web scraping server that utilizes headless Chrome to extract content from modern JavaScript-heavy websites and single-page applications, optimized for LLM processing by converting to Markdown or clean HTML.

Stars
2
Updated
Jul 20, 2025
Validated
Jan 11, 2026

mcp-web-scraper

This package uses Google Chrome's headless APIs to scrape web pages for AI/LLM agents.

Because it uses Chrome as its default user agent, sites that require JavaScript (for example, single-page applications) should also be parsable with this tool.

It can be called either from Go via LangChainGo, or run as an MCP server.

MCP Server

First, compile the code with Go:

go build .

Claude Desktop

{
  "mcpServers": {
    "mcp-web-scraper": {
      "command": "/path/to/mcp-web-scraper",
      "args": []
    }
  }
}

Visual Studio Code

{
  "mcp": {
    "servers": {
      "mcp-web-scraper": {
        "command": "/path/to/mcp-web-scraper",
        "args": []
      }
    }
  }
}

LangChainGo tool

Integration into LangChainGo is straightforward:

import "github.com/lmorg/mcp-web-scraper/langchain"

func example() {
    scraper := langchain.NewScraper()
}

Please consult the LangChainGo docs for how to use tools with their libraries.

Fallback Modes

If Google Chrome is not installed

If you do not have Google Chrome installed, then mcp-web-scraper will fall back to Go's standard HTTP client.

This will work in the majority of cases; however, you might not get any content from sites that require JavaScript to render.
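The fallback decision can be sketched as a simple PATH lookup. This is only an illustration of the idea described above: the binary name and the probing strategy are assumptions, and the real package may detect Chrome differently.

```go
package main

import (
	"fmt"
	"os/exec"
)

// chooseFetcher reports which backend a scraper of this style would
// pick: headless Chrome when the browser binary is on PATH, otherwise
// Go's built-in HTTP client. The binary name passed in is an assumption.
func chooseFetcher(browser string) string {
	if _, err := exec.LookPath(browser); err != nil {
		return "net/http"
	}
	return "headless-chrome"
}

func main() {
	fmt.Println(chooseFetcher("google-chrome"))
}
```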

Markdown Support

By default, this module will look for an <article> element and convert it to Markdown.

If the page doesn't present itself as an article of some description (eg not a blog post, technical documentation, etc), then this module will fall back to returning HTML.

Any HTML document returned will have specific HTML tags removed (such as <script>, <svg>, and HTML comments) to reduce the tokens required for the LLM to parse.
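A minimal sketch of that kind of tag stripping is shown below, using only the standard library. Note this regex approach is an assumption for illustration; a robust implementation (likely what this package does) would use a proper HTML parser rather than regular expressions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns for the kinds of tags described above: <script> blocks,
// inline <svg> graphics, and HTML comments. Non-greedy, case-insensitive,
// and with "." matching newlines so multi-line blocks are caught.
var noise = []*regexp.Regexp{
	regexp.MustCompile(`(?is)<script\b.*?</script>`),
	regexp.MustCompile(`(?is)<svg\b.*?</svg>`),
	regexp.MustCompile(`(?s)<!--.*?-->`),
}

// stripNoise removes the noisy tags so fewer tokens reach the LLM.
func stripNoise(html string) string {
	for _, re := range noise {
		html = re.ReplaceAllString(html, "")
	}
	return html
}

func main() {
	in := `<p>hello</p><script>alert(1)</script><!-- tracking --><svg><path/></svg>`
	fmt.Println(stripNoise(in))
}
```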
