
Web Crawler

An MCP (Model Context Protocol) server that provides a configurable web crawler. It extracts structured content from websites, respects robots.txt rules, and offers customizable settings for crawl depth, request delay, and concurrency.

Stars: 1 · Tools: 1 · Validated: Jan 11, 2026

Web Crawler MCP Server Deployment Guide

Prerequisites

  • Node.js (v18+)
  • npm (v9+)

Installation

  1. Clone the repository:

    git clone https://github.com/jitsmaster/web-crawler-mcp.git
    cd web-crawler-mcp
    
  2. Install dependencies:

    npm install
    
  3. Build the project:

    npm run build
    

Configuration

Create a .env file with the following environment variables:

CRAWL_LINKS=false
MAX_DEPTH=3
REQUEST_DELAY=1000
TIMEOUT=5000
MAX_CONCURRENT=5
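
To illustrate how these variables could be read on the server side, here is a minimal sketch in TypeScript. It is not taken from the project's source: the `CrawlerConfig` shape, the `loadConfig` and `intFromEnv` helpers, and the rule that only the string "true" enables link following are all assumptions made for illustration. The fallback values are the defaults listed under Configuration Options.

```typescript
// Hypothetical config shape; field names mirror the environment variables above.
interface CrawlerConfig {
  crawlLinks: boolean;
  maxDepth: number;
  requestDelayMs: number;
  timeoutMs: number;
  maxConcurrent: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer; fall back to the default when unset or invalid.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

// Assumption: only the string "true" (case-insensitive) enables link following.
function loadConfig(env: Env): CrawlerConfig {
  return {
    crawlLinks: (env["CRAWL_LINKS"] ?? "false").trim().toLowerCase() === "true",
    maxDepth: intFromEnv(env, "MAX_DEPTH", 3),
    requestDelayMs: intFromEnv(env, "REQUEST_DELAY", 1000),
    timeoutMs: intFromEnv(env, "TIMEOUT", 5000),
    maxConcurrent: intFromEnv(env, "MAX_CONCURRENT", 5),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Falling back to the default on an unparseable value is one possible design choice; a stricter server might refuse to start instead.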

Running the Server

Start the MCP server:

npm start

MCP Configuration

Add the following to your MCP settings file:

{
  "mcpServers": {
    "web-crawler": {
      "command": "node",
      "args": ["/path/to/web-crawler-mcp/build/index.js"],
      "env": {
        "CRAWL_LINKS": "false",
        "MAX_DEPTH": "3",
        "REQUEST_DELAY": "1000",
        "TIMEOUT": "5000",
        "MAX_CONCURRENT": "5"
      }
    }
  }
}

Usage

The server exposes a single tool, crawl, which MCP clients can call. Example input:

{
  "url": "https://example.com",
  "depth": 1
}
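
MCP clients invoke tools through the JSON-RPC method `tools/call`, so the input above would normally travel inside a request envelope. The sketch below builds such a request in TypeScript. The `CrawlArguments` and `ToolCallRequest` types and the `buildCrawlRequest` helper are hypothetical names introduced for illustration, and treating `depth` as optional is an assumption; only `url` and `depth` are documented in this guide.

```typescript
// Argument fields taken from the example above; other fields are not documented here.
interface CrawlArguments {
  url: string;
  depth?: number; // assumption: optional
}

// Envelope shape for an MCP tool invocation over JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: CrawlArguments };
}

// Hypothetical helper: wraps crawl arguments in a tools/call request.
function buildCrawlRequest(id: number, args: CrawlArguments): ToolCallRequest {
  // Reject anything that is not an http(s) URL before sending it to the server.
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`Unsupported URL: ${args.url}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "crawl", arguments: args },
  };
}

const request = buildCrawlRequest(1, { url: "https://example.com", depth: 1 });
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library or host application builds and sends this message for you; the sketch only shows where the tool name and arguments end up.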

Configuration Options

Environment Variable   Default   Description
CRAWL_LINKS            false     Whether to follow links
MAX_DEPTH              3         Maximum crawl depth
REQUEST_DELAY          1000      Delay between requests (ms)
TIMEOUT                5000      Request timeout (ms)
MAX_CONCURRENT         5         Maximum concurrent requests
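
As a rough illustration of how CRAWL_LINKS and MAX_DEPTH might interact, the following TypeScript sketch plans a breadth-first crawl over an in-memory link graph. It is not the server's implementation. The `LinkGraph` type and `planCrawl` function are hypothetical, the graph stands in for real HTTP fetches, and the depth convention (the start page is depth 0, and pages at the maximum depth are visited but not expanded) is an assumption that the actual server may not share.

```typescript
// Hypothetical in-memory link graph standing in for real HTTP fetches.
type LinkGraph = Record<string, string[]>;

// Returns the URLs a crawl would visit, in breadth-first order.
function planCrawl(
  start: string,
  graph: LinkGraph,
  maxDepth: number,
  crawlLinks: boolean
): string[] {
  const seen = new Set<string>([start]);
  const order: string[] = [];
  let frontier: string[] = [start];

  for (let depth = 0; frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      order.push(url);
      // Assumption: links are followed only when enabled and below the depth limit.
      if (!crawlLinks || depth >= maxDepth) continue;
      for (const link of graph[url] ?? []) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}

// Example graph: a links to b and c, b links to d, c links back to a.
const graph: LinkGraph = { a: ["b", "c"], b: ["d"], c: ["a"], d: ["e"] };
console.log(planCrawl("a", graph, 1, true));  // [ 'a', 'b', 'c' ]
console.log(planCrawl("a", graph, 3, false)); // [ 'a' ]
```

A real crawler would additionally wait REQUEST_DELAY milliseconds between requests, abort requests after TIMEOUT milliseconds, and cap in-flight requests at MAX_CONCURRENT; those concerns are left out here to keep the depth logic visible.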
