MCP Hub
Back to servers

amazon-q-web-scraper-mcp

A professional web scraping and testing MCP server optimized for AI visual analysis, supporting standard websites, React apps, and React Native web with context-efficient file-based outputs.

Forks
1
Tools
12
Updated
Dec 8, 2025

Enhanced Web Scraper MCP Server

A professional Model Context Protocol (MCP) server for web scraping, React app testing, and React Native web app inspection using Playwright. Fully backward compatible with regular websites and standard React applications.

🚀 Latest Improvements

  • 🔥 Context-Optimized Screenshots - Screenshots return only file paths and analysis text (no base64 data)
  • 📊 Enhanced Page Analysis - Detailed element counting, content structure analysis, and page state inspection
  • 🔍 Comprehensive Comparison Tools - Visual similarity analysis with layout, color, and typography detection
  • 💾 File-Based Output - All screenshots saved to /tmp/ with structured analysis data
  • 🎯 Smart Content Detection - Automatically detects empty states, loading indicators, and content availability
  • Enhanced Error Handling - Comprehensive input validation and error reporting
  • Optimized Performance - Reduced code duplication and improved efficiency
  • Standardized Timeouts - Configurable timeout constants for reliability
  • Professional Code Structure - ES6+ best practices and maintainable architecture

🔄 Backward Compatibility

This enhanced server maintains 100% compatibility with:

  • Regular websites (HTML, CSS, JavaScript)
  • Standard React applications (Create React App, Next.js, etc.)
  • Traditional web scraping workflows
  • Existing CSS selectors and interactions

Plus new enhanced support for:

  • 🆕 React Native web applications
  • 🆕 Expo web projects
  • 🆕 Mobile viewport emulation
  • 🆕 Advanced React component inspection

📋 Tools Overview

ToolPurposeBest For
take_screenshotContext-free screenshot captureVisual analysis, UI documentation
compare_screenshotsVisual UI comparison with semantic analysisUI replication, visual regression testing
scrape_pageUniversal web scrapingContent extraction, data collection
test_react_appReact app testing with mobile gesturesUI testing, interaction automation
get_page_infoPage analysis with React insightsPerformance monitoring, framework detection
extract_contentClean content extractionDocumentation, article processing
wait_for_elementSmart element waitingDynamic content, loading states
inspect_react_appReact component analysisComponent debugging, state inspection
wait_for_react_stateReact state managementHydration, navigation, data loading
execute_in_react_contextJavaScript execution in React contextAdvanced debugging, custom scripts
check_expo_dev_serverExpo development server statusDevelopment workflow, debugging
duckduckgo_searchDuckDuckGo search with result extractionResearch, finding relevant URLs for content extraction

Key Features for AI Visual Analysis

🔥 Context-Free Design

  • No Base64 Data: Screenshots return only file paths and analysis text
  • Minimal Context Usage: Dramatically reduced token consumption per screenshot
  • File-Based Storage: All images saved to /tmp/ for external access
  • Structured Analysis: Rich text analysis without heavy image data

🔍 Smart Content Detection

  • Empty State Detection: Automatically identifies when pages have no meaningful content
  • Table Population Verification: Counts table rows to verify data is actually displaying
  • Loading State Recognition: Detects and waits for loading indicators to disappear
  • Content Structure Analysis: Provides detailed breakdown of page elements

📁 File-Based Output

Every visual tool provides:

  1. 📊 Analysis Text: Element counts, text content, structural analysis
  2. 📁 File Path: Saved screenshot location for external viewing
  3. 🎯 Pass/Fail Status: Built-in success criteria for automated workflows

🎯 Migration & Testing Support

Perfect for:

  • UI Migration Verification: Compare source vs target implementations
  • Mock Data Validation: Verify that mock data is actually displaying
  • Visual Regression Testing: Ensure UI changes don't break layouts
  • Component Testing: Validate React components render correctly

📊 Success Metrics Integration

  • Configurable Similarity Thresholds: Built-in pass/fail criteria for visual comparisons
  • Populated Data Requirements: Detects empty states that prevent meaningful comparison
  • Comprehensive Reporting: Detailed analysis for debugging visual differences

Available Tools

1. take_screenshot - Context-Free Screenshot Capture

Captures screenshots with comprehensive analysis while keeping context usage minimal.

{
  url: "https://example.com",
  browser: "chromium",
  device: "iPhone 12", // Optional device emulation
  fullPage: true,
  waitForSPA: true // Auto-detects and waits for React/Vue/Angular apps
}

Returns:

  • 📊 Comprehensive Analysis: Element counts, page structure, content preview
  • 📁 File Path: Screenshot saved to /tmp/screenshot-[timestamp].png
  • 🎯 Content Status: Pass/fail indicators for populated data

Example Output:

📸 Screenshot saved to: /tmp/screenshot-1234567890.png

📄 Page Analysis:
- Title: "My React App"
- Has Content: ✅
- Visible Elements: 247

📊 Content Elements:
- Headings: 3
- Paragraphs: 12
- Buttons: 8
- Tables: 1
- Table Rows: 15  ← Indicates populated data!

📝 Page Content Preview:
Welcome to our service platform. Here you can find contractors...

2. compare_screenshots - Context-Free Visual Comparison

Compares two pages with comprehensive analysis while maintaining minimal context usage.

{
  urlA: "https://source-design.com", // Source/reference
  urlB: "https://your-implementation.com", // Target/implementation
  browser: "chromium",
  threshold: 0.1, // Similarity threshold (0-1)
  analyzeLayout: true, // Detect alignment differences
  analyzeColors: true, // Exact color comparison
  analyzeTypography: true, // Font size/weight analysis
  waitForSPA: true // Smart SPA detection
}

Returns:

  • 📊 Visual Similarity Score: Percentage match with pass/fail status
  • 🏗️ Structural Comparison: Element counts, table rows, content structure
  • 🎨 Layout Analysis: Alignment differences, positioning issues
  • 📁 File Paths: Both screenshots saved to /tmp/ for external viewing

Example Output:

📸 Screenshots saved:
- Source: /tmp/compare-source-1234567890.png
- Target: /tmp/compare-target-1234567891.png

📊 VISUAL SIMILARITY: 87.3% ✅ PASS

🏗️ Structural Comparison:
- Tables: 1 → 1
- Table Rows: 0 → 8  ← Target has populated data!
- Buttons: 12 → 12

📋 Layout Analysis:
- 2 regions with significant layout differences
- Content appears centered in source but left-aligned in target

🎨 Color Analysis:
- Minor color differences detected
- Example: rgb(229, 122, 68) → rgb(225, 118, 64)

3. scrape_page - Universal Web Scraping

Works with any website - regular HTML, React apps, or React Native web.

Regular website example:

{
  url: "https://example.com",
  selector: ".article-title", // Standard CSS selector
  screenshot: true
}

React Native web example:

{
  url: "http://localhost:8081",
  selector: "login-button", // Will try testID, aria-label fallbacks
  mobileViewport: true,
  device: "iPhone 12"
}

4. test_react_app - Universal React Testing

Works with any React application - standard React or React Native web.

Standard React app example:

{
  url: "http://localhost:3000",
  waitForHydration: false, // Optional for regular React apps
  actions: [
    { type: "click", selector: "#submit-button" },
    { type: "fill", selector: "input[name='email']", value: "test@example.com" }
  ]
}

React Native web example:

{
  url: "http://localhost:8081",
  device: "iPhone 12",
  waitForHydration: true, // Recommended for RN web
  actions: [
    { type: "tap", selector: "login-button" },
    { type: "swipe", selector: "scroll-view", value: "up" }
  ]
}

5. get_page_info - Enhanced Page Analysis

Provides comprehensive information for any web page with React-specific insights.

{
  url: "https://any-website.com", // Works with any URL
  includePerformance: true
}

6. extract_content - Clean Content Extraction

Extract clean, readable content from web pages without HTML/CSS clutter. Perfect for documentation, articles, and structured content consumption.

{
  url: "https://docs.example.com/api-guide",
  includeLinks: true,    // Extract and categorize hyperlinks
  format: "markdown"     // Output format: 'markdown' or 'text'
}

Output Example:

# API Documentation

## Authentication
You need to obtain an API key [1] from the developer portal [2].

### Rate Limits
See the rate limiting guide [3] for details.

---
## Links Found:
[1] https://example.com/api-keys (internal)
[2] https://developer.example.com (external) 
[3] https://example.com/docs/rate-limits (internal)

Features:

  • Clean Structure - Preserves headings, paragraphs, lists, code blocks
  • Link Extraction - Categorizes links as internal, external, anchor, or download
  • Content Filtering - Removes navigation, ads, sidebars automatically
  • Multiple Formats - Markdown or plain text output

7. wait_for_element - Smart Element Waiting

Intelligent element waiting with automatic selector strategy fallbacks.

{
  url: "https://example.com",
  selector: ".loading-spinner", // CSS selector with RN fallbacks
  timeout: 10000
}

React Native Web Specific Tools

8. inspect_react_app - React Component Analysis

Deep inspection of React applications (works best with React Native web).

9. wait_for_react_state - React State Management

Wait for React-specific conditions like hydration, navigation, data loading.

10. execute_in_react_context - JavaScript Execution

Execute JavaScript in React context for advanced inspection.

11. check_expo_dev_server - Expo Development Tools

Check Expo/Metro bundler status for development workflows.

Selector Strategy Priority

The server uses intelligent selector strategies:

  1. Primary: Direct CSS selector (e.g., #button, .class, input[name='email'])
  2. Fallback 1: TestID attribute ([data-testid="button"])
  3. Fallback 2: Accessibility label ([aria-label="Button"])
  4. Fallback 3: AccessibilityLabel ([accessibilityLabel="Button"])

This ensures regular CSS selectors work normally while providing React Native web compatibility.

Usage Examples

Context-Free Visual Verification

// Verify data is actually displaying without burning context
{
  url: "http://localhost:3000/data-table",
  fullPage: true,
  waitForSPA: true
}
// Returns: File path + "Table Rows: 8" ← Confirms data is populated!

Context-Free Migration Comparison

// Compare source vs target implementation efficiently
{
  urlA: "http://localhost:3001/page", // Source
  urlB: "http://localhost:3000/page", // Target
  threshold: 0.05, // High similarity requirement
  analyzeLayout: true,
  analyzeColors: true
}
// Returns: File paths + "VISUAL SIMILARITY: 96.2% ✅ PASS"

Regular Website Scraping

// Works exactly like before
{
  url: "https://news.ycombinator.com",
  selector: ".storylink",
  screenshot: false
}

Standard React App Testing

// Standard React app (Create React App, Next.js, etc.)
{
  url: "http://localhost:3000",
  actions: [
    { type: "click", selector: "button.login" },
    { type: "fill", selector: "#username", value: "testuser" }
  ]
}

React Native Web App Testing

// React Native web with enhanced features
{
  url: "http://localhost:8081",
  device: "iPhone 12",
  waitForHydration: true,
  actions: [
    { type: "tap", selector: "login-button" }, // Uses testID
    { type: "swipe", selector: "scroll-view", value: "up" }
  ]
}

Clean Content Extraction

// Extract clean content from documentation
{
  url: "https://docs.react.dev/learn",
  includeLinks: true,
  format: "markdown"
}

Installation

npm install
npx playwright install

Usage with Amazon Q Developer

# Take a context-free screenshot and analyze content
q chat "Take a screenshot of localhost:3000/data-page and analyze the content"

# Compare pages efficiently without context bloat
q chat "Compare the page between localhost:3001 and localhost:3000"

# Mock data verification with minimal context usage
q chat "Verify that the data table is populated at localhost:3000"

# Works with any website
q chat "Scrape the headlines from https://news.ycombinator.com"

# Works with React apps
q chat "Test the login flow on my React app at localhost:3000"

# Enhanced React Native web support
q chat "Inspect the React Native web app at localhost:8081"

# Extract clean content for reading
q chat "Extract the main content from https://docs.react.dev/learn"

Benefits of Context-Free Design

🔥 Dramatically Reduced Context Usage

  • Before: 50-200KB base64 data per screenshot
  • After: Only text analysis (~1-2KB per screenshot)
  • Result: 50-100x reduction in context consumption

📁 File-Based Workflow

  • Screenshots saved to /tmp/ with timestamps
  • External tools can access images directly
  • No context pollution from image data
  • Structured analysis data remains in conversation

🎯 Better AI Workflows

  • More screenshots possible per conversation
  • Focus on analysis rather than data transfer
  • Cleaner conversation history
  • Faster response times

Troubleshooting

Error Handling

  • Input Validation - Server validates required parameters and provides clear error messages
  • Timeout Configuration - Default timeouts are optimized but can be adjusted per request
  • Browser Cleanup - Automatic resource cleanup prevents memory leaks

Regular Websites

  • Use standard CSS selectors (.class, #id, tag[attribute])
  • Set mobileViewport: false (default) for desktop sites
  • Set waitForHydration: false (default) for non-React sites

React Applications

  • Set waitForHydration: true for better reliability
  • Use semantic selectors when possible
  • Check browser console for React errors

React Native Web

  • Use testID attributes in your components
  • Enable mobileViewport or specify device
  • Set waitForHydration: true
  • Use inspect_react_app to see available elements

License

MIT


12. duckduckgo_search - Web Search Integration

Search DuckDuckGo and extract result links with titles and snippets for further content extraction.

Parameters

  • query (required): Search query string
  • maxResults (optional): Maximum results to return (1-10, default: 5)

Example Usage

# Search for React documentation
q chat "Search DuckDuckGo for 'React hooks documentation'"

# Get more results
q chat "Search DuckDuckGo for 'Node.js best practices' with maxResults=8"

# Combine with content extraction
q chat "Search for 'AWS Lambda tutorials' then extract content from the top 2 results"

Use Cases

  • Research: Find relevant URLs for content extraction
  • Documentation Discovery: Locate official docs and tutorials
  • Content Pipeline: Search → Extract → Analyze workflow
  • Development Research: Find code examples and solutions

Integration with Other Tools

Perfect for combining with extract_content:

  1. Use duckduckgo_search to find relevant URLs
  2. Use extract_content on the top results
  3. Get comprehensive information on any topic

Why DuckDuckGo?

  • No Bot Detection: More lenient than Google for automated requests
  • Free & Unlimited: No API keys or rate limits required
  • Privacy-Focused: Doesn't track users or requests
  • Reliable Results: High-quality search results for development topics

Note: This tool extracts public search results only (completely legal). DuckDuckGo is more automation-friendly than Google, providing reliable results without anti-bot measures.

Reviews

No reviews yet

Sign in to write a review