
atlas-browser-mcp

A visual web browsing server for AI agents that uses screenshot-based navigation and Set-of-Mark (SoM) labeling to interact with web elements in a human-like way.

Tools: 6 · Updated: Dec 13, 2025

🌐 atlas-browser-mcp

Visual web browsing for AI agents via Model Context Protocol (MCP).


✨ Features

  • 📸 Visual-First: Navigate the web through screenshots, not DOM parsing
  • 🏷️ Set-of-Mark: Interactive elements labeled with clickable [0], [1], [2]... markers
  • 🎭 Humanized: Bezier curve mouse movements, natural typing rhythms
  • 🧩 CAPTCHA-Ready: Multi-click support for image selection challenges
  • 🛡️ Anti-Detection: Built-in measures to avoid bot detection
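
The Humanized feature above mentions Bezier curve mouse movements. The server's own implementation is not shown in this README; as a rough sketch of the idea, a cubic Bezier curve with randomly offset control points produces a curved pointer path rather than a straight line. The name `bezier_path` and its parameters are illustrative assumptions, not part of atlas-browser-mcp.

```python
import random


def bezier_path(start, end, steps=20, spread=100.0, rng=None):
    """Return points along a cubic Bezier curve from start to end.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if rng is None:
        rng = random.Random()
    (x0, y0), (x3, y3) = start, end
    # Two control points placed along the segment, each offset by random
    # jitter so the path bends instead of following a straight line.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Cubic Bezier: B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points
```

The path always starts and ends exactly on the given points; only the route between them varies from call to call.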

🚀 Quick Start

Installation

pip install atlas-browser-mcp
playwright install chromium

Use with Claude Desktop

Add to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
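
If the config file already lists other servers, the new entry has to be merged into the existing "mcpServers" object rather than replacing it. A minimal sketch of that merge using only the standard library follows; the helper name `add_server` is hypothetical, and it works on the file's text because the file's location depends on the operating system.

```python
import json


def add_server(config_text, name="browser", command="atlas-browser-mcp"):
    """Return config JSON text with an MCP server entry merged in.

    Hypothetical helper; existing entries under "mcpServers" are preserved.
    """
    # Treat an empty or whitespace-only file as an empty config.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command}
    return json.dumps(config, indent=2)
```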

Then ask Claude:

"Navigate to https://news.ycombinator.com and tell me the top 3 stories"

🛠️ Available Tools

Tool         Description
navigate     Go to URL, returns labeled screenshot
screenshot   Capture current page with labels
click        Click element by label ID [N]
multi_click  Click multiple elements (for CAPTCHA)
type         Type text, optionally press Enter
scroll       Scroll page up or down
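
MCP clients invoke tools like these through JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what such requests could look like on the wire. The argument names (`url`, `label_id`, `label_ids`, `text`, `submit`) are taken from the usage examples in this README; the `make_tool_call` helper is hypothetical, and `scroll` is left out because its arguments are not documented here.

```python
import json


def make_tool_call(request_id, tool, **arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


requests = [
    make_tool_call(1, "navigate", url="https://example.com"),
    make_tool_call(2, "click", label_id=3),
    make_tool_call(3, "type", text="MCP protocol", submit=True),
    make_tool_call(4, "multi_click", label_ids=[2, 5, 8]),
]
print(json.dumps(requests[0], indent=2))
```

In normal use the MCP client (Claude Desktop, Cline) builds these requests itself; the sketch is only meant to show how the tool name and arguments travel together.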

📖 Usage Examples

Basic Navigation

User: Go to google.com
AI: [calls navigate(url="https://google.com")]
AI: I see the Google homepage. The search box is labeled [3].

User: Search for "MCP protocol"
AI: [calls click(label_id=3)]
AI: [calls type(text="MCP protocol", submit=true)]
AI: Here are the search results...

CAPTCHA Handling

User: Select all images with traffic lights
AI: [Looking at the CAPTCHA grid]
AI: I can see traffic lights in images [2], [5], and [8].
AI: [calls multi_click(label_ids=[2, 5, 8])]

🔧 Configuration

Headless Mode

For servers without a display:

from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser(
    headless=True,   # No visible browser window
    humanize=False   # Faster, less human-like
)
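
When the same script runs both on a desktop and on a display-less server, the headless flag can be chosen at runtime. The helper below is a hypothetical sketch, not part of the library: it assumes that on Linux a missing DISPLAY and WAYLAND_DISPLAY variable means there is no graphical session.

```python
import os
import sys


def should_run_headless(env=None, platform=None):
    """Guess whether a visible browser window can be shown.

    Hypothetical helper: on Linux, no DISPLAY or WAYLAND_DISPLAY variable
    usually means no graphical session, so headless mode is the safe choice.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    # On other platforms a desktop session is assumed to be available.
    return False
```

The result could then be passed as `VisualBrowser(headless=should_run_headless())`.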

Custom Viewport

from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser()
browser.VIEWPORT = {"width": 1920, "height": 1080}  # width and height in pixels

🏗️ How It Works

  1. Navigate: Browser loads the page
  2. Inject SoM: JavaScript labels all interactive elements
  3. Screenshot: Capture the labeled page
  4. AI Sees: The screenshot shows [0], [1], [2]... on buttons, links, inputs
  5. AI Acts: "Click [5]" → Browser clicks the element at that position
  6. Repeat: New screenshot with updated labels

┌─────────────────────────────────────┐
│  [0] Logo    [1] Search   [2] Menu  │
│                                     │
│  [3] Article Title                  │
│  [4] Read More                      │
│                                     │
│  [5] Subscribe    [6] Share         │
└─────────────────────────────────────┘
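
The loop above depends on mapping each label number back to a position on the page. The real labeling is done by the injected JavaScript in step 2; the pure-Python sketch below only illustrates the bookkeeping, under the assumption that each interactive element is represented by its bounding box. `Box`, `assign_labels`, and `click_point` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of an interactive element, in viewport pixels."""
    x: float
    y: float
    width: float
    height: float


def assign_labels(boxes, viewport_width, viewport_height):
    """Number the boxes that are at least partly visible in the viewport."""
    labels = {}
    for box in boxes:
        visible = (
            box.width > 0 and box.height > 0
            and box.x < viewport_width and box.y < viewport_height
            and box.x + box.width > 0 and box.y + box.height > 0
        )
        if visible:
            # Labels are consecutive: 0, 1, 2, ... in document order.
            labels[len(labels)] = box
    return labels


def click_point(labels, label_id):
    """Return the center of the labeled box, where a click would land."""
    box = labels[label_id]
    return (box.x + box.width / 2, box.y + box.height / 2)
```

Because labels are reassigned on every screenshot, a label ID is only meaningful for the most recent screenshot.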

🤝 Integration

With Cline (VS Code)

{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}

Programmatic Use

from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser()

# Navigate
result = browser.execute("navigate", url="https://example.com")
print(f"Page title: {result.data['title']}")
print(f"Found {result.data['element_count']} interactive elements")

# Click element [0]
result = browser.execute("click", label_id=0)

# Type in focused field
result = browser.execute("type", text="Hello world", submit=True)

# Cleanup
browser.execute("close")
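
In longer scripts, a failing step could leave the browser process running. One way to guard against that, sketched here under the assumption that a failed `execute` call may raise an exception, is to wrap the steps in try/finally. The `run_steps` helper is hypothetical and works with any object that has an `execute(action, **kwargs)` method.

```python
def run_steps(browser, steps):
    """Run (action, kwargs) pairs in order and always close the browser.

    `browser` is any object with an execute(action, **kwargs) method,
    such as the VisualBrowser shown above.
    """
    results = []
    try:
        for action, kwargs in steps:
            results.append(browser.execute(action, **kwargs))
    finally:
        # Runs even if a step raises, so the browser process is not leaked.
        browser.execute("close")
    return results
```

A call might look like `run_steps(browser, [("navigate", {"url": "https://example.com"}), ("click", {"label_id": 0})])`.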

📋 Requirements

  • Python 3.10+
  • Playwright with Chromium

🐛 Troubleshooting

"Playwright not installed"

pip install playwright
playwright install chromium

"Browser closed unexpectedly"

Try running with headless=False to see what's happening:

browser = VisualBrowser(headless=False)

Elements not being detected

Some dynamic pages need more time to render. The browser waits 1.5 s after navigation, but complex single-page applications (SPAs) may need longer before all interactive elements can be labeled.
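
One possible workaround is to re-check the page until the number of detected elements stops changing. The sketch below is a generic polling helper, not a library feature; `wait_for_stable_count` and all of its parameters are hypothetical.

```python
import time


def wait_for_stable_count(get_count, interval=0.5, stable_reads=2, timeout=10.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_count() until it returns the same value stable_reads times in a row.

    Returns the last count seen; gives up after `timeout` seconds.
    """
    deadline = clock() + timeout
    last = get_count()
    streak = 1
    while streak < stable_reads and clock() < deadline:
        sleep(interval)
        current = get_count()
        # Extend the streak on a repeat, otherwise start counting again.
        streak = streak + 1 if current == last else 1
        last = current
    return last
```

With VisualBrowser, `get_count` could be something like `lambda: browser.execute("screenshot").data["element_count"]`. This assumes the screenshot result exposes `element_count` the same way the navigate result does in the Programmatic Use example, which this README does not confirm.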

📄 License

MIT License - see LICENSE

🙏 Credits

Built for Atlas, an autonomous AI agent.

Inspired by:
