molmoweb-mcp
MCP server that exposes MolmoWeb web automation as tools for Claude (or any MCP client). Uses Playwright for browser control.
Architecture
Claude / MCP Client
        ↓ stdio (MCP protocol)
molmoweb-mcp (this server)
    ↓                     ↓
Playwright browser    MolmoWeb API (localhost:8001)
Tools
| Tool | Description |
|---|---|
| molmoweb_check_status | Health check for MolmoWeb backend |
| browser_navigate | Open URL in Playwright browser |
| browser_screenshot | Capture JPEG screenshot (returns base64 image) |
| browser_get_page_info | Get current URL and title |
| browser_execute_action | Execute click/type/scroll/press_key/hover/navigate/wait |
| molmoweb_predict | Ask MolmoWeb vision model what action to perform |
| run_web_task | Full autonomous agent loop (orchestrator + MolmoWeb + execution) |
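A client may want to reject unknown action names before calling browser_execute_action. The following is a minimal sketch: the action names come from the table above, but the `validateAction` helper and the `type` field name are assumptions for illustration, not the server's documented input schema.

```javascript
// Action names copied from the browser_execute_action row of the tools table.
const SUPPORTED_ACTIONS = new Set([
  "click", "type", "scroll", "press_key", "hover", "navigate", "wait",
]);

// Hypothetical helper: returns an error string, or null when the action
// names a supported type. The `type` field is an assumed name.
function validateAction(action) {
  if (action === null || typeof action !== "object") {
    return "action must be an object";
  }
  if (!SUPPORTED_ACTIONS.has(action.type)) {
    return `unsupported action type: ${String(action.type)}`;
  }
  return null;
}

console.log(validateAction({ type: "click" })); // null
console.log(validateAction({ type: "drag" }));  // unsupported action type: drag
```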
Setup
npm install
npx playwright install chromium
Start the MolmoWeb backend
The MolmoWeb vision model must be running at http://127.0.0.1:8001. On Windows with WSL:
# Using the provided script:
run_molmoweb.bat
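Before starting the MCP server, it can help to confirm that something is listening at that address. The sketch below is not the molmoweb_check_status implementation; it only issues a plain HTTP request to the base URL and treats any response as "reachable", since the backend's actual health endpoint is not documented here. The `isBackendReachable` name and the injectable `fetchImpl` parameter are assumptions; the default relies on the global `fetch` available in Node 18 or later.

```javascript
// Base URL taken from the setup instructions above.
const MOLMOWEB_URL = "http://127.0.0.1:8001";

// Resolves to true if anything answers HTTP at the URL, and to false on
// connection errors or timeout. `fetchImpl` can be swapped out for testing.
async function isBackendReachable(url = MOLMOWEB_URL, fetchImpl = fetch, timeoutMs = 3000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    await fetchImpl(url, { signal: controller.signal });
    return true; // any HTTP status counts as "something is listening"
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

Usage: `isBackendReachable().then((ok) => console.log(ok ? "backend up" : "backend down"));`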
Configure in Claude Code
Add to your ~/.mcp.json (global) or project .mcp.json:
{
  "mcpServers": {
    "molmoweb": {
      "command": "node",
      "args": ["/path/to/molmoweb-mcp/server.js"]
    }
  }
}
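If the config file already lists other servers, the molmoweb entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows; `addMolmowebServer` and the `other` server entry are hypothetical names used only for illustration, and reading or writing the file itself is left out.

```javascript
// Returns a new config object with a "molmoweb" entry under mcpServers.
// Existing servers are preserved and the input object is not mutated.
function addMolmowebServer(config, serverPath) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers || {}),
      molmoweb: { command: "node", args: [serverPath] },
    },
  };
}

// Example: merge into a config that already has another (hypothetical) server.
const existing = { mcpServers: { other: { command: "node", args: ["other.js"] } } };
const merged = addMolmowebServer(existing, "/path/to/molmoweb-mcp/server.js");
console.log(JSON.stringify(merged, null, 2));
```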
Run standalone
npm start
Orchestrator LLM Support
The run_web_task tool uses an LLM orchestrator to decompose tasks into step-by-step browser actions. Supported providers:
- OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4
- Anthropic: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001
- Custom: Any OpenAI-compatible endpoint (e.g., Ollama)
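One way to picture the provider split is a lookup by model-name prefix. This is only a sketch: how the server actually selects a provider is not described here, and `guessProvider` is a hypothetical helper.

```javascript
// Hypothetical helper: maps a model name to one of the three provider groups
// listed above, based purely on its prefix.
function guessProvider(model) {
  if (model.startsWith("gpt-")) return "openai";
  if (model.startsWith("claude-")) return "anthropic";
  return "custom"; // anything else is assumed to be an OpenAI-compatible endpoint
}

console.log(guessProvider("gpt-4o-mini"));       // openai
console.log(guessProvider("claude-sonnet-4-6")); // anthropic
console.log(guessProvider("llama3"));            // custom
```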
How It Works
- The user provides a high-level task (e.g., "Search Google for AI news")
- The orchestrator LLM decomposes it into atomic browser instructions
- The MolmoWeb vision model translates each instruction into pixel-level actions
- Playwright executes the actions in a visible Chromium browser
- The loop repeats until the task is complete or the maximum number of steps is reached
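The steps above can be sketched as a bounded loop. This is an illustration of the control flow only, not the server's run_web_task code: `runWebTask`, the three callbacks (`nextInstruction`, `predictAction`, `executeAction`), and the `maxSteps` default are all assumed names and values, with the callbacks standing in for the orchestrator LLM, the MolmoWeb model, and Playwright.

```javascript
// Runs the orchestrate -> predict -> execute cycle until the orchestrator
// signals completion (by returning null) or maxSteps is reached.
async function runWebTask(task, { nextInstruction, predictAction, executeAction, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    // Orchestrator: next atomic instruction, or null when the task is done.
    const instruction = await nextInstruction(task, history);
    if (instruction === null) {
      return { done: true, steps: history.length, history };
    }
    // Vision model: turn the instruction into a concrete action.
    const action = await predictAction(instruction);
    // Browser: carry out the action.
    await executeAction(action);
    history.push({ instruction, action });
  }
  // Step budget exhausted before the orchestrator reported completion.
  return { done: false, steps: history.length, history };
}
```

Passing the three stages in as callbacks keeps the loop independent of any particular LLM provider or browser driver, which also makes it easy to exercise with stubs.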
License
MIT