
Screen Agent

Enables AI assistants to capture screenshots and control desktop input (mouse, keyboard) to see and interact with your screen. Its safety controls put the user first: the agent pauses automatically when you use the keyboard or mouse, and an app allowlist restricts interactions to approved applications.

glama · Updated Apr 5, 2026


Give AI coding tools eyes and hands.

An MCP server that lets Claude Code, Cursor, and other AI tools see your screen and interact with your desktop.

Why?

AI coding assistants are powerful but blind — they can edit files and run commands, but they can't see what's on your screen. Screen Agent fixes that by providing screen capture and desktop interaction as MCP tools.

You: "The form in the browser has a bug — can you see it?"
Claude: [captures screen] I see the registration form. The email
        validation shows an error even though the format is correct.
        The regex pattern in validators.ts is too restrictive...

Install

pip install screen-agent

Quick Start

Use with Claude Code

  1. Add to your MCP config (~/.claude/mcp.json or .mcp.json):
{
  "mcpServers": {
    "screen": {
      "command": "screen-agent",
      "args": ["serve"]
    }
  }
}
  2. Restart Claude Code. That's it: Claude can now see your screen.

Use as Python library

import asyncio
from screen_agent import capture_screen, mouse_click, keyboard_type

async def main():
    # Capture the screen; the result includes its width and height in pixels.
    screenshot = await capture_screen()
    print(f"Captured {screenshot['width']}x{screenshot['height']}px")

    # Click at screen coordinates (400, 300), then type at the cursor.
    await mouse_click(400, 300)
    await keyboard_type("Hello from screen-agent!")

asyncio.run(main())

Tools

Tool                  Description
capture_screen        Screenshot the full screen or a region
click                 Click at screen coordinates
type_text             Type text at the cursor position
press_key             Press a key or key combo (e.g. Cmd+C)
scroll                Scroll up or down
move_mouse            Move the cursor
drag                  Click and drag
get_cursor_position   Get the cursor coordinates
list_windows          List visible windows
focus_window          Focus a window by title
get_active_window     Get active window info

All input tools (click, type_text, press_key, scroll, move_mouse, drag) and focus_window accept an optional verify parameter. When it is set to true, the tool captures a screenshot after the action so the LLM can confirm it worked.
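For illustration, here is how an MCP client might phrase a click with verification enabled. MCP tool invocations are JSON-RPC 2.0 `tools/call` requests that carry the tool `name` and an `arguments` object; the `x` and `y` argument names below are assumptions, since this README does not list each tool's parameter schema, and `make_tool_call` is a hypothetical helper.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names: x and y are assumed, not documented here.
request = make_tool_call(1, "click", {"x": 400, "y": 300, "verify": True})
print(request)
```

With verify set, the response should carry a screenshot taken after the click, so the model can check the result before choosing its next action.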

Optional: OCR Plugin

pip install "screen-agent[ocr]"

Adds three more tools:

Tool          Description
ocr           Extract all screen text with positions
find_text     Find text on screen and get coordinates
click_text    Find text and click its center (OCR + click in one step)
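click_text combines an OCR lookup with a click at the match's center. A minimal sketch of the lookup step, assuming OCR results arrive as boxes with recognized text and pixel bounds; the `OcrBox` type, the `find_text_center` helper, and the case-insensitive substring match are illustrative assumptions, not the plugin's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrBox:
    """One OCR hit: recognized text plus its bounding box in screen pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int


def find_text_center(boxes: list[OcrBox], needle: str) -> Optional[tuple[int, int]]:
    """Return the center of the first box whose text contains `needle`."""
    target = needle.casefold()
    for box in boxes:
        if target in box.text.casefold():
            # Integer center point, suitable for a click at (x, y).
            return (box.left + box.width // 2, box.top + box.height // 2)
    return None  # Text not found on screen; the caller should not click.


boxes = [
    OcrBox("Email", 100, 200, 60, 20),
    OcrBox("Submit", 300, 400, 80, 30),
]
print(find_text_center(boxes, "submit"))  # (340, 415)
print(find_text_center(boxes, "Cancel"))  # None
```

Returning None rather than a default point keeps a failed lookup from turning into a stray click.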

Safety: Input Guardian

Screen Agent is designed with user-first safety:

User always has priority. The moment you touch your keyboard or mouse, the agent pauses instantly. It only resumes after you've been idle for 1.5 seconds (configurable). The agent never fights you for control.
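The pause-and-resume rule reduces to comparing the time since the last user input against an idle threshold. A minimal sketch, assuming a monotonic clock and a hypothetical `IdleGate` class; the real server monitors input globally with pynput, and `on_user_input` here is a stand-in for that listener's callback.

```python
import time
from typing import Callable


class IdleGate:
    """Tracks user activity and reports whether the agent may act."""

    def __init__(self, idle_seconds: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.last_activity = float("-inf")  # No user input seen yet.

    def on_user_input(self) -> None:
        """Call from a keyboard/mouse listener on every user event."""
        self.last_activity = self.clock()

    def agent_may_act(self) -> bool:
        """True only once the user has been idle for idle_seconds."""
        return self.clock() - self.last_activity >= self.idle_seconds


# Drive the gate with a fake clock to show the timing.
now = [0.0]
gate = IdleGate(idle_seconds=1.5, clock=lambda: now[0])
print(gate.agent_may_act())  # True: no input yet
gate.on_user_input()          # user moves the mouse at t=0
now[0] = 1.0
print(gate.agent_may_act())  # False: only 1.0 s idle
now[0] = 1.5
print(gate.agent_may_act())  # True: idle threshold reached
```

Injecting the clock keeps the timing logic testable without real sleeps; every new user event simply pushes the resume point forward.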

App allowlist. The agent must declare which apps it needs access to. It can only interact with apps on the list. Need to work across Chrome and Figma? Just add both.

Claude: [calls add_app("Chrome")]
        [calls add_app("Figma")]
        I can now operate in Chrome and Figma.

        [clicks in Chrome]      ← allowed
        [clicks in Figma]       ← allowed
        [clicks in Slack]       ← rejected, not on the list

User:   *moves mouse*
Claude: [paused — waiting for user to finish]
        ...user stops...
Claude: [resumes after 1.5s idle] Continuing where I left off.

Safety tool         Description
add_app             Add an app to the allowed list (e.g. "Chrome", "Figma")
remove_app          Remove an app from the allowed list
set_region          Restrict to a pixel region on screen
clear_scope         Remove all restrictions
get_agent_status    Check guardian state, user activity, allowed apps
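The allowlist and region tools describe a scope check applied before each action. A minimal sketch under stated assumptions: app names match case-insensitively, a region is an (x, y, width, height) rectangle, an empty allowlist rejects every app (nothing declared yet), and clear_scope switches to unrestricted. The `Scope` class and these rules are hypothetical, not Screen Agent's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """Hypothetical scope lock: app allowlist plus optional screen region."""
    # Empty set: nothing declared yet, so every app is rejected.
    # None: unrestricted (the state after clear_scope).
    allowed_apps: Optional[set[str]] = field(default_factory=set)
    region: Optional[tuple[int, int, int, int]] = None  # (x, y, width, height)

    def add_app(self, name: str) -> None:
        if self.allowed_apps is None:
            self.allowed_apps = set()
        self.allowed_apps.add(name.casefold())

    def remove_app(self, name: str) -> None:
        if self.allowed_apps is not None:
            self.allowed_apps.discard(name.casefold())

    def set_region(self, x: int, y: int, width: int, height: int) -> None:
        self.region = (x, y, width, height)

    def clear_scope(self) -> None:
        self.allowed_apps = None
        self.region = None

    def permits(self, app: str, x: int, y: int) -> bool:
        """Check an action targeting `app` at screen point (x, y)."""
        if self.allowed_apps is not None and app.casefold() not in self.allowed_apps:
            return False
        if self.region is not None:
            rx, ry, rw, rh = self.region
            if not (rx <= x < rx + rw and ry <= y < ry + rh):
                return False
        return True


scope = Scope()
scope.add_app("Chrome")
scope.add_app("Figma")
print(scope.permits("Chrome", 500, 400))  # True
print(scope.permits("Slack", 500, 400))   # False: not on the list
```

Checking both conditions in one place means every input tool can call a single method before dispatching.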

Platform Support

Platform   Screenshot   Input control   Window management
macOS      mss          pyautogui       AppleScript
Linux      mss          pyautogui       wmctrl
Windows    mss          pyautogui       Planned
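The support matrix maps naturally onto a backend lookup keyed by platform. A sketch using `sys.platform`, whose values on these systems are "darwin", "linux", and "win32"; the `BACKENDS` table and `pick_backends` helper are illustrative, not Screen Agent's internals.

```python
import sys
from typing import Optional

# Backend names per platform, transcribed from the support table.
# None marks window management on Windows, which is still planned.
BACKENDS: dict[str, dict[str, Optional[str]]] = {
    "darwin": {"screenshot": "mss", "input": "pyautogui", "windows": "AppleScript"},
    "linux": {"screenshot": "mss", "input": "pyautogui", "windows": "wmctrl"},
    "win32": {"screenshot": "mss", "input": "pyautogui", "windows": None},
}


def pick_backends(platform: str = sys.platform) -> dict[str, Optional[str]]:
    """Return the backend names for `platform`, or raise if unsupported."""
    try:
        return BACKENDS[platform]
    except KeyError:
        raise RuntimeError(f"Unsupported platform: {platform}") from None


current = pick_backends()  # raises RuntimeError on unlisted platforms
if current["windows"] is None:
    print("Window management tools are not available on this platform yet")
```

Representing "Planned" as None lets the window tools report a clear error instead of failing inside a missing backend.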

macOS Permissions

Screen Agent needs two permissions on macOS:

  • Screen Recording — for screenshots
  • Accessibility — for keyboard/mouse control

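Grant both under System Settings → Privacy & Security. macOS typically attributes these permissions to the app that launches Screen Agent, such as your terminal or editor, so that is the app to enable.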

Architecture

┌──────────────────────────────────────────────┐
│  MCP Client (Claude Code / Cursor / etc.)    │
└──────────────┬───────────────────────────────┘
               │  MCP Protocol (stdio/SSE)
               ▼
┌──────────────────────────────────────────────┐
│  Screen Agent MCP Server                     │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │  Input Guardian (pynput)               │  │
│  │  • Monitors keyboard + mouse globally  │  │
│  │  • User active? → PAUSE all actions    │  │
│  │  • Scope lock → reject out-of-bounds   │  │
│  └────────────────────────────────────────┘  │
│       │ clearance granted                    │
│       ▼                                      │
│  capture.py  ─  mss (cross-platform)         │
│  input.py    ─  pyautogui                    │
│  window.py   ─  AppleScript / wmctrl         │
│  plugins/    ─  OCR, CV (optional)           │
└──────────────────────────────────────────────┘
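The flow in the diagram, where an action must clear the Input Guardian before it reaches a backend, can be sketched with asyncio. This is a minimal model under assumptions: the hypothetical `run_guarded` helper takes `is_clear` and `in_scope` callables that stand in for the guardian's activity check and scope lock, and the poll interval is arbitrary. It is not the server's actual code.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_guarded(
    action: Callable[[], Awaitable[T]],
    is_clear: Callable[[], bool],
    in_scope: Callable[[], bool],
    poll_seconds: float = 0.05,
) -> T:
    """Reject out-of-scope actions; otherwise wait for clearance, then act."""
    if not in_scope():
        # Scope lock: reject immediately instead of waiting.
        raise PermissionError("action is outside the allowed scope")
    while not is_clear():
        # User is active: pause without blocking the event loop.
        await asyncio.sleep(poll_seconds)
    return await action()


async def demo() -> str:
    checks = iter([False, False, True])  # user busy for two polls, then idle

    async def fake_click() -> str:
        return "clicked"

    return await run_guarded(fake_click, lambda: next(checks), lambda: True,
                             poll_seconds=0.01)


print(asyncio.run(demo()))  # clicked
```

This sketch checks clearance once before dispatch; a multi-step action such as drag would need the check repeated between its steps.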

Configuration

Transport modes

# stdio (default) — for Claude Code and most MCP clients
screen-agent serve

# SSE — for HTTP-based clients
screen-agent serve --transport sse --port 8765

System check

screen-agent check

Verifies all dependencies and platform permissions.
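The dependency half of such a check can be approximated with `importlib.util.find_spec`, which locates a top-level module without importing it. The module list (mss, pyautogui, pynput, taken from the architecture diagram) is an assumption about what `screen-agent check` inspects, `missing_modules` is a hypothetical helper, and the macOS permission checks are not covered here.

```python
import importlib.util

# Modules named in the architecture diagram; the real check may test more.
REQUIRED_MODULES: tuple[str, ...] = ("mss", "pyautogui", "pynput")


def missing_modules(names: tuple[str, ...] = REQUIRED_MODULES) -> list[str]:
    """Return the modules that cannot be found on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found")
```

Using find_spec instead of a trial import avoids side effects such as pyautogui connecting to the display during a simple check.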

Development

git clone https://github.com/chriswu727/screen-agent.git
cd screen-agent
pip install -e ".[dev]"
pytest

License

MIT
