
MCP Test Utils

Desktop UI automation for AI agents: screenshots, windows, mouse, keyboard, UI Automation, OCR

Registry · Updated Mar 24, 2026

100% AI Code · Human Reviewed

Version: 3.6.0 · Tools: 16 · AI generated: 100%

MCP server for automated desktop UI testing. A single binary — no runtime, no dependencies, no installation.

Windows x64 only. macOS and Linux support is planned.

Gives AI agents eyes and hands: screenshots, window management, mouse, keyboard, UI Automation, OCR.

Why

AI agents can trigger actions in applications but can't see the screen. This server bridges that gap:

Agent triggers action → takes screenshot → sees the result →
switches window → clicks a button → verifies → writes report

Fully autonomous, no user involvement required.

Platforms

| Platform | Status |
|---|---|
| Windows x64 | ✅ Full support |
| macOS arm64 | ⏳ Planned |
| Linux x64 | ⏳ Planned |

Tools (16)

Vision

| Tool | Description |
|---|---|
| take_screenshot | Screenshot of the entire desktop with configurable quality |
| take_window_screenshot | Screenshot of a specific window (screen or window capture mode) |
| read_screen_text | OCR the entire screen (Windows.Media.Ocr) |
| read_region_text | OCR a screen region with precise word coordinates |
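Because read_region_text returns word coordinates, an agent can turn a recognized word into a mouse_click target. The sketch below assumes a hypothetical result shape (one record per word with `text`, `x`, `y`, `width`, `height` in screen pixels); the server's actual field names are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrWord:
    # Hypothetical shape of one word in a read_region_text result;
    # these field names are an assumption, not the server's schema.
    text: str
    x: int
    y: int
    width: int
    height: int


def find_click_point(words: list[OcrWord], target: str) -> Optional[tuple[int, int]]:
    """Return the center of the first word equal to `target` (case-insensitive)."""
    needle = target.casefold()
    for word in words:
        if word.text.casefold() == needle:
            # Integer center of the word's bounding box, usable as click coordinates.
            return (word.x + word.width // 2, word.y + word.height // 2)
    return None


words = [OcrWord("File", 10, 5, 30, 14), OcrWord("Save", 120, 40, 36, 14)]
print(find_click_point(words, "save"))  # (138, 47)
```

If the coordinates turn out to be relative to the OCR region rather than the screen, add the region's origin before clicking.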

Window Management

| Tool | Description |
|---|---|
| list_windows | List windows with id, title, app, position, size, and minimized/focused state |
| focus_window | Bring a window to the front, restoring it if minimized |
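Before calling focus_window, an agent usually has to choose one entry from the list_windows output. This is a minimal sketch, assuming the result is a list of dicts whose keys match the fields named above (`id`, `title`, `app`, `minimized`, `focused`); the exact JSON keys are an assumption. It prefers a focused match, then a non-minimized one.

```python
from typing import Optional


def pick_window(windows: list[dict], title_part: str) -> Optional[dict]:
    """Pick the best window whose title contains `title_part` (case-insensitive).

    Assumed keys: "id", "title", "app", "minimized", "focused".
    """
    needle = title_part.casefold()
    matches = [w for w in windows if needle in w.get("title", "").casefold()]
    if not matches:
        return None
    # Sort key: focused windows first, then non-minimized ones. A minimized
    # match is still acceptable because focus_window restores it.
    matches.sort(key=lambda w: (not w.get("focused", False), w.get("minimized", False)))
    return matches[0]


windows = [
    {"id": 1, "title": "Notepad - notes.txt", "app": "notepad.exe", "minimized": True, "focused": False},
    {"id": 2, "title": "Notepad - todo.txt", "app": "notepad.exe", "minimized": False, "focused": False},
    {"id": 3, "title": "Calculator", "app": "calc.exe", "minimized": False, "focused": True},
]
print(pick_window(windows, "notepad")["id"])  # 2
```

The chosen `id` would then be passed to focus_window or take_window_screenshot.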

Input

| Tool | Description |
|---|---|
| mouse_click | Click (left / right / middle) at screen or window-relative coordinates |
| mouse_move | Move the cursor to a point |
| mouse_drag | Drag from point A to point B |
| mouse_scroll | Scroll the mouse wheel |
| keyboard_type | Type text (full Unicode: Latin, Cyrillic, CJK, emoji) |
| keyboard_press | Press a key (Enter, Tab, F1–F12, arrows, etc.) |
| keyboard_shortcut | Key combinations (Ctrl+S, Alt+F4, Ctrl+Shift+P, etc.) |
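A common way to synthesize a combination such as Ctrl+Shift+P with an input API like SendInput is to press the modifiers in order, tap the final key, then release the modifiers in reverse. The sketch below models only that ordering as data; the `("down" | "up", key)` tuple format is made up for illustration, and this is not the server's actual implementation.

```python
def shortcut_events(combo: str) -> list[tuple[str, str]]:
    """Expand a combo like "Ctrl+Shift+P" into ordered (action, key) events.

    Modifiers go down in the order written, the last key is pressed and
    released, then modifiers are released in reverse order.
    Note: this simple split does not handle the "+" key itself.
    """
    parts = [p.strip() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty shortcut")
    *modifiers, key = parts
    events = [("down", m) for m in modifiers]
    events += [("down", key), ("up", key)]
    events += [("up", m) for m in reversed(modifiers)]
    return events


print(shortcut_events("Ctrl+Shift+P"))
```

Releasing in reverse order ensures the target application never sees the final key without all of its modifiers held.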

Structured UI Access

| Tool | Description |
|---|---|
| list_ui_elements | UI Automation tree (buttons, fields, menus) with exact coordinates |
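With the UI Automation tree, an agent can locate a control by name instead of guessing pixel positions. The sketch assumes a hypothetical nested-dict shape (`name`, `type`, `rect`, `children`) with `rect` as `[left, top, right, bottom]` like a Win32 RECT; the server's real output format may differ.

```python
from typing import Iterator, Optional


def walk(node: dict) -> Iterator[dict]:
    """Depth-first, pre-order traversal of an element tree."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_element(root: dict, name: str, control_type: Optional[str] = None) -> Optional[dict]:
    """First element with a matching name (and control type, if given)."""
    for element in walk(root):
        if element.get("name") == name and (control_type is None or element.get("type") == control_type):
            return element
    return None


def rect_center(element: dict) -> tuple[int, int]:
    # Assumed "rect" layout: [left, top, right, bottom] in screen pixels.
    left, top, right, bottom = element["rect"]
    return ((left + right) // 2, (top + bottom) // 2)


root = {"name": "Untitled - Notepad", "type": "Window", "rect": [0, 0, 800, 600], "children": [
    {"name": "Menu Bar", "type": "MenuBar", "rect": [0, 30, 800, 50], "children": [
        {"name": "File", "type": "MenuItem", "rect": [0, 30, 40, 50]},
    ]},
    {"name": "Save", "type": "Button", "rect": [700, 560, 780, 590]},
]}
print(rect_center(find_element(root, "Save", "Button")))  # (740, 575)
```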

Session Logging

| Tool | Description |
|---|---|
| enable_logging | Start recording tool calls to JSONL, plus screenshots (opt-in) |
| disable_logging | Stop recording and return session stats |
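Because session logs are JSONL (one JSON object per line), they are easy to post-process outside the server. This sketch counts tool calls per tool; it assumes each record has a `"tool"` key, which is a guess, since the log schema is not documented in this README.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_session(lines: Iterable[str]) -> Counter:
    """Count tool calls per tool name in a JSONL session log.

    Assumes each non-empty line is a JSON object with a "tool" key.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("tool", "<unknown>")] += 1
    return counts


sample = ['{"tool": "take_screenshot"}', "", '{"tool": "mouse_click"}', '{"tool": "take_screenshot"}']
print(summarize_session(sample).most_common())
```

A file object works directly as the iterable, e.g. `with open(path, encoding="utf-8") as f: summarize_session(f)`, where `path` points to a session's JSONL file.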

Installation

  1. Download the binary from Releases.
  2. Add it to your MCP client config. The example below is for Claude Desktop; for other clients, refer to their documentation.

Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe"
    }
  }
}
  3. Restart Claude Desktop.
  4. In chat, try: "Take a screenshot". The agent should return an image of your desktop.
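Editing the config by hand is error-prone when other servers are already configured or when Windows paths need escaped backslashes. A minimal sketch of adding the entry programmatically, assuming only the `mcpServers` / `command` / `env` layout shown above; `merge_server` and `add_server` are hypothetical helpers, not part of this project.

```python
import json
from pathlib import Path
from typing import Mapping, Optional


def merge_server(config: dict, name: str, exe_path: str,
                 env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `config` with one mcpServers entry added or replaced."""
    merged = json.loads(json.dumps(config))  # deep copy of plain JSON data
    entry = {"command": exe_path}
    if env:
        entry["env"] = dict(env)
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


def add_server(config_path: Path, name: str, exe_path: str,
               env: Optional[Mapping[str, str]] = None) -> None:
    """Update a claude_desktop_config.json-style file in place, keeping other servers."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = merge_server(config, name, exe_path, env)
    config_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

For example, on Windows: `add_server(Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json", "test-utils", r"D:\path\to\mcp-test-utils.exe")`. `json.dumps` writes the doubled backslashes automatically.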

With Logging (optional)

{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe",
      "env": {
        "MCP_LOG_DIR": "D:\\path\\to\\logs",
        "MCP_LOG_MAX_MB": "500",
        "MCP_LOG_RETAIN_DAYS": "30"
      }
    }
  }
}

Quality Presets

Screenshots support configurable quality to balance detail and token cost:

| Preset | Scale | Format | Use case |
|---|---|---|---|
| full | 100% | JPEG q90 | Maximum detail |
| standard | 50% | JPEG q70 | Balanced (default) |
| compact | 50% | PNG | When PNG is needed |
| minimal | 25% | Grayscale | Lowest token cost |
| custom | 10–100% | JPEG / PNG / Grayscale | Full control |
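The scale applies to both width and height, so the pixel count shrinks with the square of the scale: 50% keeps a quarter of the pixels and 25% keeps one sixteenth. This sketch computes output sizes from the preset table above; rounding to the nearest pixel is an assumption, since the server's resizing rule is not documented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    scale: float  # fraction of the original resolution, per axis
    fmt: str      # output format as listed in the preset table


# Values from the preset table; "custom" is omitted because its
# scale (10-100%) and format are chosen per call.
PRESETS = {
    "full": Preset(1.0, "JPEG q90"),
    "standard": Preset(0.5, "JPEG q70"),
    "compact": Preset(0.5, "PNG"),
    "minimal": Preset(0.25, "Grayscale"),
}


def output_size(width: int, height: int, preset: str) -> tuple[int, int]:
    """Scaled screenshot size (rounding rule assumed)."""
    p = PRESETS[preset]
    return (max(1, round(width * p.scale)), max(1, round(height * p.scale)))


def pixel_fraction(preset: str) -> float:
    """Share of the original pixel count that survives the preset."""
    return PRESETS[preset].scale ** 2


print(output_size(1920, 1080, "standard"), pixel_fraction("standard"))  # (960, 540) 0.25
```

Since image token cost generally grows with pixel count, this is why `minimal` is the cheapest preset.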

Environment Variables

| Variable | Description | Default |
|---|---|---|
| MCP_LOG_DIR | Directory for log sessions. If unset, the logging tools are hidden. | (unset) |
| MCP_LOG_MAX_MB | Session size limit in MB; a warning is issued when it is exceeded. | 500 |
| MCP_LOG_RETAIN_DAYS | Auto-delete sessions older than N days; set to 0 to disable. | 30 |
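The documented defaults and retention rule can be mirrored in a few lines, for example to predict which sessions cleanup will remove. This is a sketch of the described behavior only: how the server parses the values and what it treats as a session's age (start time, file modification time) are not documented, so those choices are assumptions.

```python
import os
from datetime import datetime, timedelta
from typing import Mapping, Optional


def log_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Read logging settings with the documented defaults.

    Returns None when MCP_LOG_DIR is unset, matching "logging tools are hidden".
    """
    log_dir = env.get("MCP_LOG_DIR")
    if not log_dir:
        return None
    return {
        "dir": log_dir,
        "max_mb": int(env.get("MCP_LOG_MAX_MB", "500")),
        "retain_days": int(env.get("MCP_LOG_RETAIN_DAYS", "30")),
    }


def expired(session_times: Mapping[str, datetime], retain_days: int, now: datetime) -> list[str]:
    """Names of sessions older than retain_days; 0 disables cleanup."""
    if retain_days == 0:
        return []
    cutoff = now - timedelta(days=retain_days)
    return sorted(name for name, started in session_times.items() if started < cutoff)
```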

How It Works

MCP Test Utils is a JSON-RPC 2.0 server communicating over stdin/stdout. Any MCP-compatible client launches the binary, sends tool calls, and receives structured responses (text, base64 images). Tested with Claude Desktop.
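The client side of that exchange can be sketched in Python. The message shapes (`initialize`, `notifications/initialized`, `tools/call`, newline-delimited JSON over stdio) follow the MCP specification; the protocol version string, the client name, and calling take_screenshot with empty arguments are illustrative assumptions, not taken from this README.

```python
import json
import subprocess
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids; any unique values work


def request(method: str, params: Optional[dict] = None) -> dict:
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str) -> dict:
    # Notifications have no "id"; the server sends no response to them.
    return {"jsonrpc": "2.0", "method": method}


def encode(msg: dict) -> bytes:
    # Stdio transport: one JSON message per line. json.dumps without indent
    # escapes newlines inside strings, so each message stays on one line.
    return (json.dumps(msg) + "\n").encode("utf-8")


def call_tool(exe: str, tool: str, arguments: dict, timeout: float = 5.0) -> dict:
    """Start the server, perform the MCP handshake, call one tool, return "result"."""
    with subprocess.Popen([exe], stdin=subprocess.PIPE, stdout=subprocess.PIPE) as proc:

        def send(msg: dict) -> None:
            proc.stdin.write(encode(msg))
            proc.stdin.flush()

        def wait_for(msg_id: int) -> dict:
            # Skip unrelated messages (e.g. server notifications) until our reply.
            while True:
                line = proc.stdout.readline()
                if not line:
                    raise RuntimeError("server closed stdout")
                reply = json.loads(line)
                if reply.get("id") == msg_id:
                    return reply

        init = request("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},
        })
        send(init)
        wait_for(init["id"])
        send(notification("notifications/initialized"))

        call = request("tools/call", {"name": tool, "arguments": arguments})
        send(call)
        reply = wait_for(call["id"])

        # Shut down: close stdin, give the server time to exit, then kill it.
        proc.stdin.close()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()

    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

For example, `call_tool(r"D:\path\to\mcp-test-utils.exe", "take_screenshot", {})` should return a result whose `content` list includes an item of type `image` with base64 `data`, which `base64.b64decode` turns back into image bytes.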

The server uses native Windows APIs directly — Win32 GDI for screenshots, SendInput for mouse and keyboard, UI Automation COM API for element inspection, WinRT Windows.Media.Ocr for text recognition. No PowerShell, no external tools, no network access.

Use Cases

  • Automated QA — agent navigates the app, clicks through flows, takes screenshots at each step, writes a test report
  • Desktop automation — fill forms, copy data between windows, run workflows
  • Accessibility audit — scan UI Automation tree for missing labels or roles
  • Visual regression — screenshot comparison across releases
  • Data extraction — OCR text from applications that don't expose APIs
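This README does not say the server compares screenshots itself; for visual regression, the comparison would happen in the agent or a test harness. One simple approach, assuming both screenshots were taken with the same preset and already decoded into grayscale pixel rows (decoding JPEG/PNG needs an image library and is left out), is a tolerance-based pixel diff. The tolerance absorbs small differences such as JPEG compression noise.

```python
def diff_ratio(before: list[list[int]], after: list[list[int]], tolerance: int = 8) -> float:
    """Fraction of grayscale pixels (0-255) that differ by more than `tolerance`."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in before)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_a, row_b in zip(before, after)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > tolerance
    )
    return changed / total


before = [[0, 0], [0, 0]]
after = [[0, 5], [100, 0]]
print(diff_ratio(before, after))  # 0.25
```

A harness might flag a regression when the ratio exceeds a chosen threshold; the threshold is a per-project decision.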

Security

  • Responds only to requests from the MCP client
  • Opens no network ports
  • Writes nothing to disk (except opt-in logging)
  • Sends no data externally
  • take_screenshot and read_screen_text capture the entire desktop; make sure no sensitive information is visible

Support

Free and unrestricted. If you find it useful, you can support the project at jeenyjai.github.io.

License

Copyright 2026 JeenyJAI. All rights reserved.


🚀 Created with Claude