
winapp-mcp

Validation Failed

MCP server for automating native Windows desktop apps — Playwright for Windows. Gives AI assistants (Copilot, Claude, Cursor) full control over WinUI3, WPF, WinForms, and Win32 apps with 55 UI automation tools. Built entirely with GitHub Copilot.

npm: 208/wk
Stars: 13
Forks: 2
Updated: Mar 27, 2026
Validated: Apr 5, 2026

Validation Error:

Process exited with code 1. stderr:
npm error code EBADPLATFORM
npm error notsup Unsupported platform for winapp-mcp@2.0.0: wanted {"os":"win32"} (current: {"os":"linux"})
npm error notsup Valid os: win32
npm error notsup Actual os: linux
npm error A complete log of this run can be found in: /home/runner/.npm/_logs/2026-04-05T03_02_19_639Z-debug-0.log

Quick Install

npx -y winapp-mcp

WinApp MCP


Website · GitHub · npm · VS Code Marketplace

Playwright for Windows Desktop Apps

A Model Context Protocol (MCP) server that gives AI assistants full control over native Windows applications — launch, inspect, click, type, screenshot, and test any WinUI3, WPF, WinForms, UWP, or Win32 app.



Why This Exists

Browser automation has Playwright. Mobile has Appium. But Windows desktop automation for AI agents? The existing options didn't go deep enough.

AI assistants like GitHub Copilot, Claude, and ChatGPT can browse the web, run terminal commands, and edit files. But they cannot interact with native Windows applications. They can't click a button in your WinUI3 app, read a value from a WPF form, or test a WinForms dialog.

WinApp MCP bridges that gap. It exposes 55 UI automation tools through the Model Context Protocol, letting any MCP-compatible AI assistant control Windows desktop apps the same way Playwright controls browsers.

The Problem

What AI Can Do Today | What AI Couldn't Do
✅ Edit source code | ❌ Launch and interact with the compiled app
✅ Run terminal commands | ❌ Click buttons, fill forms, read UI state
✅ Browse websites via Playwright | ❌ Automate desktop apps
✅ Write tests | ❌ Run visual/E2E tests on desktop apps
✅ Run long autonomous pipelines | ❌ Keep working when the app is minimized or the screen locks

The Solution

WinApp MCP gives AI assistants eyes and hands for any Windows desktop application:

┌──────────────────────┐      MCP (stdio)      ┌──────────────────────┐
│   AI Assistant       │◄─────────────────────►│   WinApp MCP Server  │
│   (Copilot, Claude)  │   JSON-RPC over stdio │   (.NET 10 + FlaUI)  │
└──────────────────────┘                       └──────────┬───────────┘
                                                          │ UI Automation
                                                          ▼
                                               ┌──────────────────────┐
                                               │  Windows Application │
                                               │  WinUI3/WPF/WinForms │
                                               └──────────────────────┘

Key Features

🔍 Deep UI Inspection

  • DOM-like snapshots of the entire UI element tree with configurable depth
  • Filtered search by control type, AutomationId, or name
  • Fuzzy matching with Levenshtein distance — tolerates typos and partial names
  • Element existence checks — fast boolean queries without full property reads
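
As an illustration of the fuzzy matching idea, here is a minimal Python sketch of Levenshtein-based name lookup. It is not the server's implementation (which is .NET); the function names, the case-insensitive comparison, and the `max_distance` threshold of 2 are assumptions made for this example, and it only covers typos, not partial-name matching.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_find(query: str, names: list[str], max_distance: int = 2) -> list[str]:
    """Return names within max_distance edits of query, closest first.

    max_distance is a hypothetical threshold chosen for this sketch.
    """
    q = query.lower()
    scored = [(levenshtein(q, name.lower()), name) for name in names]
    return [name for distance, name in sorted(scored) if distance <= max_distance]
```

With this sketch, a misspelled query such as "Svae" still resolves to an element named "Save", while unrelated names like "Cancel" fall outside the threshold.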

🖱️ Complete Interaction

  • Click, double-click, right-click, invoke (via UIA patterns)
  • Type text, press keys, key combos (Ctrl+S, Alt+F4, etc.)
  • Fill entire forms in a single call — no sequential typing
  • Select dropdown options in one atomic operation
  • Drag and drop between elements
  • Expand/collapse tree items, menus, and accordions

🪟 Multi-Window HWND Targeting

  • Target specific windows by native handle (HWND)
  • Click, set values, and take snapshots of popups, dialogs, and secondary windows
  • Essential for multi-window applications and system dialogs

⚡ Performance-Optimized

  • Descendant cache (2s TTL) — avoids repeated FindAllDescendants calls that take 200-800ms on complex apps
  • Window cache (30s TTL) — eliminates expensive window lookups
  • Smart cache invalidation — cache is cleared automatically after mutations (click, type, navigate)
  • Token-aware screenshots — auto-resize images to fit within LLM context limits

📊 Advanced UIA Patterns

  • GridPattern — direct cell access by row/column in DataGrids
  • ScrollItemPattern — scroll off-screen elements into view
  • VirtualizedItemPattern — realize items in WinUI3 virtualized lists (ListView, GridView)
  • ExpandCollapsePattern — programmatic expand/collapse for tree items and menus
  • ItemContainerPattern — efficient search in large/virtualized containers

📡 Event Monitoring

  • Monitor focus changes, structure changes (elements added/removed), and property changes in real-time
  • Session-based with 500-event ring buffer
  • Debug async UI updates, animations, and background data loads
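
The ring-buffer behaviour can be sketched in a few lines of Python. This is an illustration only: the `EventLog` class and its method names are hypothetical, and the only detail taken from the description above is the 500-event capacity.

```python
from collections import deque


class EventLog:
    """Fixed-size event buffer: once full, the oldest event is dropped."""

    def __init__(self, capacity: int = 500):
        # deque(maxlen=...) discards items from the left when it overflows.
        self._events = deque(maxlen=capacity)

    def record(self, kind: str, element: str) -> None:
        self._events.append({"kind": kind, "element": element})

    def snapshot(self) -> list[dict]:
        """Return the buffered events, oldest first."""
        return list(self._events)
```

A bounded buffer keeps memory use flat during long monitoring sessions, at the cost of losing the oldest events if the log is not read often enough.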

📸 Visual Verification

  • Screenshot capture with auto-resize for LLM token budgets
  • Annotated screenshots — red bounding boxes drawn around specified elements
  • Pixel-diff comparison — compare two screenshots and highlight changes
  • HWND-targeted screenshots for specific windows
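
To make the pixel-diff idea concrete, here is a small Python sketch that compares two images represented as nested lists of RGB tuples and reports the changed region. The representation, the `tolerance` parameter, and the returned bounding box are assumptions for this example; the actual tool works on screenshot files and may use a different algorithm.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def pixel_diff(before: Image, after: Image, tolerance: int = 0):
    """Return (changed_count, bounding_box) for two same-sized RGB grids.

    bounding_box is (min_x, min_y, max_x, max_y), or None if nothing changed.
    A pixel counts as changed when any channel differs by more than tolerance.
    """
    same_shape = len(before) == len(after) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(before, after)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    changed = [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(before, after))
        for x, (p_a, p_b) in enumerate(zip(row_a, row_b))
        if max(abs(c_a - c_b) for c_a, c_b in zip(p_a, p_b)) > tolerance
    ]
    if not changed:
        return 0, None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return len(changed), (min(xs), min(ys), max(xs), max(ys))
```

A bounding box like this is one plausible input for drawing a highlight around the changed area.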

🪟 Minimized & Locked Session Support

  • Works when apps are minimized: auto-restores windows before mouse/keyboard operations, and uses the PrintWindow API for screenshots without needing to restore
  • Works when the desktop is locked: clicks via UIA patterns (InvokePattern, TogglePattern, SelectionItemPattern) instead of mouse simulation; reads/writes values via ValuePattern; captures screenshots via PrintWindow
  • Smart fallback chain: tries UIA pattern → auto-restore + mouse simulation → informative error
  • Session status detection: check_session_status reports whether the session is locked, the app is minimized, and which operations are available
  • Manual restore: restore_window brings a minimized app back to the foreground on demand

Why it matters: AI agents running long autonomous pipelines shouldn't break because a screensaver kicked in or the user minimized the app. WinApp MCP degrades gracefully — UIA pattern-based operations keep working even when mouse/keyboard simulation can't.
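
The fallback chain described above (UIA pattern, then auto-restore plus mouse simulation, then an informative error) can be sketched as follows. This is a simplified Python illustration, not the server's code: the function, its parameters, and the returned strategy labels are all hypothetical.

```python
from typing import Callable


def click_with_fallback(
    supports_invoke: bool,
    session_locked: bool,
    invoke: Callable[[], None],
    restore_and_click: Callable[[], None],
) -> str:
    """Try the pattern-based click first, then mouse simulation, else fail clearly.

    Returns a label naming the strategy that was used (hypothetical labels).
    """
    if supports_invoke:
        # Pattern-based invocation needs no cursor, so it also works when
        # the app is minimized or the session is locked.
        invoke()
        return "uia_pattern"
    if not session_locked:
        # Mouse simulation needs a visible window and an unlocked desktop.
        restore_and_click()
        return "mouse_simulation"
    raise RuntimeError(
        "element exposes no invokable UIA pattern and the session is locked, "
        "so mouse simulation is unavailable"
    )
```

Ordering the strategies this way means the least intrusive option is always tried first, and the caller gets an actionable message when nothing can work.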

🛡️ Safety

  • Emergency release — unstick all modifier keys and mouse buttons with one call
  • Wait primitives — wait for elements, conditions, and input idle states
  • Timeout controls on all blocking operations
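
A wait primitive with a timeout usually comes down to polling against a deadline. The Python sketch below shows that general pattern; the function name, the default timeout, and the polling interval are assumptions for this example rather than the server's actual values.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.1,
) -> bool:
    """Poll condition until it is true or the timeout elapses.

    Returns True on success and False on timeout, so callers never hang.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

Using a monotonic clock keeps the deadline stable even if the system clock is adjusted while waiting.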

Use Cases

🤖 AI-Powered E2E Testing

Let AI assistants test your Windows app the way a QA engineer would — navigate pages, fill forms, verify data, test status transitions, capture evidence.

"Navigate to Invoices → Create New → Fill all fields → Save → Verify detail page → Test status change to Sent"

🔄 Automated UI Verification

After code generation or refactoring, have the AI launch the app and verify that the UI renders correctly, buttons work, and forms validate properly.

🧪 Visual Regression Testing

Take baseline screenshots, make changes, take new screenshots, and use screenshot_diff to detect unexpected visual changes.

📋 Form Automation & Data Entry

Fill complex multi-field forms in a single fill_form call. Automate repetitive data entry across desktop applications.

🏗️ CI/CD Desktop Testing

Integrate into build pipelines to automatically test desktop applications after each build. The MCP server has no UI of its own and communicates over the stdio transport, so a pipeline step on a Windows machine can start it as a child process.

🔍 Accessibility Auditing

Inspect the UI Automation tree to verify that all controls have proper AutomationIds, names, and patterns — critical for screen reader compatibility.
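
An audit like this can be expressed as a walk over a UI tree snapshot. The Python sketch below assumes a hypothetical snapshot shape (nested dictionaries with `controlType`, `automationId`, `name`, and `children` keys); the real output format of get_snapshot is described in DOCUMENTATION.md and may differ.

```python
def find_unlabeled(node: dict, path: str = "") -> list[str]:
    """Return tree paths of controls that have neither an AutomationId nor a name.

    The dictionary keys used here are assumptions made for this sketch.
    """
    here = f"{path}/{node.get('controlType', '?')}"
    problems = []
    if not node.get("automationId") and not node.get("name"):
        problems.append(here)
    for child in node.get("children", []):
        problems.extend(find_unlabeled(child, here))
    return problems
```

Controls reported by such a walk are the ones a screen reader, or an AI agent searching by name, would struggle to identify.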


Installation

Works with any MCP client — VS Code, Claude Desktop, Cursor, Windsurf, Cline, and more.

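Option 1: npm (Works with Any MCP Client) ⭐

The universal way to run WinApp MCP with any MCP client on Windows. No .NET SDK required. The package declares "os": "win32", so npm refuses to install it on other platforms.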

npm install -g winapp-mcp

Or run directly without installing:

npx -y winapp-mcp

Then add to your MCP client config (see client-specific configs below).

Option 2: VS Code Extension (VSIX)

The easiest way for VS Code / GitHub Copilot users. Bundles the MCP server and auto-registers on install.

  1. Download the latest .vsix file from Releases
  2. In VS Code: Ctrl+Shift+P → "Extensions: Install from VSIX..." → select the file
  3. Reload VS Code — the MCP server registers automatically

Note: Also available from the VS Code Extension Marketplace — search "WinApp MCP" in Extensions.

Option 3: Build from Source

git clone https://github.com/floatingbrij/desktop-pilot-mcp.git
cd desktop-pilot-mcp/src
dotnet build
dotnet run

Publish as self-contained executable (no .NET SDK needed on target):

dotnet publish -c Release -r win-x64 --self-contained
# Output: bin/Release/net10.0-windows10.0.19041.0/win-x64/publish/WinAppMCP.exe

Client Configurations

VS Code / GitHub Copilot

Option A — Install the VSIX (auto-registers, nothing to configure).

Option B — Add to .vscode/mcp.json in your workspace:

{
  "servers": {
    "winapp": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "winapp-mcp"]
    }
  }
}

Claude Desktop

Add to %APPDATA%\Claude\claude_desktop_config.json:

{
  "mcpServers": {
    "winapp": {
      "command": "npx",
      "args": ["-y", "winapp-mcp"]
    }
  }
}

Cursor

Add to Cursor Settings → MCP Servers, or in .cursor/mcp.json:

{
  "mcpServers": {
    "winapp": {
      "command": "npx",
      "args": ["-y", "winapp-mcp"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "winapp": {
      "command": "npx",
      "args": ["-y", "winapp-mcp"]
    }
  }
}

Cline (VS Code Extension)

Add via Cline MCP Settings or in cline_mcp_settings.json:

{
  "mcpServers": {
    "winapp": {
      "command": "npx",
      "args": ["-y", "winapp-mcp"]
    }
  }
}

Any Other MCP Client

WinApp MCP uses stdio transport (JSON-RPC over stdin/stdout). Point your client at:

npx -y winapp-mcp

Or directly at the executable:

path/to/WinAppMCP.exe
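
For readers writing their own client, the Python sketch below shows what a tool invocation looks like on the wire: a JSON-RPC 2.0 request with the MCP method tools/call, serialized as one line of JSON. The helper name and the request-id counter are assumptions for this example, the exePath argument is taken from the Quick Start examples, and a real client must complete the MCP initialize handshake before sending tool calls.

```python
import json
from itertools import count

_request_ids = count(1)  # hypothetical id source for this sketch


def tools_call(name: str, arguments: dict) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request for an MCP tool call."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so each message occupies exactly one line.
    return json.dumps(request) + "\n"
```

A client would write the returned line to the server's stdin and read the matching response, identified by the same id, from its stdout.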

⚠️ Caution & Important Notes

This tool controls your mouse and keyboard. When automation is running, it will move your cursor, click buttons, and type text. Do not interact with your computer while tests are executing — it can interfere with the automation and produce unexpected results.

Minimized/locked mode: When the app is minimized or the session is locked, WinApp MCP automatically falls back to UIA patterns (InvokePattern, ValuePattern) and PrintWindow for screenshots — no mouse or keyboard simulation is used in this mode.

Before You Start

  • Close sensitive applications — The tool can list all visible windows and their titles. Keep private apps closed during automation sessions.
  • Save your work — Automation interacts with live applications. An incorrect click could close unsaved work in other apps.
  • Single session — Run one automation session at a time. Concurrent sessions can cause mouse/keyboard conflicts.

During Automation

  • Don't touch mouse/keyboard — Let the automation complete. Moving your mouse mid-operation can cause clicks to miss their targets.
  • Use release_all — If automation stops unexpectedly and your Ctrl/Shift/Alt keys feel "stuck", call release_all to reset.
  • Watch timeouts — All blocking operations have timeouts. If an element doesn't appear, the call will fail gracefully rather than hang.

Known Limitations

  • Windows only: Requires Windows 10 version 1903+ or Windows 11
  • UI Automation dependency: Target apps must expose a UIA tree. Fully custom-drawn apps (DirectX/OpenGL games, Electron with custom rendering) may have limited or no UIA support.
  • No cross-process clipboard writing: get_clipboard reads clipboard content but doesn't write to it
  • Process-scoped: The server attaches to one app at a time per app ID. Use multiple app IDs for multi-app workflows.
  • Screen resolution: Screenshot coordinates are absolute. Automation on remote desktops or with varying DPI may need adjustment.
  • Administrator apps: If the target app runs as Administrator, the MCP server must also run elevated.
  • Single monitor optimized: Multi-monitor coordinate translation is not handled automatically
  • Locked session limits: When the desktop is locked (Win+L), operations using UIA patterns (click via Invoke, read/write values, take screenshots via PrintWindow) work, but raw mouse/keyboard simulation does not. The server auto-detects this and uses pattern-based fallbacks where possible.
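
Because the package is Windows-only, a script that launches the server can check the platform first and fail with a clear message instead of an npm EBADPLATFORM error. The helper below is a hypothetical Python sketch; `sys.platform` reports "win32" on Windows, including 64-bit systems.

```python
import sys
from typing import Optional


def is_supported_platform(platform: Optional[str] = None) -> bool:
    """Return True when running on Windows.

    The optional argument exists only to make the check easy to test.
    """
    return (platform or sys.platform) == "win32"
```

A launcher could call this before spawning `npx -y winapp-mcp` and skip desktop tests on Linux or macOS runners.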

Tools Overview

WinApp MCP exposes 55 tools organized into 11 categories:

App Lifecycle (5 tools)

Tool | Description
launch_app | Launch a Windows app by executable path
attach_to_app | Attach to a running process by name
attach_to_pid | Attach to a running process by PID
close_app | Close a tracked application
list_apps | List all currently tracked applications

Window & Element Discovery (7 tools)

Tool | Description
list_windows | List all windows of a tracked app
list_desktop_windows | List all visible desktop windows
get_snapshot | Get a UI tree snapshot (like browser DOM inspection)
get_focused_element | Get the currently focused element's info
find_elements | Search for elements with filters (type, id, name)
find_all_elements | List all matching elements with indices
find_elements_fuzzy | Fuzzy search: tolerates typos and partial names

Read & Inspect (5 tools)

Tool | Description
read_element | Read detailed properties of a UI element
read_element_by_index | Read properties by index (from find_all_elements)
get_element_bounds | Get bounding rectangle (screen coordinates)
element_exists | Fast boolean check: does this element exist?
get_all_values | Read all editable field values at once

Click & Interaction (8 tools)

Tool | Description
click_element | Click a UI element by AutomationId or name
double_click_element | Double-click a UI element
right_click_element | Right-click (open context menu)
click_at_coordinates | Click at absolute screen coordinates
invoke_element | Invoke via UIA InvokePattern/TogglePattern
select_option | Select a ComboBox/dropdown option in one call
expand_collapse_element | Expand, collapse, or toggle tree/menu items
drag_element | Drag from one element to another

Input (4 tools)

Tool | Description
type_text | Type text into a text field
press_key | Press a single key (RETURN, TAB, ESCAPE, etc.)
press_key_combo | Press a key combination (Ctrl+S, Alt+F4)
fill_form | Fill multiple form fields in one call

Wait & Sync (4 tools)

Tool | Description
wait_for_element | Wait for an element to appear
wait_for_condition | Wait for a property to reach a value
wait_for_input_idle | Wait for window to be ready for input
release_all | Emergency: release all stuck modifier keys

Screenshots & Visual (5 tools)

Tool | Description
take_screenshot | Capture app window as PNG
take_screenshot_optimized | Screenshot with auto-resize for LLM token budgets
annotate_screenshot | Draw red bounding boxes around elements
screenshot_diff | Pixel-diff two screenshots, highlight changes
get_tree_hash | Hash the UI tree to detect changes
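
Hashing the UI tree gives a cheap way to ask "did anything change?" without comparing full snapshots. The Python sketch below shows one way to do this, assuming the tree is available as nested dictionaries; the canonical-JSON approach and the choice of SHA-256 are assumptions for this example, not a description of how get_tree_hash is implemented.

```python
import hashlib
import json


def tree_hash(node: dict) -> str:
    """Return a stable hex digest for a nested UI tree.

    Keys are sorted so that the same tree always serializes the same way.
    """
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

An agent can store the digest after an action and compare it later: an unchanged digest means the snapshot it already holds is still valid.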

HWND Multi-Window (3 tools)

Tool | Description
click_element_hwnd | Click in a specific window by handle
set_value_hwnd | Set text in a specific window by handle
get_snapshot_hwnd | UI snapshot of a specific window by handle

Advanced Patterns (6 tools)

Tool | Description
get_grid_item | Access a DataGrid cell by row/column
find_item_by_property | Search in containers (ItemContainerPattern)
scroll_into_view | Scroll an element into the visible area
realize_virtualized_item | Load a virtualized item into the UI tree
scroll_element | Scroll within a scrollable container
invalidate_cache | Force-refresh cached window references

Event Monitoring (3 tools)

Tool | Description
start_event_monitor | Start monitoring UI events (focus/structure/property)
stop_event_monitor | Stop monitoring a session or all sessions
get_event_log | Read captured events from a monitoring session

Session & Window Management (2 tools)

Tool | Description
restore_window | Restore a minimized window and bring it to foreground
check_session_status | Check if desktop is locked, app is minimized, and report available operations

📖 Full tool documentation with parameters, examples, and tips: See DOCUMENTATION.md


Quick Start

Example 1: Launch and Inspect an App

User: "Open Notepad and show me the UI tree"

AI calls:
  1. launch_app(exePath: "C:\\Windows\\notepad.exe")        → "app_1234"
  2. get_snapshot(appId: "app_1234", maxDepth: 3)           → UI tree

Example 2: Fill a Form and Save

User: "Create a new invoice in the app"

AI calls:
  1. attach_to_app(processName: "MyApp")                   → "app_5678"
  2. click_element(appId: "app_5678", name: "Invoices")     → navigates
  3. wait_for_element(appId: "app_5678", name: "New")       → page loaded
  4. click_element(appId: "app_5678", name: "New")          → opens form
  5. fill_form(appId: "app_5678", fieldsJson: {             → fills all fields
       "CustomerComboBox": "Acme Corp",
       "ItemName": "Widget",
       "Quantity": "10"
     })
  6. click_element(appId: "app_5678", name: "Save")         → saves
  7. take_screenshot(appId: "app_5678", outputPath: "...")   → evidence

Example 3: Test Status Transitions

User: "Test the invoice lifecycle"

AI calls:
  1. click_element(appId: "app_5678", name: "Mark as Sent")
  2. wait_for_condition(appId: "app_5678", property: "name",
       expectedValue: "Sent", automationId: "StatusBadge")
  3. element_exists(appId: "app_5678", name: "Record Payment")  → "true"
  4. click_element(appId: "app_5678", name: "Record Payment")
  5. take_screenshot_optimized(appId: "app_5678", outputPath: "...", maxTokens: 1000)

Performance

WinApp MCP is optimized for AI agent workflows where the same UI is inspected multiple times in quick succession:

Operation | Without Cache | With Cache | Improvement
get_snapshot (complex app) | 400-800ms | 50-100ms | 8x faster
find_elements (50 results) | 300-600ms | 20-50ms | 12x faster
click_element (by name) | 200-500ms | 30-80ms | 6x faster
Window lookup | 100-300ms | <1ms | 300x faster

The descendant cache is the biggest win — FindAllDescendants() is the most expensive UIA operation, and AI agents typically call it 3-5 times per UI state. The 2-second TTL ensures fresh data while avoiding redundant traversals.

Cache is automatically invalidated after any mutation (click, type, set value) so you always get accurate results.
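
The caching strategy described here (a short TTL plus explicit invalidation after mutations) can be sketched in Python as follows. The class and method names are hypothetical, and the injectable `clock` parameter exists only to make the sketch testable; the server's own cache is part of its .NET automation engine.

```python
import time
from typing import Any, Callable


class TtlCache:
    """Cache values for ttl_s seconds; invalidate() drops everything at once."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # still fresh: skip the expensive traversal
        value = loader()
        self._entries[key] = (now, value)
        return value

    def invalidate(self) -> None:
        """Call after any mutation (click, type, set value)."""
        self._entries.clear()
```

In this sketch, a descendant cache would be `TtlCache(ttl_s=2.0)` and a window cache `TtlCache(ttl_s=30.0)`, with every mutating tool calling `invalidate()` before returning.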


Architecture

src/
├── Program.cs              # Entry point — .NET Generic Host + MCP server registration
├── WinAppTools.cs           # 55 MCP tool definitions (thin wrappers)
├── WinAppAutomation.cs      # Core automation engine (~2800 lines)
└── WinAppMCP.csproj         # .NET 10, FlaUI.UIA3 5.0.0, MCP 1.1.0

Design Principles:

  • Thin tool layer: WinAppTools.cs contains only [McpServerTool] wrappers. Zero business logic.
  • Single automation engine: WinAppAutomation.cs handles all UIA interactions, caching, and state management.
  • Static singleton: One WinAppAutomation instance shared across all tool calls.
  • Stdio transport: Clean JSON-RPC over stdin/stdout. Logging goes to stderr only.
  • No external dependencies beyond FlaUI: No Selenium, no WebDriver, no COM interop wrappers.
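
The stdio rule matters because any stray text on stdout would corrupt the JSON-RPC stream. The Python sketch below illustrates the separation in general terms; it is not the server's code (which uses the .NET Generic Host), and the logger name and helper functions are hypothetical.

```python
import json
import logging
import sys
from typing import TextIO


def configure_logging() -> logging.Logger:
    """Route all log output to stderr so stdout carries protocol messages only."""
    logger = logging.getLogger("mcp-server-sketch")
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(sys.stderr))
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep root handlers from echoing elsewhere
    return logger


def send(message: dict, out: TextIO = sys.stdout) -> None:
    """Write one JSON-RPC message as a single line and flush immediately."""
    out.write(json.dumps(message) + "\n")
    out.flush()
```

Keeping diagnostics on stderr lets an MCP client capture logs for debugging without ever having to filter them out of the protocol channel.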

Technology Stack

Component | Technology | Version
Runtime | .NET | 10.0
UI Automation | FlaUI.UIA3 | 5.0.0
MCP Protocol | ModelContextProtocol | 1.1.0
Hosting | Microsoft.Extensions.Hosting | 10.0.3
Target OS | Windows | 10 (1903+) / 11
Architecture | x64 | win-x64

Comparison with Alternatives

Feature | WinApp MCP | CursorTouch/Windows-MCP | locomorange/uiautomation-mcp
Tools | 55 | ~15 | 39
UIA Library | FlaUI (managed) | FlaUI (managed) | Raw COM interop
Runtime | .NET 10 | .NET 8 | .NET 9 (Native AOT)
Architecture | Single process | Single process | 6 projects, multi-process
Caching | ✅ Descendant + Window
Fuzzy Search | ✅ Levenshtein distance
HWND Targeting | ✅ Full (click, value, snapshot) | ✅ Partial
Event Monitoring | ✅ Focus, structure, property | ✅ Focus only
Grid Pattern
Virtualization
Screenshot Diff
Form Fill | ✅ Batch
Token-aware Screenshot
Drag & Drop
Minimized App Support | ✅ Auto-restore + PrintWindow
Locked Session Support | ✅ UIA pattern fallback
VS Code Extension | ✅ VSIX + Marketplace
npm Package | npx winapp-mcp
Multi-Client | ✅ Copilot, Claude, Cursor, Windsurf | ❌ VS Code only

Documentation

  • DOCUMENTATION.md — Complete reference for all 55 tools with parameters, return values, examples, and tips
  • CHANGELOG.md — Version history and release notes

Requirements

  • OS: Windows 10 (version 1903 / build 18362) or later, or Windows 11
  • Runtime: .NET 10.0 SDK (for building from source) — not needed if using VSIX or pre-built binary
  • Editor: VS Code 1.99+ with GitHub Copilot (for VSIX extension)
  • Target Apps: Must expose a UI Automation tree (WinUI3, WPF, WinForms, UWP, most Win32 apps)

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-tool)
  3. Commit your changes (git commit -m 'Add amazing-tool')
  4. Push to the branch (git push origin feature/amazing-tool)
  5. Open a Pull Request

Development Setup

git clone https://github.com/floatingbrij/desktop-pilot-mcp.git
cd desktop-pilot-mcp/src
dotnet restore
dotnet build
dotnet run  # Starts the MCP server on stdio

License

This project is licensed under the MIT License — see the LICENSE file for details.


Built entirely with GitHub Copilot by Brijesharun G — for AI-powered Windows app testing.
If this helps your workflow, give it a ⭐
