MCP Screen Text

A toolset for screen capture and OCR, allowing AI models to see and read text from the user's desktop or specific applications.

Stars: 1
Tools: 5
Updated: Jul 5, 2025
Validated: Mar 6, 2026

A Model Context Protocol (MCP) server that provides screen capture and optical character recognition (OCR) capabilities.

🎥 Demo Video

See MCP Screen Text in action - capturing screens and extracting text with Claude Desktop

Features

  • Screen Capture: Take screenshots of the entire screen or a specific display
  • Application-Specific Screenshots: Capture the window of a specific running application
  • OCR Text Extraction: Extract text from screenshots or existing images
  • Desktop Storage: All screenshots are saved to a "Screenshots" folder on your Desktop
  • Multi-format Support: Save screenshots as PNG or JPG
  • Multi-language OCR: Recognize text in multiple languages (e.g., "eng", "spa", "fra")
  • Application Discovery: List running applications available for capture

Tools Available

capture_screen

Captures a screenshot of the entire screen or a specific display.

Parameters:

  • display (number, optional): Display number to capture (0 for primary display)
  • format (string, optional): Image format for the screenshot ('png' or 'jpg')
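
To make the parameter shapes concrete, here is a minimal sketch of the JSON-RPC `tools/call` request an MCP client might send for capture_screen. The `CaptureScreenArgs` type and the `buildCaptureScreenCall` helper are hypothetical names used for illustration. Only the tool name and the `display` and `format` parameters come from the list above, and the validation rules (non-negative integer display) are assumptions.

```typescript
// Hypothetical argument type mirroring the documented parameters.
interface CaptureScreenArgs {
  display?: number; // 0 is the primary display
  format?: "png" | "jpg";
}

// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Illustrative helper, not part of this server: builds the request and
// rejects values the parameter list does not describe (assumed rules).
function buildCaptureScreenCall(id: number, args: CaptureScreenArgs = {}): ToolCallRequest {
  if (args.display !== undefined && (!Number.isInteger(args.display) || args.display < 0)) {
    throw new RangeError("display must be a non-negative integer");
  }
  if (args.format !== undefined && args.format !== "png" && args.format !== "jpg") {
    throw new RangeError("format must be 'png' or 'jpg'");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "capture_screen", arguments: { ...args } },
  };
}

const captureRequest = buildCaptureScreenCall(1, { display: 0, format: "png" });
console.log(JSON.stringify(captureRequest));
```

In practice a client library would build and send this message for you; the sketch only shows what the two optional parameters look like on the wire.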

capture_application_screen

Captures a screenshot of a specific application window.

Parameters:

  • applicationName (string, required): Name of the application to capture (e.g., 'Safari', 'Chrome', 'Finder')
  • format (string, optional): Image format ('png' or 'jpg')

list_applications

Lists all running applications that can be captured.

Parameters: None

extract_text

Extracts text from an existing image file using OCR.

Parameters:

  • imagePath (string, required): Path to the image file
  • language (string, optional): Language for OCR recognition (e.g., "eng", "spa", "fra")
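
As a rough illustration of consuming the result, the sketch below pulls the recognized text out of an MCP tool result, which carries a list of content items. The `textFromResult` helper and the mock response are hypothetical, and it is an assumption that this server returns the OCR output as `text` items.

```typescript
// Minimal subset of an MCP tool result: a list of content items.
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface ToolResult {
  content: ContentItem[];
  isError?: boolean;
}

// Illustrative helper: joins all text items of a result into one string.
// That the server reports OCR output as text items is an assumption.
function textFromResult(result: ToolResult): string {
  if (result.isError) {
    throw new Error("tool call failed");
  }
  return result.content
    .filter((item): item is { type: "text"; text: string } => item.type === "text")
    .map((item) => item.text)
    .join("\n");
}

// Mock result standing in for a real extract_text response.
const mockOcrResult: ToolResult = {
  content: [
    { type: "text", text: "Hello" },
    { type: "text", text: "world" },
  ],
};
console.log(textFromResult(mockOcrResult));
```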

capture_screen_and_extract_text

Captures a screenshot and extracts its text in one operation. This convenience tool combines screen capture and OCR, and works with either full-screen or application-specific capture.

Parameters:

  • display (number, optional): Display number to capture (0 for primary display) - ignored if applicationName is provided
  • language (string, optional): Language for OCR recognition (e.g., "eng", "spa", "fra")
  • applicationName (string, optional): Name of the application to capture (e.g., 'Safari', 'Chrome'). If provided, captures only this application's window instead of full screen.
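
The precedence between `display` and `applicationName` described above can be sketched as a small function. This is not the server's actual code: `resolveCaptureTarget` and its types are hypothetical, and both the fallback to display 0 and the treatment of an empty name as "not provided" are assumptions.

```typescript
// Hypothetical argument type mirroring the documented parameters.
interface CaptureAndExtractArgs {
  display?: number;
  language?: string;
  applicationName?: string;
}

type CaptureTarget =
  | { kind: "application"; applicationName: string }
  | { kind: "display"; display: number };

// Illustrative only: applicationName, when provided, takes precedence and
// display is ignored. Falling back to display 0 is an assumption.
function resolveCaptureTarget(args: CaptureAndExtractArgs): CaptureTarget {
  if (args.applicationName !== undefined && args.applicationName.trim() !== "") {
    return { kind: "application", applicationName: args.applicationName };
  }
  return { kind: "display", display: args.display ?? 0 };
}

// display is ignored here because applicationName is set.
console.log(resolveCaptureTarget({ display: 1, applicationName: "Safari" }));
// No applicationName, so the requested display is used.
console.log(resolveCaptureTarget({ display: 1 }));
```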

Installation

npm install

Development

# Build the project
npm run build

# Run in development mode
npm run dev

# Run the built version
npm start

Dependencies

  • @modelcontextprotocol/sdk: MCP SDK for server implementation
  • screenshot-desktop: Cross-platform screenshot capture
  • sharp: High-performance image processing
  • tesseract.js: OCR text extraction

Usage with MCP Client

This server can be used with any MCP-compatible client. Configure your client to connect to this server using stdio transport.

Example configuration for Claude Desktop:

{
  "mcpServers": {
    "screen-text": {
      "command": "node",
      "args": ["path/to/mcp-screen-text/dist/index.js"]
    }
  }
}
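
If the client configuration already lists other servers, the entry needs to be merged in rather than pasted over the file. A minimal sketch of that merge follows; `addScreenTextServer`, the `ClientConfig` type, and the `other` server entry are hypothetical and exist only to show that existing entries are preserved.

```typescript
interface ServerEntry {
  command: string;
  args: string[];
}

interface ClientConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown;
}

// Illustrative helper: returns a new config with the "screen-text" entry
// added, leaving the input object and any other servers untouched.
function addScreenTextServer(config: ClientConfig, entryPoint: string): ClientConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "screen-text": { command: "node", args: [entryPoint] },
    },
  };
}

// Hypothetical existing configuration with one unrelated server.
const existingConfig: ClientConfig = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};
const mergedConfig = addScreenTextServer(existingConfig, "path/to/mcp-screen-text/dist/index.js");
console.log(JSON.stringify(mergedConfig, null, 2));
```

The placeholder path is the one from the example above; substitute the location of your own build output.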

License

ISC
