RPA MCP Server
Robotic Process Automation (RPA) REST API service for desktop automation, vision, and workflow orchestration.
Overview
A Spring Boot service that exposes a REST API for desktop automation: screenshot capture, AI vision, mouse and keyboard control, file operations, OCR, and multi-step workflow execution.
Technology Stack
- Framework: Spring Boot
- Language: Java
- Build: Gradle
- AI Vision: Ollama integration
- Port: 9100
Features
Screenshot & Vision
- Capture screenshots
- AI-powered image description
- OCR text extraction
Desktop Automation
- Mouse clicks (coordinates)
- Keyboard input and key presses
- Window management (list, focus, close)
File Operations
- Read/write files
- List directories
- Delete files
Browser Control
- Open URLs in default browser
- Launch specific browsers (Chrome, Firefox)
Workflow Orchestration
- Multi-step automation sequences
- Conditional execution
- Error handling
API Endpoints
Base URL: http://royaloak02.local:9100
Status
- GET /rpa/status - Service status
Screenshot & Vision
- GET /screenshot - Capture screenshot (PNG)
- GET /vision/describe - AI description of last screenshot
Automation
- POST /auto/click?x=100&y=200 - Click coordinates
- POST /auto/type - Type text (body: text)
- POST /auto/key - Press keys (body: key name)
- GET /auto/windows - List open windows
- POST /auto/focus?title=<name> - Focus window
- POST /auto/close?title=<name> - Close window
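The automation endpoints above can be called from any HTTP client. The following is a minimal sketch using the JDK's built-in `java.net.http` API (Java 11+); it only builds the requests and does not send them. The class name `RpaClickRequest`, its helper methods, and the `text/plain` content type for `/auto/type` are assumptions for illustration, not part of this project.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper class; not part of the RPA service itself.
final class RpaClickRequest {
    // Base URL as listed in this README; adjust for your host.
    static final String BASE_URL = "http://royaloak02.local:9100";

    /** Builds (but does not send) POST /auto/click for the given screen coordinates. */
    static HttpRequest click(int x, int y) {
        URI uri = URI.create(BASE_URL + "/auto/click?x=" + x + "&y=" + y);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Builds POST /auto/type with the text as the request body.
     *  The text/plain content type is an assumption. */
    static HttpRequest type(String text) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/auto/type"))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = click(100, 200);
        // Prints the method and target URI of the request that would be sent.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.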
File Operations
- GET /file/read?path=<path> - Read file
- POST /file/write?path=<path> - Write file (body: content)
- GET /file/list?path=<path> - List directory
- POST /file/delete?path=<path> - Delete file
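Because the file endpoints take the path as a query parameter, paths containing spaces, slashes, or other reserved characters should be percent-encoded before being placed in the URL. A minimal sketch of that step follows; the class `RpaFileUrls` and the method `fileUri` are hypothetical names, and the sample path is made up for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the RPA service itself.
final class RpaFileUrls {
    // Base URL as listed in this README; adjust for your host.
    static final String BASE_URL = "http://royaloak02.local:9100";

    /**
     * Builds the URI for a file endpoint ("read", "write", "list", or "delete")
     * with the path percent-encoded as a query parameter value.
     */
    static URI fileUri(String operation, String path) {
        // URLEncoder produces form encoding, where a space becomes "+".
        // Rewriting "+" as "%20" keeps the result unambiguous; a literal "+"
        // in the input has already been encoded as "%2B" at this point.
        String encoded = URLEncoder.encode(path, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return URI.create(BASE_URL + "/file/" + operation + "?path=" + encoded);
    }

    public static void main(String[] args) {
        // Example path chosen for illustration only.
        System.out.println(fileUri("read", "/tmp/my notes.txt"));
    }
}
```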
Browser
- POST /browser/open?url=<url> - Open in default browser
- POST /browser/chrome?url=<url> - Open in Chrome
- POST /browser/firefox?url=<url> - Open in Firefox
OCR
- GET /ocr/screen - Extract text from screenshot
- GET /ocr/file?path=<path> - Extract text from image
Workflow
- POST /workflow/execute - Execute action sequence
Example workflow:
[
  {"action": "open", "url": "https://example.com"},
  {"action": "wait", "ms": 2000},
  {"action": "screenshot"},
  {"action": "click", "x": 100, "y": 200}
]
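The example workflow could be submitted from Java roughly as follows. This is a sketch under assumptions: it presumes the endpoint accepts the JSON array directly as the request body with an `application/json` content type, which this README does not state explicitly. The class `RpaWorkflowRequest` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; not part of the RPA service itself.
final class RpaWorkflowRequest {
    // Base URL as listed in this README; adjust for your host.
    static final String BASE_URL = "http://royaloak02.local:9100";

    // The example workflow from this README, as a JSON string.
    static final String WORKFLOW_JSON = String.join("\n",
            "[",
            "  {\"action\": \"open\", \"url\": \"https://example.com\"},",
            "  {\"action\": \"wait\", \"ms\": 2000},",
            "  {\"action\": \"screenshot\"},",
            "  {\"action\": \"click\", \"x\": 100, \"y\": 200}",
            "]");

    /** Builds POST /workflow/execute with the JSON as the body.
     *  The application/json content type is an assumption. */
    static HttpRequest executeWorkflow(String json) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/workflow/execute"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Sends the request and returns the status code followed by the response body. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) {
        HttpRequest request = executeWorkflow(WORKFLOW_JSON);
        // Only prints the request line; call send(request) against a running service.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Keeping request construction separate from `send` makes it possible to inspect or log the request without a running service.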
Building
./gradlew build
Running
./gradlew bootRun
Or use the control script:
./control.sh start
./control.sh stop
./control.sh status
./control.sh log
Documentation
- API.md - Complete API reference
- IMPROVEMENTS.md - Planned improvements
- SUGGESTIONS.md - Enhancement suggestions
Use Cases
- Desktop automation
- UI testing
- Screen scraping
- Workflow automation
- AI-powered vision tasks
- File management automation
- Browser automation