Cluster Execution MCP Server

Enables distributed task execution across a cluster of nodes with automatic routing based on load, OS, and architecture requirements. It provides secure SSH-based remote command execution, parallel processing capabilities, and real-time cluster health monitoring.

Tools: 4
Updated: Dec 31, 2025

Cluster-aware command execution for distributed task routing across the AGI agentic cluster.

Version: 0.2.0

Features

  • Automatic task routing: Commands routed to optimal nodes based on load, capabilities, and requirements
  • Multi-node support: macpro51 (Linux x86_64), mac-studio (macOS ARM64), macbook-air (macOS ARM64), inference node
  • Dynamic IP resolution: mDNS, DNS, and fallback methods with caching
  • Security hardened: No shell injection, environment-based configuration, command validation
  • SSH connectivity verification: Retry logic with configurable timeouts
  • Parallel execution: Distribute commands across cluster for maximum throughput

Installation

cd /mnt/agentic-system/mcp-servers/cluster-execution-mcp
pip install -e .

# For development:
pip install -e ".[dev]"

Configuration

Claude Code Configuration

Add to ~/.claude.json:

{
  "mcpServers": {
    "cluster-execution": {
      "command": "/mnt/agentic-system/.venv/bin/python3",
      "args": ["-m", "cluster_execution_mcp.server"]
    }
  }
}

Environment Variables

All configuration is externalized via environment variables:

| Variable | Default | Description |
|---|---|---|
| CLUSTER_SSH_USER | marc | SSH username for remote execution |
| CLUSTER_SSH_TIMEOUT | 5 | SSH connection timeout (seconds) |
| CLUSTER_SSH_CONNECT_TIMEOUT | 2 | Initial SSH connect timeout (seconds) |
| CLUSTER_SSH_RETRIES | 2 | Number of SSH retry attempts |
| CLUSTER_CPU_THRESHOLD | 40 | CPU usage % threshold for offloading |
| CLUSTER_LOAD_THRESHOLD | 4 | Load average threshold for offloading |
| CLUSTER_MEMORY_THRESHOLD | 80 | Memory usage % threshold for offloading |
| CLUSTER_CMD_TIMEOUT | 300 | Command execution timeout (seconds) |
| CLUSTER_STATUS_TIMEOUT | 5 | Status check timeout (seconds) |
| CLUSTER_IP_CACHE_TTL | 300 | IP resolution cache TTL (seconds) |
| CLUSTER_GATEWAY | 192.168.1.1 | Gateway IP for route detection |
| CLUSTER_DNS | 8.8.8.8 | DNS server for IP detection |
| AGENTIC_SYSTEM_PATH | /mnt/agentic-system | Base path for databases |
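
One way to read these variables in Python is sketched below. The `ClusterSettings` class and `load_settings` function are hypothetical names chosen for illustration, not the package's actual `config.py` API; only the variable names and defaults are taken from the table above, and only a subset of them is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ClusterSettings:
    """Hypothetical container for a subset of the documented variables."""
    ssh_user: str
    ssh_timeout: int
    ssh_retries: int
    cpu_threshold: float
    load_threshold: float
    memory_threshold: float
    cmd_timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> ClusterSettings:
    """Read settings from the environment, falling back to documented defaults."""
    return ClusterSettings(
        ssh_user=env.get("CLUSTER_SSH_USER", "marc"),
        ssh_timeout=int(env.get("CLUSTER_SSH_TIMEOUT", "5")),
        ssh_retries=int(env.get("CLUSTER_SSH_RETRIES", "2")),
        cpu_threshold=float(env.get("CLUSTER_CPU_THRESHOLD", "40")),
        load_threshold=float(env.get("CLUSTER_LOAD_THRESHOLD", "4")),
        memory_threshold=float(env.get("CLUSTER_MEMORY_THRESHOLD", "80")),
        cmd_timeout=int(env.get("CLUSTER_CMD_TIMEOUT", "300")),
    )


# Example: override one value, keep the rest at their defaults.
settings = load_settings({"CLUSTER_CMD_TIMEOUT": "600"})
print(settings.cmd_timeout, settings.ssh_user)
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.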

Node Configuration

Node hostnames and IPs can be customized:

| Variable | Default | Description |
|---|---|---|
| CLUSTER_MACPRO51_HOST | macpro51.local | Mac Pro hostname |
| CLUSTER_MACPRO51_IP | 192.168.1.183 | Mac Pro fallback IP |
| CLUSTER_MACSTUDIO_HOST | Marcs-Mac-Studio.local | Mac Studio hostname |
| CLUSTER_MACSTUDIO_IP | 192.168.1.16 | Mac Studio fallback IP |
| CLUSTER_MACBOOKAIR_HOST | Marcs-MacBook-Air.local | MacBook Air hostname |
| CLUSTER_MACBOOKAIR_IP | 192.168.1.172 | MacBook Air fallback IP |
| CLUSTER_INFERENCE_HOST | completeu-server.local | Inference node hostname |
| CLUSTER_INFERENCE_IP | 192.168.1.186 | Inference node fallback IP |
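
The hostname/fallback pairs suggest a resolve-then-fall-back flow with a TTL cache. The sketch below is an assumption about how that could look, not the code in `router.py`: `resolve_node` and `is_usable` are hypothetical names, the lookup goes through the system resolver via `socket.gethostbyname` (which handles `.local` names only where the OS is set up for mDNS), and `172.17.0.0/16` is assumed to be the Docker range the changelog refers to.

```python
import ipaddress
import socket
import time
from typing import Callable, Dict, Tuple

CACHE_TTL = 300.0  # mirrors the CLUSTER_IP_CACHE_TTL default
_cache: Dict[str, Tuple[str, float]] = {}


def is_usable(ip: str) -> bool:
    """Reject addresses that cannot identify a remote cluster node."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    # Assumption: Docker's default bridge network, 172.17.0.0/16.
    in_docker = addr.version == 4 and addr in ipaddress.ip_network("172.17.0.0/16")
    return not (addr.is_loopback or addr.is_link_local or in_docker)


def resolve_node(
    hostname: str,
    fallback_ip: str,
    lookup: Callable[[str], str] = socket.gethostbyname,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Return a cached or freshly resolved IP, else the configured fallback."""
    cached = _cache.get(hostname)
    if cached and now() - cached[1] < CACHE_TTL:
        return cached[0]
    try:
        ip = lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        ip = fallback_ip
    if not is_usable(ip):
        ip = fallback_ip
    _cache[hostname] = (ip, now())
    return ip


def failing_lookup(host: str) -> str:
    """Simulates a resolver failure so the fallback path is exercised."""
    raise OSError("name resolution unavailable")


print(resolve_node("macpro51.local", "192.168.1.183", lookup=failing_lookup))
```

Note that this sketch also caches the fallback address for the full TTL; a shorter TTL for fallback results would be a reasonable variation.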

MCP Tools

| Tool | Description |
|---|---|
| cluster_bash | Execute bash commands with automatic cluster routing |
| cluster_status | Get current cluster state and load distribution |
| offload_to | Explicitly route command to specific node |
| parallel_execute | Run multiple commands in parallel across nodes |

Usage Examples

Automatic Routing

# Heavy commands auto-route to least loaded node
result = await cluster_bash("make -j8 all")

# Simple commands run locally
result = await cluster_bash("ls -la")

Force Specific Requirements

# Force Linux execution
result = await cluster_bash("docker build .", requires_os="linux")

# Force x86_64 architecture
result = await cluster_bash("cargo build", requires_arch="x86_64")

Explicit Node Routing

# Run on Linux builder (no -it: remote execution is non-interactive)
result = await offload_to("podman run --rm ubuntu:22.04 cat /etc/os-release", node_id="macpro51")

# Run on Mac Studio
result = await offload_to("swift build", node_id="mac-studio")

Parallel Execution

# Run tests across cluster
results = await parallel_execute([
    "pytest tests/unit/",
    "pytest tests/integration/",
    "pytest tests/e2e/"
])
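
To illustrate what fanning commands out across nodes can look like, here is a minimal sketch using `asyncio.gather` with round-robin assignment. The `run_parallel` function, the `fake_run` stand-in, and the round-robin policy are all assumptions made for illustration; the server's actual distribution strategy is not documented here.

```python
import asyncio
from itertools import cycle
from typing import Awaitable, Callable, List, Sequence, Tuple

Runner = Callable[[str, str], Awaitable[str]]


async def run_parallel(
    commands: Sequence[str], nodes: Sequence[str], run_on: Runner
) -> List[Tuple[str, str, str]]:
    """Assign commands to nodes round-robin and run them concurrently."""
    assignments = list(zip(commands, cycle(nodes)))
    outputs = await asyncio.gather(
        *(run_on(node, cmd) for cmd, node in assignments)
    )
    return [(cmd, node, out) for (cmd, node), out in zip(assignments, outputs)]


async def fake_run(node: str, command: str) -> str:
    """Stand-in for an SSH call; a real runner would execute the command."""
    await asyncio.sleep(0)
    return f"{node} ran {command}"


results = asyncio.run(
    run_parallel(
        ["pytest tests/unit/", "pytest tests/integration/", "pytest tests/e2e/"],
        ["macpro51", "mac-studio"],
        fake_run,
    )
)
for command, node, output in results:
    print(node, "<-", command)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each output can be paired back with its command and node.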

Cluster Status

# Get cluster health before heavy operations
status = await cluster_status()
# Returns:
# {
#   "local_node": "macpro51",
#   "nodes": {
#     "macpro51": {"cpu_percent": 15.2, "memory_percent": 45.3, ...},
#     "mac-studio": {"cpu_percent": 8.1, "memory_percent": 32.1, ...},
#     ...
#   }
# }
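
A status payload like the one above can be checked against the offload thresholds from the Environment Variables section. This is a hedged sketch: `is_overloaded` is a hypothetical helper, the `load_avg` key is an assumed name (the sample output elides the remaining fields), and treating a node as overloaded when any single metric exceeds its threshold is also an assumption.

```python
from typing import Mapping

# Defaults from the Environment Variables section.
CPU_THRESHOLD = 40.0
LOAD_THRESHOLD = 4.0
MEMORY_THRESHOLD = 80.0


def is_overloaded(stats: Mapping[str, float]) -> bool:
    """True if any metric present in `stats` exceeds its threshold."""
    return (
        stats.get("cpu_percent", 0.0) > CPU_THRESHOLD
        or stats.get("load_avg", 0.0) > LOAD_THRESHOLD  # hypothetical key name
        or stats.get("memory_percent", 0.0) > MEMORY_THRESHOLD
    )


status = {
    "local_node": "macpro51",
    "nodes": {
        "macpro51": {"cpu_percent": 15.2, "memory_percent": 45.3},
        "mac-studio": {"cpu_percent": 8.1, "memory_percent": 32.1},
    },
}
busy = [name for name, stats in status["nodes"].items() if is_overloaded(stats)]
print(busy)
```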

Cluster Nodes

| Node | OS | Arch | Capabilities | Specialties |
|---|---|---|---|---|
| macpro51 | Linux | x86_64 | docker, podman, raid, nvme, compilation, testing, tpu | compilation, testing, containerization, benchmarking |
| mac-studio | macOS | ARM64 | orchestration, coordination, temporal, mlx-gpu, arduino | orchestration, coordination, monitoring |
| macbook-air | macOS | ARM64 | research, documentation, analysis | research, documentation, mobile |
| inference | macOS | ARM64 | ollama, inference, model-serving, llm-api | ollama-inference, model-serving |
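
Routing on `requires_os` and `requires_arch` can be pictured as a filter over this table followed by a least-loaded pick. The sketch below is an illustration under assumptions, not the router's real logic: `Node`, `pick_node`, and the use of CPU percentage as the only load signal are hypothetical, and the `darwin` to `macos` alias follows the bug fix mentioned in the changelog.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OS_ALIASES = {"darwin": "macos"}


@dataclass(frozen=True)
class Node:
    node_id: str
    os: str
    arch: str


NODES = [
    Node("macpro51", "linux", "x86_64"),
    Node("mac-studio", "macos", "arm64"),
    Node("macbook-air", "macos", "arm64"),
    Node("inference", "macos", "arm64"),
]


def pick_node(
    cpu_by_node: Dict[str, float],
    requires_os: Optional[str] = None,
    requires_arch: Optional[str] = None,
) -> Optional[str]:
    """Return the least-loaded reachable node meeting the requirements, or None."""
    want_os = None
    if requires_os:
        want_os = OS_ALIASES.get(requires_os.lower(), requires_os.lower())
    want_arch = requires_arch.lower() if requires_arch else None
    candidates = [
        n for n in NODES
        if n.node_id in cpu_by_node  # nodes without stats count as unreachable
        and (want_os is None or n.os == want_os)
        and (want_arch is None or n.arch == want_arch)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda n: cpu_by_node[n.node_id]).node_id


load = {"macpro51": 45.2, "mac-studio": 22.1, "macbook-air": 12.8}
print(pick_node(load, requires_os="linux"))
print(pick_node(load, requires_os="darwin"))
```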

Offload Patterns

Commands matching these patterns are automatically offloaded:

  • Build: make, cargo, npm, yarn, pnpm
  • Test: pytest, jest, mocha, test
  • Compile: gcc, g++, clang
  • Container: docker, podman, kubectl
  • File ops: rsync, scp, tar, zip, find, grep -r

Commands that stay local:

  • Simple: ls, pwd, cd, echo, cat, head, tail, which, type
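
The pattern lists above can be approximated by classifying a command on its first word. This is a minimal sketch with hypothetical names (`should_offload`, `OFFLOAD_COMMANDS`, `LOCAL_COMMANDS`); it assumes that unlisted commands stay local and does not handle combined flags such as `grep -rn`.

```python
import shlex

OFFLOAD_COMMANDS = {
    "make", "cargo", "npm", "yarn", "pnpm",   # build
    "pytest", "jest", "mocha", "test",        # test
    "gcc", "g++", "clang",                    # compile
    "docker", "podman", "kubectl",            # container
    "rsync", "scp", "tar", "zip", "find",     # file ops
}
LOCAL_COMMANDS = {"ls", "pwd", "cd", "echo", "cat", "head", "tail", "which", "type"}


def should_offload(command: str) -> bool:
    """Classify a command by its first word; unknown commands stay local."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not tokens:
        return False
    program = tokens[0].rsplit("/", 1)[-1]  # "/usr/bin/make" -> "make"
    if program in LOCAL_COMMANDS:
        return False
    if program == "grep":
        # Only recursive grep is listed as an offload pattern.
        return "-r" in tokens[1:]
    return program in OFFLOAD_COMMANDS


for cmd in ("make -j8 all", "ls -la", "grep -r TODO .", "grep TODO notes.txt"):
    print(cmd, "->", "offload" if should_offload(cmd) else "local")
```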

Security

Shell Injection Prevention

Commands run through subprocess.run() with list arguments wherever possible, so the local shell never parses them:

# SAFE: List arguments
subprocess.run(["ssh", "-o", "ConnectTimeout=5", f"{user}@{ip}", command])

# Complex shell commands are validated before execution

Command Validation

Commands are validated for dangerous patterns:

  • rm -rf /
  • rm -rf /*
  • > /dev/sda
  • Fork bombs
  • And more...
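
A denylist check for the patterns named above could look roughly like this. The regular expressions and the `validate_command` name are illustrative assumptions; the package's real pattern set is not reproduced here, and the sketch covers only the listed examples.

```python
import re

DANGEROUS_PATTERNS = [
    # rm -rf / and rm -rf /* (but not rm -rf /some/path)
    re.compile(r"\brm\s+-(?:rf|fr)\s+/(?:\*|\s|$)"),
    # redirecting output onto a raw disk device such as /dev/sda
    re.compile(r">\s*/dev/sd[a-z]"),
    # classic bash fork bomb  :(){ :|:& };:
    re.compile(r":\(\)\s*\{.*\}\s*;\s*:"),
]


def validate_command(command: str) -> bool:
    """Return True if no known-dangerous pattern appears in the command."""
    return not any(p.search(command) for p in DANGEROUS_PATTERNS)


for cmd in ("make -j8 all", "rm -rf /", "rm -rf /tmp/build", ":(){ :|:& };:"):
    print(repr(cmd), "->", "ok" if validate_command(cmd) else "rejected")
```

A denylist like this is best-effort: it catches well-known destructive commands but cannot recognize every harmful variant, so it complements rather than replaces restricted SSH accounts and permissions.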

SSH Configuration

  • StrictHostKeyChecking=accept-new - Accept keys from previously unseen hosts, but refuse to connect if a known host's key has changed
  • BatchMode=yes - Non-interactive mode for scripting
  • Configurable timeouts and retries
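
Put together, these options and the retry behaviour might be wired up as in the sketch below. `build_ssh_argv` and `run_with_retries` are hypothetical helpers, and interpreting the retry count as "additional attempts after the first" is an assumption. The retry condition relies on ssh's documented behaviour of exiting with 255 when ssh itself fails, as opposed to returning the remote command's own exit status.

```python
import subprocess
from typing import Callable, List


def build_ssh_argv(
    user: str, host: str, command: str, connect_timeout: int = 2
) -> List[str]:
    """Assemble an ssh invocation as a list so no local shell parses it."""
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=accept-new",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={connect_timeout}",
        f"{user}@{host}",
        command,
    ]


def run_with_retries(
    argv: List[str],
    retries: int = 2,
    timeout: float = 5.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run once, then retry up to `retries` times on timeout or ssh error (255)."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(retries + 1):
        try:
            result = runner(argv, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired as exc:
            last_error = exc
            continue
        if result.returncode != 255:
            # The remote command ran; its exit status is the caller's concern.
            return result
        last_error = RuntimeError("ssh connection failed (exit status 255)")
    raise last_error


# Building the argument list does not open a connection.
print(build_ssh_argv("marc", "192.168.1.183", "hostname"))
```

The command string is still interpreted by the shell on the remote host, so list arguments protect only the local side; command validation remains relevant for whatever is sent over SSH.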

Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=cluster_execution_mcp --cov-report=html

Project Structure

cluster-execution-mcp/
├── src/cluster_execution_mcp/
│   ├── __init__.py      # Package exports
│   ├── config.py        # Configuration, validation, node definitions
│   ├── router.py        # Task routing and IP resolution
│   └── server.py        # FastMCP server and tools
├── tests/
│   ├── conftest.py      # Pytest fixtures
│   ├── test_config.py   # Config module tests (29 tests)
│   ├── test_router.py   # Router module tests (21 tests)
│   └── test_server.py   # Server and tool tests (21 tests)
└── pyproject.toml       # Package configuration

CLI Interface

# Submit a command
cluster-router submit "make -j8 all"

# Check task status
cluster-router status <task_id>

# Show cluster status
cluster-router cluster-status

Monitoring

Check cluster health before operations:

User: "Show me cluster status"

Claude Code: cluster_status tool

Output:
  macpro51:
    CPU: 45.2%
    Memory: 18.3%
    Load: 3.21
    Status: healthy

  mac-studio:
    CPU: 22.1%
    Memory: 54.7%
    Load: 2.15
    Status: healthy

  macbook-air:
    CPU: 12.8%
    Memory: 38.2%
    Load: 1.03
    Status: healthy

Troubleshooting

MCP server not loading:

# Check config
jq '.mcpServers["cluster-execution"]' ~/.claude.json

# Test server import (use the same interpreter as in ~/.claude.json)
/mnt/agentic-system/.venv/bin/python3 -c "from cluster_execution_mcp.server import main; print('OK')"

Node unreachable:

# Test SSH connectivity
ssh marc@macpro51.local hostname
ssh marc@Marcs-Mac-Studio.local hostname

# Check with fallback IP
ssh marc@192.168.1.183 hostname

Commands timing out:

# Increase timeout via environment
export CLUSTER_CMD_TIMEOUT=600  # 10 minutes
export CLUSTER_SSH_TIMEOUT=10   # 10 seconds

Changelog

v0.2.0

  • New Features:

    • Proper package structure with pyproject.toml
    • Environment-based configuration (no hardcoded credentials)
    • Shared config module with validation functions
    • Retry logic for SSH connectivity
    • IP resolution caching with TTL
    • Inference node support
  • Security Improvements:

    • Eliminated shell injection vulnerabilities
    • Command validation for dangerous patterns
    • IP validation rejecting loopback/Docker/link-local
    • SSH host key handling (accept-new)
  • Code Quality:

    • Full type hints throughout codebase
    • Replaced bare except clauses with specific exceptions
    • Added comprehensive logging
    • 71 unit tests with mocking
  • Bug Fixes:

    • Fixed darwin/macos OS alias handling
    • Proper timeout handling in SSH operations
    • Better error messages for failed operations

v0.1.0

  • Initial release with basic cluster execution

License

MIT


Part of the AGI Agentic System

See also:

  • Node Chat MCP - Inter-node communication
  • Enhanced Memory MCP - Persistent memory with RAG
  • Agent Runtime MCP - Goals and task queue
