prometheus-mcp

A Model Context Protocol (MCP) server for Prometheus integration. Give your AI assistant eyes on your metrics and alerts.

Status: Planning
Author: Claude (claude@arktechnwa.com) + Meldrey
License: MIT
Organization: ArktechNWA

Why?

Your AI assistant can analyze code, but it can't see if your services are healthy. It can suggest optimizations, but can't see the actual latency metrics. It's blind to the alerts firing at 3am.

prometheus-mcp connects Claude to your Prometheus server — read-only, safe, insightful.

Philosophy

Read-only by design — Prometheus queries don't mutate state
Query safety — Timeout expensive queries, limit cardinality
Never hang — PromQL can be expensive, always timeout
Structured output — Metrics + human summaries
Fallback AI — Haiku for anomaly detection and query help

Features

Perception (Read)

Instant queries (current values)
Range queries (over time)
Alert status and history
Target health
Alerting and recording rules
Label discovery
Metric metadata

Analysis (AI-Assisted)

"Is this metric normal?"
"What caused this spike?"
"Suggest a query for X"
Anomaly detection

Permission Model

Prometheus is inherently read-only for queries. Permissions focus on:

Level	Description	Default
`query`	Run PromQL queries	ON
`alerts`	View alert status	ON
`admin`	View config, reload rules	OFF
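The README does not say which tools fall under which permission level. Below is a minimal TypeScript sketch of the gate. It assumes a hypothetical tool-to-level mapping (`TOOL_PERMISSIONS`) and denies any tool the mapping does not know:

```typescript
// Permission levels from the table above.
type Permission = "query" | "alerts" | "admin";

interface Permissions {
  query: boolean;
  alerts: boolean;
  admin: boolean;
}

// Hypothetical mapping of tools to required levels; not specified by the README.
const TOOL_PERMISSIONS = new Map<string, Permission>([
  ["prom_query", "query"],
  ["prom_query_range", "query"],
  ["prom_series", "query"],
  ["prom_labels", "query"],
  ["prom_metadata", "query"],
  ["prom_targets", "query"],
  ["prom_analyze", "query"],
  ["prom_suggest_query", "query"],
  ["prom_alerts", "alerts"],
  ["prom_rules", "alerts"],
]);

// True only if the tool is known and its required level is enabled.
function isToolAllowed(tool: string, perms: Permissions): boolean {
  const needed = TOOL_PERMISSIONS.get(tool);
  return needed !== undefined && perms[needed];
}

const defaults: Permissions = { query: true, alerts: true, admin: false };
console.log(isToolAllowed("prom_alerts", defaults)); // true
console.log(isToolAllowed("prom_alerts", { ...defaults, alerts: false })); // false
```

Using a `Map` instead of a plain object avoids accidental hits on inherited keys such as `toString`.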

Query Safety

{
  "query_limits": {
    "max_duration": "30s",
    "max_resolution": 10000,
    "max_series": 1000,
    "blocked_metrics": [
      "__.*",
      "secret_.*"
    ]
  }
}

Safety features:

Query timeout enforcement
Cardinality limits
Metric blocklist patterns (`blocked_metrics`)
Rate limiting
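Here is a rough pre-flight check for `blocked_metrics`. It assumes the patterns are regular expressions anchored the way Prometheus anchors its own regex matchers. It scans identifiers with a regex rather than parsing PromQL, so treat it as a coarse filter. `compileBlocklist` and `findBlockedIdentifier` are hypothetical names:

```typescript
// Identifier-shaped tokens in PromQL (metric, label, and function names);
// words inside quoted label values are matched too.
const IDENT = /[a-zA-Z_:][a-zA-Z0-9_:]*/g;

// Anchor each pattern so "secret_.*" must match a whole identifier.
function compileBlocklist(patterns: string[]): RegExp[] {
  return patterns.map((p) => new RegExp(`^(?:${p})$`));
}

// Return the first identifier in the query that hits the blocklist, or null.
function findBlockedIdentifier(query: string, blocklist: RegExp[]): string | null {
  for (const token of query.match(IDENT) ?? []) {
    if (blocklist.some((re) => re.test(token))) return token;
  }
  return null;
}

const blocklist = compileBlocklist(["__.*", "secret_.*"]);
console.log(findBlockedIdentifier("rate(http_requests_total[5m])", blocklist)); // null
console.log(findBlockedIdentifier("secret_api_tokens", blocklist)); // secret_api_tokens
```

With the sample patterns, `__.*` also matches the `__name__` label. Selectors such as `{__name__=~".+"}`, which enumerate every series, are therefore rejected as well. Because label values are scanned too, a value like `"secret_rotation"` would also be flagged, so the check errs on the side of blocking.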

Authentication

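{
  "prometheus": {
    "url": "http://localhost:9090",
    "auth": {
      "type": "basic",
      "username_env": "PROM_USER",
      "password_env": "PROM_PASS",
      "token_env": "PROM_TOKEN"
    }
  }
}

`type` is one of `"none"`, `"basic"`, or `"bearer"`. Basic auth reads the username and password from the environment variables named by `username_env` and `password_env`. Bearer auth reads the token from the variable named by `token_env`.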
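Here is a minimal sketch of turning the `auth` block into request headers. The environment is passed in as a plain object (under Node you would pass `process.env`), which keeps the function pure. `authHeaders` and `readEnv` are hypothetical helper names, not part of a published API:

```typescript
// Mirrors the "auth" block of the config.
interface AuthConfig {
  type: "none" | "basic" | "bearer";
  username_env?: string;
  password_env?: string;
  token_env?: string;
}

type Env = Record<string, string | undefined>;

// Look up the variable named by an *_env field; fail loudly if it is missing.
function readEnv(env: Env, name: string | undefined): string {
  if (!name) throw new Error("auth config is missing the *_env field for this auth type");
  const value = env[name];
  if (!value) throw new Error(`environment variable ${name} is not set`);
  return value;
}

function authHeaders(auth: AuthConfig, env: Env): Record<string, string> {
  switch (auth.type) {
    case "none":
      return {};
    case "basic": {
      const user = readEnv(env, auth.username_env);
      const pass = readEnv(env, auth.password_env);
      // btoa only handles Latin-1; non-ASCII credentials would need UTF-8 encoding first.
      return { Authorization: `Basic ${btoa(`${user}:${pass}`)}` };
    }
    case "bearer":
      return { Authorization: `Bearer ${readEnv(env, auth.token_env)}` };
    default:
      throw new Error(`unsupported auth type: ${String(auth.type)}`);
  }
}
```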

Tools

Queries

`prom_query`

Execute instant query (current values).

prom_query({
  query: string,            // PromQL expression
  time?: string             // evaluation time (default: now)
})

Returns:

{
  "query": "up{job=\"api\"}",
  "result_type": "vector",
  "results": [
    {
      "metric": {"job": "api", "instance": "api-1:8080"},
      "value": 1,
      "timestamp": "2025-12-29T10:30:00Z"
    },
    ...
  ],
  "summary": "3 of 3 api instances are up"
}
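The Prometheus HTTP API returns each instant-query sample as a `[unix_seconds, "value"]` pair, with the value encoded as a string. The sketch below reshapes that response into the `prom_query` output above and handles only the `vector` result type. The generic `summary` string is a placeholder, because wording like "3 of 3 api instances are up" needs query-specific logic. Helper names are hypothetical:

```typescript
// Shapes of the Prometheus /api/v1/query response that this sketch relies on.
interface PromVectorSample {
  metric: Record<string, string>;
  value: [number, string]; // [unix seconds, sample value as a string]
}

interface PromQueryResponse {
  status: "success" | "error";
  data?: { resultType: string; result: PromVectorSample[] };
  error?: string;
}

// Prometheus spells infinities "+Inf"/"-Inf", which Number() does not accept.
function parseSampleValue(v: string): number {
  if (v === "+Inf") return Infinity;
  if (v === "-Inf") return -Infinity;
  return Number(v); // "NaN" becomes NaN
}

// Format Unix seconds as RFC 3339 without milliseconds, e.g. "2025-12-29T10:30:00Z".
function toIsoSeconds(unixSeconds: number): string {
  return new Date(unixSeconds * 1000).toISOString().replace(/\.\d{3}Z$/, "Z");
}

function normalizeInstant(query: string, body: PromQueryResponse) {
  if (body.status !== "success" || !body.data) {
    throw new Error(`Prometheus query failed: ${body.error ?? "unknown error"}`);
  }
  if (body.data.resultType !== "vector") {
    throw new Error(`only vector results are handled here, got ${body.data.resultType}`);
  }
  const results = body.data.result.map((s) => ({
    metric: s.metric,
    value: parseSampleValue(s.value[1]),
    timestamp: toIsoSeconds(s.value[0]),
  }));
  // Placeholder summary; a human-readable one needs query-specific logic.
  return { query, result_type: "vector", results, summary: `${results.length} series returned` };
}
```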

`prom_query_range`

Execute range query (over time).

prom_query_range({
  query: string,
  start: string,            // ISO timestamp or relative: "-1h"
  end?: string,             // default: now
  step?: string             // resolution: "15s", "1m", "5m"
})

Returns:

{
  "query": "rate(http_requests_total[5m])",
  "result_type": "matrix",
  "results": [
    {
      "metric": {"handler": "/api/users"},
      "values": [[1767000600, "123.45"], ...],
      "stats": {
        "min": 100.2,
        "max": 456.7,
        "avg": 234.5,
        "current": 345.6
      }
    }
  ],
  "summary": "Request rate ranged from 100-457 req/s over the last hour, currently 346 req/s"
}
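Two pieces of `prom_query_range` are sketched below under stated assumptions. The first resolves `start` and `end`; the relative grammar `-<n><s|m|h|d|w>` is an assumption, since the README only shows `"-1h"`. The second computes the `stats` object from a series' `values`. Prometheus refuses range queries that would return more than 11,000 points per series, so `chooseStep` also picks a default step under a point budget. Its 10,000 default mirrors `max_resolution` in the Query Safety config. All names are hypothetical:

```typescript
// Seconds per unit for the assumed relative-time grammar.
const UNIT_SECONDS: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400, w: 604800 };

// Resolve "now", a relative offset like "-1h", or an ISO 8601 timestamp to Unix seconds.
function resolveTime(input: string | undefined, nowSeconds: number): number {
  if (input === undefined || input === "now") return nowSeconds;
  const rel = /^-(\d+)([smhdw])$/.exec(input);
  if (rel) return nowSeconds - Number(rel[1]) * UNIT_SECONDS[rel[2]];
  const ms = Date.parse(input);
  if (Number.isNaN(ms)) throw new Error(`cannot parse time: ${input}`);
  return ms / 1000;
}

// A range query returns floor((end - start) / step) + 1 points per series,
// so step >= (end - start) / (maxPoints - 1) keeps it within maxPoints.
function chooseStep(startS: number, endS: number, maxPoints = 10000, minStepS = 15): number {
  if (endS <= startS) throw new Error("end must be after start");
  return Math.max(minStepS, Math.ceil((endS - startS) / (maxPoints - 1)));
}

type MatrixValue = [number, string]; // [unix seconds, sample value as a string]

interface SeriesStats {
  min: number;
  max: number;
  avg: number;
  current: number;
}

// Summarize one matrix series; NaN and ±Inf samples are skipped.
function seriesStats(values: MatrixValue[]): SeriesStats | null {
  const nums = values.map(([, v]) => Number(v)).filter((n) => Number.isFinite(n));
  if (nums.length === 0) return null;
  const sum = nums.reduce((a, b) => a + b, 0);
  return {
    min: Math.min(...nums),
    max: Math.max(...nums),
    avg: sum / nums.length,
    current: nums[nums.length - 1], // last finite sample in the window
  };
}

const now = 1767004200; // 2025-12-29T10:30:00Z
const start = resolveTime("-1h", now); // 1767000600
console.log(chooseStep(start, now)); // 15
```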

`prom_series`

Find series matching label selectors.

prom_series({
  match: string[],          // label matchers
  start?: string,
  end?: string,
  limit?: number
})
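Here is a sketch of the request behind `prom_series`. Prometheus's `/api/v1/series` endpoint takes each selector as a repeated `match[]` parameter, plus optional `start` and `end`. In this sketch, `limit` is applied client-side by a hypothetical `applyLimit` helper, since support for a server-side limit depends on the Prometheus version:

```typescript
// Build a GET URL for /api/v1/series, preserving any path prefix in baseUrl.
function seriesUrl(baseUrl: string, match: string[], start?: string, end?: string): string {
  if (match.length === 0) throw new Error("prom_series needs at least one selector");
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  const url = new URL("api/v1/series", base);
  for (const m of match) url.searchParams.append("match[]", m); // one param per selector
  if (start) url.searchParams.set("start", start);
  if (end) url.searchParams.set("end", end);
  return url.toString();
}

// Trim results to `limit` and report whether anything was cut.
function applyLimit<T>(items: T[], limit?: number): { items: T[]; truncated: boolean } {
  if (limit === undefined || items.length <= limit) return { items, truncated: false };
  return { items: items.slice(0, limit), truncated: true };
}
```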

`prom_labels`

Get label names or values.

prom_labels({
  label?: string,           // get values for this label (omit for label names)
  match?: string[],         // filter by series
  limit?: number
})
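`prom_labels` maps onto two Prometheus endpoints: `/api/v1/labels` for label names and `/api/v1/label/<name>/values` for one label's values. Both accept optional `match[]` selectors. A sketch with a hypothetical `labelsUrl` helper:

```typescript
// Pick the endpoint: label names when no label is given, otherwise that label's values.
function labelsUrl(baseUrl: string, label?: string, match: string[] = []): string {
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  const path =
    label === undefined
      ? "api/v1/labels"
      : `api/v1/label/${encodeURIComponent(label)}/values`;
  const url = new URL(path, base);
  for (const m of match) url.searchParams.append("match[]", m); // optional series filter
  return url.toString();
}
```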

Alerts

`prom_alerts`

Get current alert status.

prom_alerts({
  state?: "firing" | "pending" | "inactive",
  filter?: string           // alert name pattern
})

Returns:

{
  "alerts": [
    {
      "name": "HighErrorRate",
      "state": "firing",
      "severity": "critical",
      "summary": "Error rate > 5% for api service",
      "started_at": "2025-12-29T10:15:00Z",
      "duration": "15m",
      "labels": {"job": "api", "severity": "critical"},
      "annotations": {"summary": "..."}
    }
  ],
  "summary": "1 critical, 0 warning alerts firing"
}
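Below is a sketch of building the `prom_alerts` response from Prometheus's `/api/v1/alerts` endpoint. That endpoint reports pending and firing alerts with their `labels`, `annotations`, `state`, and `activeAt` time. Inactive alerts are not listed there, so an `inactive` filter would have to be served from `/api/v1/rules` instead. The helper names and the duration format are assumptions chosen to match the example above:

```typescript
// Subset of an alert object from Prometheus's /api/v1/alerts response.
interface PromAlert {
  labels: Record<string, string>;
  annotations: Record<string, string>;
  state: "pending" | "firing";
  activeAt: string; // RFC 3339 time the alert became active
}

// Compact duration: "45s", "15m", or "2h5m".
function formatDuration(ms: number): string {
  const totalMin = Math.floor(ms / 60000);
  if (totalMin < 1) return `${Math.max(0, Math.floor(ms / 1000))}s`;
  if (totalMin < 60) return `${totalMin}m`;
  return `${Math.floor(totalMin / 60)}h${totalMin % 60}m`;
}

function summarizeAlerts(alerts: PromAlert[], now: Date = new Date()) {
  const shaped = alerts.map((a) => ({
    name: a.labels.alertname,
    state: a.state,
    severity: a.labels.severity ?? "none",
    summary: a.annotations.summary ?? "",
    started_at: a.activeAt,
    duration: formatDuration(now.getTime() - Date.parse(a.activeAt)),
    labels: a.labels,
    annotations: a.annotations,
  }));
  // Count firing alerts by severity label for the one-line summary.
  const firing = shaped.filter((a) => a.state === "firing");
  const count = (severity: string) => firing.filter((a) => a.severity === severity).length;
  return {
    alerts: shaped,
    summary: `${count("critical")} critical, ${count("warning")} warning alerts firing`,
  };
}
```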

`prom_rules`

Get alerting and recording rules.

prom_rules({
  type?: "alert" | "record",
  filter?: string
})
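Recent Prometheus releases let the `/api/v1/rules` endpoint accept a `type=alert` or `type=record` parameter, so `prom_rules` can pass `type` straight through and handle `filter` locally. The sketch below flattens the returned rule groups. It assumes `filter` is an anchored regular expression (the README only says "pattern"), and `flattenRules` is a hypothetical name:

```typescript
// Subset of Prometheus's /api/v1/rules response.
interface PromRule {
  name: string;
  query: string;
  type: "alerting" | "recording";
}

interface PromRuleGroup {
  name: string;
  file: string;
  rules: PromRule[];
}

interface FlatRule {
  group: string;
  file: string;
  name: string;
  type: "alerting" | "recording";
  query: string;
}

// Flatten rule groups, keeping rules whose name fully matches `filter` (if given).
function flattenRules(groups: PromRuleGroup[], filter?: string): FlatRule[] {
  const re = filter ? new RegExp(`^(?:${filter})$`) : null;
  const out: FlatRule[] = [];
  for (const g of groups) {
    for (const r of g.rules) {
      if (re && !re.test(r.name)) continue;
      out.push({ group: g.name, file: g.file, name: r.name, type: r.type, query: r.query });
    }
  }
  return out;
}
```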

Targets

`prom_targets`

Get scrape target health.

prom_targets({
  state?: "active" | "dropped",
  job?: string
})

Returns:

{
  "targets": [
    {
      "job": "api",
      "instance": "api-1:8080",
      "health": "up",
      "last_scrape": "2025-12-29T10:29:45Z",
      "scrape_duration": "0.023s",
      "error": null
    },
    ...
  ],
  "summary": "12 of 12 targets healthy"
}
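Here is a sketch of the `prom_targets` summary, built from `activeTargets` in Prometheus's `/api/v1/targets` response. The endpoint's own `state` parameter accepts `active`, `dropped`, or `any`. Filtering by `job` is done client-side here, and `summarizeTargets` is a hypothetical name:

```typescript
// Subset of an entry in activeTargets from /api/v1/targets.
interface PromActiveTarget {
  labels: Record<string, string>; // includes job and instance
  scrapePool: string;
  health: "up" | "down" | "unknown";
  lastScrape: string;             // RFC 3339
  lastScrapeDuration: number;     // seconds
  lastError: string;              // "" when the last scrape succeeded
}

function summarizeTargets(targets: PromActiveTarget[], job?: string) {
  const shaped = targets
    .filter((t) => job === undefined || t.labels.job === job)
    .map((t) => ({
      job: t.labels.job ?? t.scrapePool,
      instance: t.labels.instance,
      health: t.health,
      last_scrape: t.lastScrape,
      scrape_duration: `${t.lastScrapeDuration.toFixed(3)}s`,
      error: t.lastError === "" ? null : t.lastError,
    }));
  const healthy = shaped.filter((t) => t.health === "up").length;
  return { targets: shaped, summary: `${healthy} of ${shaped.length} targets healthy` };
}
```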

Discovery

`prom_metadata`

Get metric metadata (help, type, unit).

prom_metadata({
  metric?: string,          // specific metric (omit for all)
  limit?: number
})
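`prom_metadata` lines up with Prometheus's `/api/v1/metadata` endpoint. That endpoint accepts optional `metric` and `limit` parameters and returns an object keyed by metric name, where each key holds a list of `{type, help, unit}` entries. A sketch with hypothetical helper names:

```typescript
interface PromMetadataEntry {
  type: string; // e.g. counter, gauge, histogram, summary
  help: string;
  unit: string;
}

// Build the request URL; metric and limit are optional query parameters.
function metadataUrl(baseUrl: string, metric?: string, limit?: number): string {
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  const url = new URL("api/v1/metadata", base);
  if (metric) url.searchParams.set("metric", metric);
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Flatten { metric: [entry, ...] } into sorted rows, keeping the first entry per
// metric (different targets can report more than one).
function flattenMetadata(data: Record<string, PromMetadataEntry[]>) {
  return Object.keys(data)
    .sort()
    .map((metric) => {
      const first = data[metric][0];
      return {
        metric,
        type: first ? first.type : "unknown",
        help: first ? first.help : "",
        unit: first ? first.unit : "",
      };
    });
}
```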

Analysis

`prom_analyze`

AI-powered metric analysis.

prom_analyze({
  query: string,
  question?: string,        // "Is this normal?", "What caused the spike?"
  use_ai?: boolean
})

Returns:

{
  "query": "rate(http_errors_total[5m])",
  "data_summary": {
    "current": 12.3,
    "1h_ago": 2.1,
    "change": "+486%"
  },
  "synthesis": {
    "analysis": "Error rate rose nearly 6x in the last hour. The spike correlates with a deployment at 10:15. Errors are concentrated on the /api/checkout endpoint.",
    "suggested_queries": [
      "rate(http_errors_total{handler=\"/api/checkout\"}[5m])",
      "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))"
    ],
    "confidence": "high"
  }
}
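The `data_summary` arithmetic is sketched below. It compares the current value with one evaluated an hour earlier, for example the same instant query with `time` set one hour back. Rounding to a whole percent reproduces the `+486%` shown above, since (12.3 - 2.1) / 2.1 ≈ 4.857. The helper names are hypothetical:

```typescript
// Relative change as a signed whole percentage, e.g. "+486%" or "-50%".
function percentChange(current: number, previous: number): string {
  if (previous === 0) return current === 0 ? "0%" : "n/a"; // relative change is undefined
  const pct = Math.round(((current - previous) / Math.abs(previous)) * 100);
  return `${pct >= 0 ? "+" : ""}${pct}%`;
}

function dataSummary(current: number, hourAgo: number) {
  return { current, "1h_ago": hourAgo, change: percentChange(current, hourAgo) };
}

console.log(percentChange(12.3, 2.1)); // +486%
```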

`prom_suggest_query`

Get PromQL query suggestions.

prom_suggest_query({
  intent: string            // "show me api latency p99"
})

NEVERHANG Architecture

PromQL queries can be expensive, and a high-cardinality query can run Prometheus out of memory (OOM).

Query Timeouts

Default: 30s
Configurable per query
Sent to Prometheus as the query API's `timeout` parameter, so evaluation also stops server-side
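Here is a sketch of the two-layer timeout, assuming Node 18's built-in `fetch`. The budget is sent to Prometheus as the query API's `timeout` parameter, and an `AbortController` on the client acts as a backstop if no response arrives. The one-second grace period and the helper names are assumptions:

```typescript
// Build /api/v1/query with the NEVERHANG budget as Prometheus's own `timeout` parameter.
function instantQueryUrl(baseUrl: string, query: string, timeoutMs: number, time?: string): string {
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  const url = new URL("api/v1/query", base);
  url.searchParams.set("query", query);
  url.searchParams.set("timeout", `${Math.ceil(timeoutMs / 1000)}s`);
  if (time) url.searchParams.set("time", time);
  return url.toString();
}

// Client-side backstop: abort if no response arrives shortly after the server-side limit.
async function fetchJsonWithTimeout(
  url: string,
  timeoutMs: number,
  headers: Record<string, string> = {},
  graceMs = 1000, // assumption: let Prometheus report its own timeout first
): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs + graceMs);
  try {
    const res = await fetch(url, { headers, signal: controller.signal });
    return await res.json(); // Prometheus wraps errors in JSON too
  } finally {
    clearTimeout(timer);
  }
}
```

Keeping the client deadline slightly longer than the server-side one means the usual failure is Prometheus's own timeout error, which is more informative than a bare abort.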

Cardinality Protection

Limit series returned
Block known expensive patterns
Warn on high-cardinality queries
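The following sketches the series cap, using `max_series` from the config. Note that it only bounds what is handed back to the assistant, because Prometheus has already evaluated the query by then. A cheaper guard, if wanted, is to run `count(<selector>)` first and refuse the query when the count exceeds the cap. `capSeries` and the warning text are hypothetical:

```typescript
interface Capped<T> {
  results: T[];
  total: number;    // series the query actually matched
  warning?: string; // set only when results were truncated
}

// Keep at most maxSeries results and explain what was dropped.
function capSeries<T>(results: T[], maxSeries: number): Capped<T> {
  if (results.length <= maxSeries) return { results, total: results.length };
  return {
    results: results.slice(0, maxSeries),
    total: results.length,
    warning:
      `Query matched ${results.length} series; returning the first ${maxSeries}. ` +
      "Add label matchers or aggregate (e.g. sum by (job)) to reduce cardinality.",
  };
}
```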

Circuit Breaker

3 timeouts within 60s trigger a 5-minute cooldown
Tracks Prometheus health
Graceful degradation

{
  "neverhang": {
    "query_timeout": 30000,
    "max_series": 1000,
    "circuit_breaker": {
      "failures": 3,
      "window": 60000,
      "cooldown": 300000
    }
  }
}
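Here is a minimal sketch of the breaker. It reads `failures`, `window`, and `cooldown` (in milliseconds) from the `circuit_breaker` block above. Timestamps are passed in rather than read from a clock, which makes the behavior easy to test. The class and method names are hypothetical. While the breaker is open, tools can fail fast with an error instead of queuing more queries against a struggling Prometheus:

```typescript
interface BreakerConfig {
  failures: number; // timeouts that trip the breaker
  window: number;   // sliding window in ms
  cooldown: number; // how long the breaker stays open, in ms
}

class CircuitBreaker {
  private readonly cfg: BreakerConfig;
  private failureTimes: number[] = [];
  private openUntil = 0;

  constructor(cfg: BreakerConfig) {
    this.cfg = cfg;
  }

  // False while the breaker is open (cooling down).
  canRequest(now: number): boolean {
    return now >= this.openUntil;
  }

  // Record a timeout; trip the breaker once `failures` fall inside `window`.
  recordFailure(now: number): void {
    this.failureTimes = this.failureTimes.filter((t) => now - t < this.cfg.window);
    this.failureTimes.push(now);
    if (this.failureTimes.length >= this.cfg.failures) {
      this.openUntil = now + this.cfg.cooldown;
      this.failureTimes = [];
    }
  }
}

const breaker = new CircuitBreaker({ failures: 3, window: 60000, cooldown: 300000 });
breaker.recordFailure(0);
breaker.recordFailure(10000);
breaker.recordFailure(20000);
console.log(breaker.canRequest(30000));  // false: cooling down until 320000
console.log(breaker.canRequest(320000)); // true
```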

Fallback AI

Optional Claude Haiku model for metric analysis.

{
  "fallback": {
    "enabled": true,
    "model": "claude-haiku-4-5",
    "api_key_env": "PROM_MCP_FALLBACK_KEY",
    "max_tokens": 500
  }
}
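Below is a sketch of the request the fallback could send to Anthropic's Messages API: `POST /v1/messages` with `x-api-key` and `anthropic-version` headers, using `model` and `max_tokens` from the config. The prompt layout and `buildFallbackRequest` are assumptions. The key would come from the environment variable named in `api_key_env`:

```typescript
interface FallbackConfig {
  enabled: boolean;
  model: string;
  max_tokens: number;
}

// Build (but do not send) a Messages API request; pass the result to fetch(url, init).
function buildFallbackRequest(cfg: FallbackConfig, apiKey: string, question: string, data: unknown) {
  if (!cfg.enabled) throw new Error("fallback AI is disabled in config");
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: cfg.model,
        max_tokens: cfg.max_tokens,
        messages: [
          { role: "user", content: `${question}\n\nPrometheus data:\n${JSON.stringify(data)}` },
        ],
      }),
    },
  };
}
```

The reply text is in the response's `content` array; for a plain-text answer, it is the `text` of the first block.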

When used:

prom_analyze with questions
prom_suggest_query for natural language
Anomaly detection

Configuration

~/.config/prometheus-mcp/config.json:

{
  "prometheus": {
    "url": "http://localhost:9090",
    "auth": {
      "type": "none"
    }
  },
  "permissions": {
    "query": true,
    "alerts": true,
    "admin": false
  },
  "query_limits": {
    "max_duration": "30s",
    "max_series": 1000
  },
  "fallback": {
    "enabled": false
  }
}
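The configuration file above omits several settings shown earlier, such as the circuit breaker and the fallback model. That suggests it is overlaid on built-in defaults. The sketch below shows such an overlay. It assumes objects merge key by key, while scalars and arrays from the user file replace the default outright. `mergeConfig` is hypothetical:

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(v: Json): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively overlay `user` onto `defaults`; the user's value wins on conflicts.
function mergeConfig(defaults: Json, user: Json): Json {
  if (!isPlainObject(defaults) || !isPlainObject(user)) return user;
  const out: { [key: string]: Json } = { ...defaults };
  for (const key of Object.keys(user)) {
    out[key] = Object.prototype.hasOwnProperty.call(defaults, key)
      ? mergeConfig(defaults[key], user[key])
      : user[key];
  }
  return out;
}
```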

Claude Code Integration

{
  "mcpServers": {
    "prometheus": {
      "command": "prometheus-mcp",
      "args": ["--config", "/path/to/config.json"]
    }
  }
}

Installation

npm install -g @arktechnwa/prometheus-mcp

Requirements

Node.js 18+
Prometheus server (2.x+)
Optional: Anthropic API key for fallback AI

Credits

Created by Claude (claude@arktechnwa.com) in collaboration with Meldrey. Part of the ArktechNWA MCP Toolshed.
