# Debug Companion MCP (Pytest Debugging Agent)
Local MCP server that helps an AI coding agent debug Python projects by turning pytest failures into:
- Exact failure locations (`file:line`)
- A focused code context window around the failure
- Optional LLM suggestion (Gemini) for a fix
## Why this exists
When an agent is handed long pytest output, it often wastes time hunting for the actual failure location. This server extracts the actionable parts and returns them in a structured, tool-friendly format.
## Requirements
- Python 3.12+
- uv
## Install
`uv sync`
## Run the MCP server
`uv run python server.py`
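For orientation, a minimal entry point could look like the sketch below. It assumes the official MCP Python SDK's `FastMCP` helper and an illustrative server name; the actual `server.py` registers the full tool set described in the next section.

```python
# Minimal sketch, assuming server.py uses the official MCP Python SDK's
# FastMCP helper; the real file registers the full tool set described below.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("debug-companion")  # server name is illustrative

@mcp.tool()
def ping() -> str:
    """Health check: confirms the server is up and reachable."""
    return "pong"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport used by local MCP clients
```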
## CI
GitHub Actions runs server tests on each push/PR:
- workflow: `.github/workflows/tests.yml`
## Tools
- `ping` — health check
- `run_pytest(target, max_output_lines=200, timeout_seconds=30)` — run pytest safely (bounded output + timeout)
- `extract_failures(pytest_output, limit=5, base_dir=".")` — parse `file.py:line` locations from pytest output
- `open_context(path, line, radius=12, base_dir=".")` — return a code window around a line
- `debug_project(target, ...)` — orchestrates: `run_pytest` → `extract_failures` → `open_context` → (optional) Gemini analysis
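As a rough illustration of the `run_pytest` and `extract_failures` contracts, the sketch below shows one possible shape. It is not the actual code: the real implementations live in `server.py` and handle more cases (e.g. `base_dir` resolution).

```python
# Illustrative sketch of the run_pytest / extract_failures contracts above;
# the real implementations in server.py may differ (e.g. base_dir handling).
import re
import subprocess

def run_pytest(target: str, max_output_lines: int = 200, timeout_seconds: int = 30) -> str:
    """Run pytest on `target` with a hard timeout and bounded output."""
    try:
        proc = subprocess.run(
            ["uv", "run", "pytest", target, "-q"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        output = f"pytest timed out after {timeout_seconds}s"
    lines = output.splitlines()
    if len(lines) > max_output_lines:
        lines = lines[:max_output_lines] + ["... (output truncated)"]
    return "\n".join(lines)

# Matches locations like "demo_project/test_calc.py:11" in pytest output.
_LOCATION = re.compile(r"([\w./\\-]+\.py):(\d+)")

def extract_failures(pytest_output: str, limit: int = 5) -> list[dict]:
    """Pull file.py:line failure locations out of raw pytest output."""
    seen, failures = set(), []
    for match in _LOCATION.finditer(pytest_output):
        loc = (match.group(1), int(match.group(2)))
        if loc in seen:
            continue
        seen.add(loc)
        failures.append({"path": loc[0], "line": loc[1]})
        if len(failures) >= limit:
            break
    return failures
```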
## Safety
pytest runs with a timeout + output cap, and file access is restricted to the server root unless explicitly allowlisted via `MCP_ALLOWED_ROOTS`.
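Conceptually, the restriction amounts to the check sketched below. This is assumed logic, not the exact implementation; in particular, the separator used by `MCP_ALLOWED_ROOTS` is assumed here to be the OS path separator.

```python
# Sketch of the path restriction described above (assumed logic): a path is
# only readable if it resolves inside the server root or an allowlisted root.
import os
from pathlib import Path

SERVER_ROOT = Path(__file__).resolve().parent

def _allowed_roots() -> list[Path]:
    # MCP_ALLOWED_ROOTS is assumed to be a list separated by os.pathsep.
    extra = os.environ.get("MCP_ALLOWED_ROOTS", "")
    return [SERVER_ROOT] + [Path(p).resolve() for p in extra.split(os.pathsep) if p]

def is_path_allowed(path: str) -> bool:
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(root) for root in _allowed_roots())
```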
## Quick demo
Run on the intentionally failing demo project:
`debug_project(target="demo_project")`
Expected output (example):

    Failures:
    - demo_project/test_calc.py:11 (test_divide_by_zero) — Failed: DID NOT RAISE ZeroDivisionError
    Context (±12):
    10 def test_divide_by_zero():
    11     with pytest.raises(ZeroDivisionError):
    12         _ = divide(10, 0)
## Demo project
- `demo_project/` — minimal demo with an intentional failing test (fast to understand)
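Judging from the expected output above, the intentional failure boils down to something like the following sketch. File names, layout, and the exact bug are assumptions inferred from the output, not the literal demo code.

```python
# demo_project/calc.py (assumed): the intentional bug is that divide swallows
# division by zero instead of letting it raise.
def divide(a: float, b: float) -> float:
    if b == 0:
        return 0.0  # bug: should raise ZeroDivisionError
    return a / b

# demo_project/test_calc.py (assumed): fails with "DID NOT RAISE".
import pytest

def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError):
        _ = divide(10, 0)
```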
## Environment variables
- `GEMINI_API_KEY` — enable Gemini analysis (optional)
- `MCP_ALLOWED_ROOTS` — allow access to absolute paths outside the server root (optional)
## Future work (ideas)
- Test scaffolding (opt-in): detect projects with no tests and optionally generate a minimal smoke test skeleton (e.g., `tests/test_smoke.py`) to validate imports / basic execution before running deeper debugging flows (see the sketch below).
- Richer failure parsing: better extraction for parameterized tests and multi-traceback outputs.
- Autofix loop: apply a patch (manual approval) → re-run pytest → summarize diff + results.
Note: Automatically generating meaningful tests is highly project-specific. The goal would be lightweight scaffolding (opt-in), not replacing real, requirement-driven tests.
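For the scaffolding idea, the generated skeleton would be something this small (hypothetical example; `your_package` is a placeholder for the project under test):

```python
# tests/test_smoke.py (hypothetical scaffold): only verifies that the package
# imports cleanly, as a first gate before deeper debugging flows.
import importlib

def test_package_imports():
    module = importlib.import_module("your_package")  # placeholder name
    assert module is not None
```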