# Debug Companion MCP (Pytest Debugging Agent)
Local MCP server that helps an AI coding agent debug Python projects by turning pytest failures into:
- Exact failure locations (`file:line`)
- A focused code context window around the failure
- Optional LLM suggestion (Gemini) for a fix
## Why this exists
When an agent is handed long pytest output, it often wastes time hunting for the actual failure location. This server extracts the actionable parts and returns them in a structured, tool-friendly format.
## Requirements
- Python 3.12+
- uv
## Install
`uv sync`
## Run the MCP server
`uv run python server.py`
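For orientation, a minimal entry point could look like the sketch below. It assumes the official MCP Python SDK's `FastMCP` helper and an illustrative server name; the actual `server.py` registers the full tool set described in the next section.

```python
# Minimal sketch, assuming server.py uses the official MCP Python SDK's
# FastMCP helper; the real file registers the full tool set described below.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("debug-companion")  # server name is illustrative

@mcp.tool()
def ping() -> str:
    """Health check: confirms the server is up and reachable."""
    return "pong"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport used by local MCP clients
```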
## CI
GitHub Actions runs server tests on each push/PR:
- workflow: `.github/workflows/tests.yml`
## Tools
- `ping` — health check
- `run_pytest(target, max_output_lines=200, timeout_seconds=30)` — run pytest safely (bounded output + timeout)
- `extract_failures(pytest_output, limit=5, base_dir=".")` — parse `file.py:line` locations from pytest output
- `open_context(path, line, radius=12, base_dir=".")` — return a code window around a line
- `debug_project(target, ...)` — orchestrates: `run_pytest` → `extract_failures` → `open_context` → (optional) Gemini analysis
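As a rough illustration of the `run_pytest` and `extract_failures` contracts, the sketch below shows one possible shape. It is not the actual code: the real implementations live in `server.py` and handle more cases (e.g. `base_dir` resolution).

```python
# Illustrative sketch of the run_pytest / extract_failures contracts above;
# the real implementations in server.py may differ (e.g. base_dir handling).
import re
import subprocess

def run_pytest(target: str, max_output_lines: int = 200, timeout_seconds: int = 30) -> str:
    """Run pytest on `target` with a hard timeout and bounded output."""
    try:
        proc = subprocess.run(
            ["uv", "run", "pytest", target, "-q"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        output = f"pytest timed out after {timeout_seconds}s"
    lines = output.splitlines()
    if len(lines) > max_output_lines:
        lines = lines[:max_output_lines] + ["... (output truncated)"]
    return "\n".join(lines)

# Matches locations like "demo_project/test_calc.py:11" in pytest output.
_LOCATION = re.compile(r"([\w./\\-]+\.py):(\d+)")

def extract_failures(pytest_output: str, limit: int = 5) -> list[dict]:
    """Pull file.py:line failure locations out of raw pytest output."""
    seen, failures = set(), []
    for match in _LOCATION.finditer(pytest_output):
        loc = (match.group(1), int(match.group(2)))
        if loc in seen:
            continue
        seen.add(loc)
        failures.append({"path": loc[0], "line": loc[1]})
        if len(failures) >= limit:
            break
    return failures
```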
## Safety
pytest runs with a timeout + output cap, and file access is restricted to the server root unless explicitly allowlisted via `MCP_ALLOWED_ROOTS`.
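Conceptually, the restriction amounts to the check sketched below. This is assumed logic, not the exact implementation; in particular, the separator used by `MCP_ALLOWED_ROOTS` is assumed here to be the OS path separator.

```python
# Sketch of the path restriction described above (assumed logic): a path is
# only readable if it resolves inside the server root or an allowlisted root.
import os
from pathlib import Path

SERVER_ROOT = Path(__file__).resolve().parent

def _allowed_roots() -> list[Path]:
    # MCP_ALLOWED_ROOTS is assumed to be a list separated by os.pathsep.
    extra = os.environ.get("MCP_ALLOWED_ROOTS", "")
    return [SERVER_ROOT] + [Path(p).resolve() for p in extra.split(os.pathsep) if p]

def is_path_allowed(path: str) -> bool:
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(root) for root in _allowed_roots())
```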
## Quick demo
Run on the intentionally failing demo project:
`debug_project(target="demo_project")`
Expected output (example):

    Failures:
    - demo_project/test_calc.py:11 (test_divide_by_zero) — Failed: DID NOT RAISE ZeroDivisionError
    Context (±12):
    10 def test_divide_by_zero():
    11     with pytest.raises(ZeroDivisionError):
    12         _ = divide(10, 0)
## Demo project
- `demo_project/` — minimal demo with an intentional failing test (fast to understand)
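Judging from the expected output above, the intentional failure boils down to something like the following sketch. File names, layout, and the exact bug are assumptions inferred from the output, not the literal demo code.

```python
# demo_project/calc.py (assumed): the intentional bug is that divide swallows
# division by zero instead of letting it raise.
def divide(a: float, b: float) -> float:
    if b == 0:
        return 0.0  # bug: should raise ZeroDivisionError
    return a / b

# demo_project/test_calc.py (assumed): fails with "DID NOT RAISE".
import pytest

def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError):
        _ = divide(10, 0)
```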
## Environment variables
- `GEMINI_API_KEY` — enable Gemini analysis (optional)
- `MCP_ALLOWED_ROOTS` — allow access to absolute paths outside the server root (optional)
## Future work (ideas)
- Test scaffolding (opt-in): detect projects with no tests and optionally generate a minimal smoke test skeleton (e.g., `tests/test_smoke.py`) to validate imports / basic execution before running deeper debugging flows (see the sketch below).
- Richer failure parsing: better extraction for parameterized tests and multi-traceback outputs.
- Autofix loop: apply a patch (manual approval) → re-run pytest → summarize diff + results.
Note: Automatically generating meaningful tests is highly project-specific. The goal would be lightweight scaffolding (opt-in), not replacing real, requirement-driven tests.
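For the scaffolding idea, the generated skeleton would be something this small (hypothetical example; `your_package` is a placeholder for the project under test):

```python
# tests/test_smoke.py (hypothetical scaffold): only verifies that the package
# imports cleanly, as a first gate before deeper debugging flows.
import importlib

def test_package_imports():
    module = importlib.import_module("your_package")  # placeholder name
    assert module is not None
```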