PAIML MCP Agent Toolkit

A professional project scaffolding and code analysis toolkit that provides technical debt grading, mutation testing, and AI-optimized context generation across 17+ languages.


PMAT

Zero-configuration AI context generation for any codebase

Getting Started | Features | Examples | Documentation


What is PMAT?

PMAT (Pragmatic Multi-language Agent Toolkit) provides everything needed to analyze code quality and generate AI-ready context:

  • Context Generation - Deep analysis for Claude, GPT, and other LLMs
  • Technical Debt Grading - A+ through F scoring with 6 orthogonal metrics
  • Mutation Testing - Test suite quality validation (85%+ kill rate)
  • Repository Scoring - Quantitative health assessment (0-211 scale)
  • Semantic Search - Natural language code discovery
  • MCP Integration - 19 tools for Claude Code, Cline, and AI agents
  • Quality Gates - Pre-commit hooks, CI/CD integration
  • 17+ Languages - Rust, TypeScript, Python, Go, Java, C/C++, and more

Part of the PAIML Stack, following Toyota Way quality principles (Jidoka, Genchi Genbutsu, Kaizen).

Getting Started

Install PMAT:

# Install from crates.io
cargo install pmat

# Or from source (latest)
git clone https://github.com/paiml/paiml-mcp-agent-toolkit
cd paiml-mcp-agent-toolkit && cargo install --path server

Basic Usage

# Generate AI-ready context
pmat context --output context.md --format llm-optimized

# Analyze code complexity
pmat analyze complexity

# Grade technical debt (A+ through F)
pmat analyze tdg

# Score repository health
pmat repo-score .

# Run mutation testing
pmat mutate --target src/

MCP Server Mode

# Start MCP server for Claude Code, Cline, etc.
pmat mcp
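
To use these tools from an MCP client, register the command with that client. The example below is a sketch for Claude Code, assuming its claude mcp add command and that pmat is on your PATH; other clients such as Cline have their own configuration format.

# Register pmat as an MCP server in Claude Code (sketch; syntax may vary by version)
claude mcp add pmat -- pmat mcp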

Features

Context Generation

Generate comprehensive context for AI assistants:

pmat context                           # Basic analysis
pmat context --format llm-optimized    # AI-optimized output
pmat context --include-tests           # Include test files

Technical Debt Grading (TDG)

Six orthogonal metrics for accurate quality assessment:

pmat analyze tdg                       # Project-wide grade
pmat analyze tdg --include-components  # Per-component breakdown
pmat tdg baseline create               # Create quality baseline
pmat tdg check-regression              # Detect quality degradation

Grading Scale:

  • A+/A: Excellent quality, minimal debt
  • B+/B: Good quality, manageable debt
  • C+/C: Needs improvement
  • D/F: Significant technical debt

Mutation Testing

Validate test suite effectiveness:

pmat mutate --target src/lib.rs        # Single file
pmat mutate --target src/ --threshold 85  # Quality gate
pmat mutate --failures-only            # CI optimization

Supported Languages: Rust, Python, TypeScript, JavaScript, Go, C++

Repository Health Scoring

Evidence-based quality metrics (0-211 scale):

pmat rust-project-score                # Fast mode (~3 min)
pmat rust-project-score --full         # Comprehensive (~10-15 min)
pmat repo-score . --deep               # Full git history

Workflow Prompts

Pre-configured AI prompts enforcing EXTREME TDD:

pmat prompt --list                     # Available prompts
pmat prompt code-coverage              # 85%+ coverage enforcement
pmat prompt debug                      # Five Whys analysis
pmat prompt quality-enforcement        # All quality gates
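
As a usage sketch only — this assumes the prompt text is written to stdout, which the examples above do not confirm — a prompt can be captured to a file and pasted into the agent of your choice:

# Save the Five Whys debug prompt for an AI session (illustrative)
pmat prompt debug > five-whys-prompt.md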

Git Hooks

Automatic quality enforcement:

pmat hooks install                     # Install pre-commit hooks
pmat hooks install --tdg-enforcement   # With TDG quality gates
pmat hooks status                      # Check hook status
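
Conceptually, the installed hook blocks a commit when a gate fails. The snippet below is a hand-written sketch using flags shown elsewhere in this README, not the hook that pmat hooks install actually generates:

#!/bin/sh
# .git/hooks/pre-commit (illustrative sketch, not the generated hook)
# Block the commit if the project grade falls below B
pmat analyze tdg --fail-on-violation --min-grade B || exit 1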

Examples

Generate Context for AI

# For Claude Code
pmat context --output context.md --format llm-optimized

# With semantic search
pmat embed sync ./src
pmat semantic search "error handling patterns"

CI/CD Integration

# .github/workflows/quality.yml
name: Quality Gates
on: [push, pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install pmat
      - run: pmat analyze tdg --fail-on-violation --min-grade B
      - run: pmat mutate --target src/ --threshold 80
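
The same gates can be run locally before pushing, reusing the flags from the workflow above:

# Mirror the CI quality gates locally
cargo install pmat
pmat analyze tdg --fail-on-violation --min-grade B
pmat mutate --target src/ --threshold 80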

Quality Baseline Workflow

# 1. Create baseline
pmat tdg baseline create --output .pmat/baseline.json

# 2. Check for regressions
pmat tdg check-regression \
  --baseline .pmat/baseline.json \
  --max-score-drop 5.0 \
  --fail-on-regression

Architecture

pmat/
├── server/           CLI and MCP server
│   ├── src/
│   │   ├── cli/      Command handlers
│   │   ├── services/ Analysis engines
│   │   ├── mcp/      MCP protocol
│   │   └── tdg/      Technical Debt Grading
├── crates/
│   └── pmat-dashboard/  Pure WASM dashboard
└── docs/
    └── specifications/  Technical specs

Quality

Metric          Value
Tests           4600+ passing
Coverage        >85%
Mutation Score  >80%
Languages       17+ supported
MCP Tools       19 available

Falsifiable Quality Commitments

Per Popper's demarcation criterion, all claims are measurable and testable:

Commitment          Threshold                             Verification Method
Context Generation  < 5 seconds for 10K LOC project       time pmat context on test corpus
Memory Usage        < 500 MB for 100K LOC analysis        Measured via heaptrack in CI
Test Coverage       ≥ 85% line coverage                   cargo llvm-cov (CI enforced)
Mutation Score      ≥ 80% killed mutants                  pmat mutate --threshold 80
Build Time          < 3 minutes incremental               cargo build --timings
CI Pipeline         < 15 minutes total                    GitHub Actions workflow timing
Binary Size         < 50 MB release binary                ls -lh target/release/pmat
Language Parsers    All 17 languages parse without panic  Fuzz testing in CI

How to Verify:

# Run self-assessment with Popper Falsifiability Score
pmat popper-score --verbose

# Individual commitment verification
cargo llvm-cov --html        # Coverage ≥85%
pmat mutate --threshold 80   # Mutation ≥80%
cargo build --timings        # Build time <3min
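
The binary-size commitment can also be turned into a pass/fail check. The snippet below is a sketch assuming a Linux environment with GNU coreutils (stat -c%s); it is not part of pmat itself:

# Fail if the release binary exceeds the 50 MB commitment
SIZE=$(stat -c%s target/release/pmat)
if [ "$SIZE" -gt $((50 * 1024 * 1024)) ]; then
  echo "Binary is ${SIZE} bytes, over the 50 MB limit" >&2
  exit 1
fi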

Failure = Regression: Any commitment violation blocks CI merge.

Benchmark Results (Statistical Rigor)

All benchmarks use Criterion.rs with proper statistical methodology:

Operation            Mean   95% CI        Std Dev  Sample Size
Context (1K LOC)     127ms  [124, 130]    ±12.3ms  n=1000 runs
Context (10K LOC)    1.84s  [1.79, 1.90]  ±156ms   n=500 runs
TDG Scoring          156ms  [148, 164]    ±18.2ms  n=500 runs
Complexity Analysis  23ms   [22, 24]      ±3.1ms   n=1000 runs

Comparison Baselines (vs. Alternatives):

Metric            PMAT   ctags        tree-sitter  Effect Size
10K LOC parsing   1.84s  0.3s         0.8s         d=0.72 (medium)
Memory (10K LOC)  287MB  45MB         120MB        -
Semantic depth    Full   Syntax only  AST only     -

See docs/BENCHMARKS.md for complete statistical analysis.

ML/AI Reproducibility

PMAT uses ML for semantic search and embeddings. All ML operations are reproducible:

Random Seed Management:

  • Embedding generation uses fixed seed (SEED=42) for deterministic outputs
  • Clustering operations use fixed seed (SEED=12345)
  • Seeds documented in docs/ml/REPRODUCIBILITY.md

Model Artifacts:

  • Pre-trained models from HuggingFace (all-MiniLM-L6-v2)
  • Model versions pinned in Cargo.toml
  • Hash verification on download
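
To spot-check an artifact by hand, print its digest and compare it with the value your build pins; the file name below is a placeholder, not necessarily the artifact's real name on disk:

# Print the digest of a downloaded model file (file name is a placeholder)
sha256sum all-MiniLM-L6-v2.safetensors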

Dataset Sources

PMAT does not train models but uses these data sources for evaluation:

Dataset        Source            Purpose                     Size
CodeSearchNet  GitHub/Microsoft  Semantic search benchmarks  2M functions
PMAT-bench     Internal          Regression testing          500 queries

Data provenance and licensing documented in docs/ml/REPRODUCIBILITY.md.

PAIML Stack

Library   Purpose                  Version
trueno    SIMD tensor operations   0.7.3
entrenar  Training & optimization  0.2.3
aprender  ML algorithms            0.14.0
realizar  GGUF inference           0.2.1
pmat      Code analysis toolkit    2.213.1
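
If you want the sibling crates in your own Rust project, and assuming they are published on crates.io under the names listed above, they can be added with a standard cargo add:

# Add PAIML Stack crates to a Rust project (assumes the crates.io names above)
cargo add trueno aprender realizar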

Documentation

License

MIT License - see LICENSE for details.


Built with Extreme TDD | Part of PAIML
