Judges Panel
An MCP (Model Context Protocol) server that provides a panel of 18 specialized judges to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed.
Quick Start
1. Install and Build
git clone https://github.com/KevinRabun/judges.git
cd judges
npm install
npm run build
2. Try the Demo
Run the included demo to see all 18 judges evaluate a purposely flawed API server:
npm run demo
This evaluates examples/sample-vulnerable-api.ts — a file intentionally packed with security holes, performance anti-patterns, and code quality issues — and prints a full verdict with per-judge scores and findings.
What you'll see:
╔══════════════════════════════════════════════════════════════╗
║ Judges Panel — Full Tribunal Demo ║
╚══════════════════════════════════════════════════════════════╝
Overall Verdict : FAIL
Overall Score : 43/100
Critical Issues : 15
High Issues : 17
Total Findings : 83
Judges Run : 18
Per-Judge Breakdown:
────────────────────────────────────────────────────────────────
❌ Judge Data Security 0/100 7 finding(s)
❌ Judge Cybersecurity 0/100 7 finding(s)
❌ Judge Cost Effectiveness 52/100 5 finding(s)
⚠️ Judge Scalability 65/100 4 finding(s)
❌ Judge Cloud Readiness 61/100 4 finding(s)
❌ Judge Software Practices 45/100 6 finding(s)
❌ Judge Accessibility 0/100 8 finding(s)
❌ Judge API Design 0/100 9 finding(s)
❌ Judge Reliability 54/100 3 finding(s)
❌ Judge Observability 45/100 5 finding(s)
❌ Judge Performance 27/100 5 finding(s)
❌ Judge Compliance 0/100 4 finding(s)
⚠️ Judge Testing 90/100 1 finding(s)
⚠️ Judge Documentation 70/100 4 finding(s)
⚠️ Judge Internationalization 65/100 4 finding(s)
⚠️ Judge Dependency Health 90/100 1 finding(s)
❌ Judge Concurrency 44/100 4 finding(s)
❌ Judge Ethics & Bias 65/100 2 finding(s)
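The specific findings come from heuristic patterns like the ones below. This fragment is illustrative only and is not the contents of examples/sample-vulnerable-api.ts; it simply shows the kind of code that trips the security and performance judges.

```typescript
// Illustrative only: the kinds of issues the tribunal flags.
import { readFileSync } from "node:fs";

const API_KEY = "sk-live-1234567890"; // hardcoded secret (Data Security)

function getUser(db: { query: (sql: string) => unknown }, id: string) {
  // string-concatenated SQL is an injection risk (Cybersecurity)
  return db.query("SELECT * FROM users WHERE id = '" + id + "'");
}

function loadConfig(): string {
  // synchronous file I/O blocks the event loop (Performance, Scalability)
  return readFileSync("./config.json", "utf8");
}
```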
3. Run the Tests
npm test
Runs 184 automated tests covering all 18 judges, markdown formatters, and edge cases.
4. Connect to Your Editor
Add the Judges Panel as an MCP server so your AI coding assistant can use it automatically.
VS Code — create .vscode/mcp.json in your project:
{
  "servers": {
    "judges": {
      "command": "node",
      "args": ["/absolute/path/to/judges/dist/index.js"]
    }
  }
}
Claude Desktop — add to claude_desktop_config.json:
{
  "mcpServers": {
    "judges": {
      "command": "node",
      "args": ["/absolute/path/to/judges/dist/index.js"]
    }
  }
}
Or install from npm instead of cloning:
npm install -g @kevinrabun/judges
Then use `judges` as the command in your MCP config (no args needed).
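For example, a minimal VS Code config using the globally installed package might look like this (a sketch, assuming the package puts a `judges` binary on your PATH as described above):

```json
{
  "servers": {
    "judges": {
      "command": "judges"
    }
  }
}
```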
The Judge Panel
| Judge | Domain | Rule Prefix | What It Evaluates |
|---|---|---|---|
| Data Security | Data Security & Privacy | DATA- | Encryption, PII handling, secrets management, access controls |
| Cybersecurity | Cybersecurity & Threat Defense | CYBER- | Injection attacks, XSS, CSRF, auth flaws, OWASP Top 10 |
| Cost Effectiveness | Cost Optimization | COST- | Algorithm efficiency, N+1 queries, memory waste, caching strategy |
| Scalability | Scalability & Performance | SCALE- | Statelessness, horizontal scaling, concurrency, bottlenecks |
| Cloud Readiness | Cloud-Native & DevOps | CLOUD- | 12-Factor compliance, containerization, graceful shutdown, IaC |
| Software Practices | Engineering Best Practices | SWDEV- | SOLID principles, type safety, error handling, input validation |
| Accessibility | Accessibility (a11y) | A11Y- | WCAG compliance, screen reader support, keyboard navigation, ARIA |
| API Design | API Design & Contracts | API- | REST conventions, versioning, pagination, error responses |
| Reliability | Reliability & Resilience | REL- | Error handling, timeouts, retries, circuit breakers |
| Observability | Observability & Monitoring | OBS- | Structured logging, health checks, metrics, tracing |
| Performance | Performance & Efficiency | PERF- | N+1 queries, sync I/O, caching, memory leaks |
| Compliance | Regulatory Compliance | COMP- | GDPR/CCPA, PII protection, consent, data retention, audit trails |
| Testing | Testing & Quality Assurance | TEST- | Test coverage, assertions, test isolation, naming |
| Documentation | Documentation & Readability | DOC- | JSDoc/docstrings, magic numbers, TODOs, code comments |
| Internationalization | Internationalization (i18n) | I18N- | Hardcoded strings, locale handling, currency formatting |
| Dependency Health | Dependency Management | DEPS- | Version pinning, deprecated packages, supply chain |
| Concurrency | Concurrency & Async Safety | CONC- | Race conditions, unbounded parallelism, missing await |
| Ethics & Bias | Ethics & Bias | ETHICS- | Demographic logic, dark patterns, inclusive language |
How It Works
The tribunal operates in two modes:
- Pattern-Based Analysis (Tools) — The `evaluate_code` and `evaluate_code_single_judge` tools perform heuristic analysis using pattern matching to catch common anti-patterns. This works entirely offline with zero external API calls.
- LLM-Powered Deep Analysis (Prompts) — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client, this enables deeper, context-aware analysis beyond what pattern matching can detect.
Composable by Design
Judges Panel is intentionally focused on heuristic pattern detection — fast, offline, zero-dependency. It does not try to be an AST parser, a CVE scanner, or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
Recommended MCP Stack
When your AI coding assistant connects to multiple MCP servers, each one contributes its specialty:
┌─────────────────────────────────────────────────────────┐
│ AI Coding Assistant │
│ (Claude, Copilot, Cursor, etc.) │
└──────┬──────────┬──────────┬──────────┬────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ Judges │ │ AST │ │ CVE / │ │ Linter │
│ Panel │ │ Server │ │ SBOM │ │ Server │
└─────────┘ └────────┘ └────────┘ └────────┘
Heuristic Structural Vuln DB Style &
patterns analysis scanning correctness
| Layer | What It Does | Example Servers |
|---|---|---|
| Judges Panel | 18-judge quality gate — security patterns, cost, scalability, a11y, compliance, ethics | This server |
| AST Analysis | Deep structural analysis — data flow, complexity metrics, dead code, type tracking | Tree-sitter, Semgrep, SonarQube MCP servers |
| CVE / SBOM | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |
| Linting | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |
| Runtime Profiling | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |
Why Orchestration Beats a Monolith
| | Monolith | Orchestrated MCP Stack |
|---|---|---|
| Maintenance | One team owns everything | Each server evolves independently |
| Depth | Shallow coverage of many domains | Deep expertise per server |
| Updates | CVE data stale = full redeploy | CVE server updates on its own |
| Language support | Must embed parsers for every language | AST server handles this |
| User choice | All or nothing | Pick the servers you need |
| Offline capability | Hard to achieve with CVE deps | Judges runs fully offline; CVE server handles network |
What This Means in Practice
When you ask your AI assistant "Is this code production-ready?", the agent can:
- Judges Panel → Scan for hardcoded secrets, missing error handling, N+1 queries, accessibility gaps, compliance issues
- AST Server → Analyze cyclomatic complexity, detect unreachable code, trace tainted data flows
- CVE Server → Check every dependency in `package.json` against known vulnerabilities
- Linter Server → Enforce team style rules, catch language-specific gotchas
Each server returns structured findings. The AI synthesizes everything into a single, actionable review — no single server needs to do it all.
MCP Tools
get_judges
List all available judges with their domains and descriptions.
evaluate_code
Submit code to the full judges panel. All 18 judges evaluate independently and return a combined verdict.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `code` | string | yes | The source code to evaluate |
| `language` | string | yes | Programming language (e.g., `typescript`, `python`) |
| `context` | string | no | Additional context about the code |
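As a sketch, an MCP client invokes this tool with a standard `tools/call` request. The JSON-RPC envelope is generic MCP rather than anything specific to this server, and the code snippet is just a placeholder:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "evaluate_code",
    "arguments": {
      "code": "export function add(a: number, b: number) { return a + b; }",
      "language": "typescript",
      "context": "Utility function from a payments service"
    }
  }
}
```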
evaluate_code_single_judge
Submit code to a specific judge for targeted review.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `code` | string | yes | The source code to evaluate |
| `language` | string | yes | Programming language |
| `judgeId` | string | yes | See judge IDs below |
| `context` | string | no | Additional context |
Judge IDs
data-security · cybersecurity · cost-effectiveness · scalability · cloud-readiness · software-practices · accessibility · api-design · reliability · observability · performance · compliance · testing · documentation · internationalization · dependency-health · concurrency · ethics-bias
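A targeted single-judge request follows the same `tools/call` shape shown above, adding the `judgeId` parameter (again, a sketch with placeholder code):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "evaluate_code_single_judge",
    "arguments": {
      "code": "const apiKey = 'sk-live-1234';",
      "language": "typescript",
      "judgeId": "data-security"
    }
  }
}
```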
MCP Prompts
Each judge has a corresponding prompt for LLM-powered deep analysis:
| Prompt | Description |
|---|---|
| `judge-data-security` | Deep data security review |
| `judge-cybersecurity` | Deep cybersecurity review |
| `judge-cost-effectiveness` | Deep cost optimization review |
| `judge-scalability` | Deep scalability review |
| `judge-cloud-readiness` | Deep cloud readiness review |
| `judge-software-practices` | Deep software practices review |
| `judge-accessibility` | Deep accessibility/WCAG review |
| `judge-api-design` | Deep API design review |
| `judge-reliability` | Deep reliability & resilience review |
| `judge-observability` | Deep observability & monitoring review |
| `judge-performance` | Deep performance optimization review |
| `judge-compliance` | Deep regulatory compliance review |
| `judge-testing` | Deep testing quality review |
| `judge-documentation` | Deep documentation quality review |
| `judge-internationalization` | Deep i18n review |
| `judge-dependency-health` | Deep dependency health review |
| `judge-concurrency` | Deep concurrency & async safety review |
| `judge-ethics-bias` | Deep ethics & bias review |
| `full-tribunal` | All 18 judges in a single prompt |
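Clients fetch a judge persona with the standard MCP `prompts/get` request. The prompt name comes from the table above; whether a given prompt also accepts arguments (for example, the code to review) depends on its definition, so none are shown in this sketch:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "prompts/get",
  "params": {
    "name": "judge-data-security"
  }
}
```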
Scoring
Each judge scores the code from 0 to 100:
| Severity | Score Deduction |
|---|---|
| Critical | −30 points |
| High | −18 points |
| Medium | −10 points |
| Low | −5 points |
| Info | −2 points |
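For example, a judge that finds one critical and two medium issues deducts 30 + 10 + 10 = 50 points, scoring 50/100.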
Verdict logic:
- FAIL — Any critical finding, or score < 60
- WARNING — Any high finding, any medium finding, or score < 80
- PASS — Score ≥ 80 with no critical, high, or medium findings
The overall tribunal score is the average of all 18 judges. The overall verdict fails if any judge fails.
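A minimal sketch of this scoring model, assuming findings carry only a severity (the actual logic lives in src/evaluators/shared.ts and may differ in detail):

```typescript
// Sketch only: floors scores at 0, which matches the 0/100 results in the demo output.
type Severity = "critical" | "high" | "medium" | "low" | "info";

const DEDUCTIONS: Record<Severity, number> = {
  critical: 30,
  high: 18,
  medium: 10,
  low: 5,
  info: 2,
};

function scoreJudge(findings: { severity: Severity }[]): number {
  const deducted = findings.reduce((sum, f) => sum + DEDUCTIONS[f.severity], 0);
  return Math.max(0, 100 - deducted);
}

function judgeVerdict(findings: { severity: Severity }[]): "PASS" | "WARNING" | "FAIL" {
  const score = scoreJudge(findings);
  const has = (s: Severity) => findings.some((f) => f.severity === s);
  if (has("critical") || score < 60) return "FAIL";
  if (has("high") || has("medium") || score < 80) return "WARNING";
  return "PASS";
}

// Overall tribunal score: average of all 18 judge scores;
// the overall verdict is FAIL whenever any single judge fails.
```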
Project Structure
judges/
├── src/
│ ├── index.ts # MCP server entry point — tools, prompts, transport
│ ├── types.ts # TypeScript interfaces (Finding, JudgeEvaluation, etc.)
│ ├── evaluators/ # Pattern-based analysis engine for each judge
│ │ ├── index.ts # evaluateWithJudge(), evaluateWithTribunal()
│ │ ├── shared.ts # Scoring, verdict logic, markdown formatters
│ │ └── *.ts # One analyzer per judge (18 files)
│ └── judges/ # Judge definitions (id, name, domain, system prompt)
│ ├── index.ts # JUDGES array, getJudge(), getJudgeSummaries()
│ └── *.ts # One definition per judge (18 files)
├── examples/
│ ├── sample-vulnerable-api.ts # Intentionally flawed code (triggers all 18 judges)
│ └── demo.ts # Run: npm run demo
├── tests/
│ └── judges.test.ts # Run: npm test (184 tests)
├── server.json # MCP Registry manifest
├── package.json
├── tsconfig.json
└── README.md
Scripts
| Command | Description |
|---|---|
| `npm run build` | Compile TypeScript to `dist/` |
| `npm run dev` | Watch mode — recompile on save |
| `npm test` | Run the full test suite (184 tests) |
| `npm run demo` | Run the sample tribunal demo |
| `npm start` | Start the MCP server |
| `npm run clean` | Remove `dist/` |
License
MIT