Arthur

Ground truth verification for AI-generated code. Catches hallucinated file paths, schema references, imports, env vars, types, and API routes before code gets written — deterministically, zero cost, no API key.

Install

claude mcp add arthur -- npx arthur-mcp

That's it. Arthur is now available as an MCP server in Claude Code. All tools run locally — no API key, no credits, no config.
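If you prefer configuring the server by hand, the `claude mcp add` command above is equivalent to an entry in a project-scoped `.mcp.json` file. The shape below follows Claude Code's documented MCP config format; treat it as a sketch rather than Arthur's official instructions:

```json
{
  "mcpServers": {
    "arthur": {
      "command": "npx",
      "args": ["arthur-mcp"]
    }
  }
}
```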

Why

AI coding assistants hallucinate. They reference files that don't exist, use wrong schema names, import packages that aren't installed, and build plans on assumptions that don't match the codebase.

The model can't reliably catch its own mistakes. In benchmarks, self-review with full project context and an adversarial prompt missed 40% of verifiable errors. A simple file existence check catches 100%, every time.

Arthur runs deterministic checks against ground truth (your actual files, schemas, packages) and returns the results — including what actually exists — so Claude Code can self-correct. No second LLM call needed.

How It Works

  1. Claude Code generates a plan
  2. Claude Code calls Arthur's check_all tool
  3. Arthur validates every reference against ground truth (file tree, schemas, node_modules, .env files, types, routes)
  4. Arthur returns findings with the correct values — not just "this is wrong" but "this is wrong, here's what actually exists"
  5. Claude Code reads the findings and corrects its plan

The LLM reasoning step is free because it's the same Claude Code session you're already paying for.

Recommended Setup

Add this to your project's CLAUDE.md so Claude Code uses Arthur automatically:

## Verification

Before implementing any plan, call the `check_all` MCP tool with the plan text and project directory.
Fix all hallucinated references using the ground truth provided in the response before writing code.

Tools

check_all (recommended)

Runs all 7 checkers in a single call. Returns a comprehensive report with ground truth context for every finding. This is the tool Claude Code should call.

check_all(planText, projectDir)
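Conceptually, check_all is just every individual checker run over the same inputs and merged into one report, so the model only needs a single tool call. A sketch of that aggregation; the `Finding` field names are assumptions, not Arthur's documented API:

```typescript
// Hypothetical shape of one finding in a check_all report.
interface Finding {
  checker: string;      // e.g. "check_paths"
  reference: string;    // what the plan claimed
  groundTruth: string;  // what actually exists
}

// Every checker has the same signature, so check_all can treat them uniformly.
type Checker = (planText: string, projectDir: string) => Finding[];

// Run all checkers and flatten their findings into a single report.
function checkAll(
  checkers: Checker[],
  planText: string,
  projectDir: string
): Finding[] {
  return checkers.flatMap((check) => check(planText, projectDir));
}
```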

Individual Checkers

| Tool | What it catches | Ground truth source |
| --- | --- | --- |
| `check_paths` | Hallucinated file paths | Project directory tree |
| `check_schema` | Wrong Prisma models, fields, methods, relations | `schema.prisma` |
| `check_sql_schema` | Wrong Drizzle/SQL tables, columns | `pgTable()` / `CREATE TABLE` |
| `check_imports` | Non-existent packages, invalid subpaths | `node_modules` + `package.json` |
| `check_env` | Undefined environment variables | `.env*` files |
| `check_types` | Hallucinated TypeScript types/members | Project `.ts`/`.tsx` files |
| `check_routes` | Non-existent API routes, wrong methods | Next.js App Router `route.ts` files |

All checkers auto-detect — no flags needed. If a project has no Prisma schema, that checker silently returns nothing.

verify_plan (optional, requires API key)

Full pipeline: all static checks + LLM review by a separate Claude instance. Use this for deep plan review (intent drift, wrong abstractions, missing requirements). Requires ANTHROPIC_API_KEY.

What the Output Looks Like

When Arthur finds a hallucinated Prisma model:

✗ prisma.engagement — hallucinated-model → prisma.participantEngagement
  Available models: participant (Participant), contentItem (ContentItem),
                    participantEngagement (ParticipantEngagement)

When Arthur finds a hallucinated file path:

✗ src/models/User.ts — NOT FOUND
  Closest: src/lib/db.ts, src/app/api/participants/route.ts

When Arthur finds a wrong relation:

✗ include: { comments } — wrong-relation
  Available relations on ContentItem: author → Participant, engagements → ParticipantEngagement

Claude Code reads these findings and knows exactly what to use instead. No guessing, no second tool call.
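The "Closest:" suggestions above can be produced with a plain edit-distance ranking over the real file tree. A minimal sketch; Arthur's actual similarity heuristic is unknown:

```typescript
// Levenshtein distance between two strings (dynamic programming).
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0
    )
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Rank real project paths by similarity to a hallucinated one.
function closest(missing: string, realPaths: string[], n = 2): string[] {
  return [...realPaths]
    .sort((x, y) => editDistance(missing, x) - editDistance(missing, y))
    .slice(0, n);
}
```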

Benchmark Results

Static Analysis vs Self-Review (Opus 4.6)

11 prompts across 4 fixtures, 5 checker categories. Self-review had the full project tree, all schema files, and a maximally adversarial prompt.

| Category | Errors Found | Self-Review Missed | Self-Review Detection Rate |
| --- | --- | --- | --- |
| Path | 30 | 11 | 63% |
| Schema (Prisma) | 19 | 0 | 100% |
| SQL Schema (Drizzle) | 15 | 15 | 0% |
| Import | 22 | 5 | 77% |
| Env | 7 | 0 | 100% |
| Total | 93 | 37 | 60% |

Self-review missed 37 errors that Arthur caught deterministically. SQL schema references were a complete blind spot: 0% detection. The false positive rate was 2.2%.

Schema Hallucination Detail

Fixture: Next.js + Prisma with non-obvious naming (Participant not User, participantEngagement not engagement).

| Task | Schema Refs | Hallucinated | Rate |
| --- | --- | --- | --- |
| Analytics dashboard | 11 | 3 | 27.3% |
| Recommendation engine | 18 | 4 | 22.2% |
| CSV export | 7 | 1 | 14.3% |
| Avg | 12 | 2.7 | 21.3% |

Recurring hallucination: prisma.engagement (should be prisma.participantEngagement) appeared in all 3 runs — systematic bias, not random noise.

CLI (alternative to MCP)

npm install -g arthur-mcp

# Full verification (static analysis + LLM review)
codeverifier verify --plan plan.md --project ./my-app

# Set API key for LLM verification
codeverifier init

Development

git clone https://github.com/ZachDeLong/arthur.git
cd arthur
npm install
npm run build

# Run MCP server locally
npm run mcp

# Run benchmarks
npm run bench:big         # Static analysis vs self-review
npm run bench:tier1       # Path + schema hallucination detection
npm run bench:tier2       # Intent drift detection
npm run bench:report      # Generate markdown report

License

MIT
