
Saiten MCP Server

Automates the evaluation of GitHub hackathon submissions by providing tools to collect issue data, apply scoring rubrics, and generate ranking reports. It enables a multi-agent system to manage scoring consistency, statistical outlier detection, and feedback generation through the GitHub CLI.

Updated
Feb 13, 2026

Saiten — Agents League @ TechConnect Scoring Agent

Submission Track: 🎨 Creative Apps — GitHub Copilot

Overview

A multi-agent system that automatically scores all Agents League @ TechConnect hackathon submissions and generates ranking reports — just type @saiten-orchestrator score all in VS Code.

Built around the Orchestrator-Workers, Prompt Chaining, and Evaluator-Optimizer patterns, six Copilot custom agents autonomously collect GitHub Issue submissions, evaluate them against track-specific rubrics, validate scoring consistency, and generate reports via an MCP (Model Context Protocol) server.


Agent Workflow

Design Patterns

  • Orchestrator-Workers: @saiten-orchestrator delegates to 5 specialized sub-agents
  • Prompt Chaining: Collect → Score → Review → Report with Gates at each step
  • Evaluator-Optimizer: Reviewer validates scores, triggers re-scoring on FLAG
  • Handoff: Commenter posts feedback only after explicit user confirmation
  • SRP (Single Responsibility Principle): 1 agent = 1 responsibility

Reasoning Patterns

  • Chain-of-Thought (CoT): Scorer evaluates each criterion sequentially, building evidence chain before calculating weighted total
  • Evaluator-Optimizer Loop: Reviewer detects 5 bias types (central tendency, halo effect, leniency, range restriction, anchoring) → FLAGs → Scorer re-evaluates with specific guidance → max 2 cycles
  • Gate-based Error Recovery: Each workflow step has a validation gate; failures trigger graceful degradation (skip + warn) rather than hard stops
  • Evidence-Anchored Scoring: Rubrics define explicit evidence_signals (positive/negative) per criterion; scorers must cite signals from actual submission content
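The Evaluator-Optimizer loop above can be sketched in a few lines. This is an illustrative stand-in, not the project's code: `score_submission`, `review_scores`, and the hard-coded scores are invented, and the bias check is reduced to a single central-tendency rule.

```python
# Illustrative sketch of the Evaluator-Optimizer loop (names hypothetical).
MAX_RESCORE_CYCLES = 2  # the document caps re-scoring at 2 cycles

def score_submission(submission, guidance=None):
    # Placeholder scorer: a real scorer walks the rubric criterion by
    # criterion, citing evidence_signals from the submission content.
    scores = {"Accuracy": 8, "Reasoning": 7}
    if guidance:
        # Re-evaluation applies the reviewer's specific guidance.
        scores["Reasoning"] = 5
    return scores

def review_scores(scores):
    # Placeholder bias check: FLAG central tendency when every score
    # sits in the narrow 6-8 band.
    if all(6 <= s <= 8 for s in scores.values()):
        return "FLAG", "central tendency: use the full 1-10 range"
    return "PASS", None

def evaluate(submission):
    guidance = None
    for cycle in range(MAX_RESCORE_CYCLES + 1):
        scores = score_submission(submission, guidance)
        verdict, guidance = review_scores(scores)
        if verdict == "PASS":
            return scores, cycle
    return scores, cycle  # accept best effort after the cycle cap
```

Here the first pass is FLAGged and the second pass, re-scored with the reviewer's guidance, PASSes.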

Reliability Features

  • Exponential Backoff Retry: gh CLI calls retry up to 3 times on rate limits (429) and server errors (5xx) with exponential delay
  • Rate Limiting: Sliding-window rate limiter (30 calls/60s per tool) prevents GitHub API abuse
  • Input Validation: All MCP tool inputs validated at boundaries (Fail Fast) — scores 1-10, weighted_total 0-100, required fields checked
  • Corrupted Data Recovery: scores.json auto-backed up on parse failure, server continues with empty store
  • Idempotent Operations: Re-scoring safely overwrites existing entries by issue_number key
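The retry and rate-limiting behavior described above can be sketched with the standard library alone. Class and function names here are hypothetical, and a generic exception stands in for the 429/5xx responses the real server detects from gh CLI:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_calls per window_s seconds (per tool)."""
    def __init__(self, max_calls=30, window_s=60.0):
        self.max_calls, self.window_s = max_calls, window_s
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()          # drop timestamps outside the window
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

def with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry fn on transient errors with exponentially growing delay."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError:              # stand-in for 429/5xx from gh CLI
            if attempt == retries:
                raise                     # out of retries: propagate
            sleep(base_delay * (2 ** attempt))   # 1s, 2s, 4s, ...
```

The limiter admits the first 30 calls in any 60-second window and rejects the rest until old timestamps age out.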

Workflow Diagram

flowchart TD
    User["👤 User\n@saiten-orchestrator score all"]

    subgraph Orchestrator["🏆 @saiten-orchestrator"]
        Route["Intent Routing\nUC-01~06"]
        Gate1{"Gate: MCP\nConnectivity"}
        Gate2{"Gate: Data\nCompleteness"}
        Gate3{"Gate: Score\nValidity"}
        Gate4{"Gate: Review\nPASS/FLAG?"}
        Integrate["Result Integration\n& User Report"]
        Handoff["[Handoff]\n💬 Post Feedback"]
    end

    subgraph Collector["📥 @saiten-collector"]
        C1["list_submissions()"]
        C2["get_submission_detail()"]
        C3["Data Validation"]
    end

    subgraph Scorer["📊 @saiten-scorer"]
        S1["get_scoring_rubric()"]
        S2["Rubric-based Evaluation\n1-10 score per criterion"]
        S3["Quality Self-Check"]
        S4["save_scores()"]
    end

    subgraph Reviewer["🔍 @saiten-reviewer"]
        V1["Load scores.json"]
        V2["Statistical Outlier\nDetection (2σ)"]
        V3["Rubric Consistency\nCheck"]
        V4["Bias Detection"]
    end

    subgraph Reporter["📋 @saiten-reporter"]
        R1["generate_ranking_report()"]
        R2["Trend Analysis"]
        R3["Report Validation"]
    end

    subgraph Commenter["💬 @saiten-commenter"]
        CM1["Generate Comment\nper Top N"]
        CM2["User Confirmation\n(Human-in-the-Loop)"]
        CM3["gh issue comment"]
    end

    subgraph MCP["⚡ saiten-mcp (FastMCP Server)"]
        T1["list_submissions"]
        T2["get_submission_detail"]
        T3["get_scoring_rubric"]
        T4["save_scores"]
        T5["generate_ranking_report"]
    end

    subgraph External["External"]
        GH["GitHub API\n(gh CLI)"]
        FS["Local Storage\ndata/ & reports/"]
    end

    User --> Route
    Route --> Gate1
    Gate1 -->|OK| Collector
    Gate1 -->|FAIL| User

    C1 --> C2 --> C3
    C3 --> Gate2
    Gate2 -->|OK| Scorer
    Gate2 -->|"⚠️ Skip"| Integrate

    S1 --> S2 --> S3
    S3 -->|PASS| S4
    S3 -->|"FAIL: Re-evaluate"| S2
    S4 --> Gate3
    Gate3 -->|OK| Reviewer

    V1 --> V2 --> V3 --> V4
    V4 --> Gate4
    Gate4 -->|PASS| Reporter
    Gate4 -->|"FLAG: Re-score"| Scorer

    R1 --> R2 --> R3
    R3 --> Integrate --> User
    Integrate --> Handoff
    Handoff -->|"User clicks"| Commenter
    CM1 --> CM2 --> CM3

    Collector -.->|MCP| T1 & T2
    Scorer -.->|MCP| T3 & T4
    Reporter -.->|MCP| T5
    T1 & T2 -.-> GH
    T4 & T5 -.-> FS
    CM3 -.-> GH

    style Orchestrator fill:#1a1a2e,stroke:#e94560,color:#fff
    style Collector fill:#16213e,stroke:#0f3460,color:#fff
    style Scorer fill:#16213e,stroke:#0f3460,color:#fff
    style Reviewer fill:#1a1a2e,stroke:#e94560,color:#fff
    style Reporter fill:#16213e,stroke:#0f3460,color:#fff
    style Commenter fill:#0f3460,stroke:#533483,color:#fff
    style MCP fill:#0f3460,stroke:#533483,color:#fff

Agent Roster

| Agent | Role | SRP Responsibility | MCP Tools |
|---|---|---|---|
| 🏆 @saiten-orchestrator | Orchestrator | Intent routing, delegation, result integration | — (delegates all) |
| 📥 @saiten-collector | Worker | GitHub Issue data collection & validation | list_submissions, get_submission_detail |
| 📊 @saiten-scorer | Worker | Rubric-based evaluation with quality gate | get_scoring_rubric, save_scores |
| 🔍 @saiten-reviewer | Evaluator | Score consistency review & bias detection | get_scoring_rubric, read scores |
| 📋 @saiten-reporter | Worker | Ranking report generation & trend analysis | generate_ranking_report |
| 💬 @saiten-commenter | Handoff | GitHub Issue feedback comments (user-confirmed) | gh issue comment |

Design Principles Applied

| Principle | How Applied |
|---|---|
| SRP | Each agent handles exactly 1 responsibility (6 agents × 1 duty) |
| Fail Fast | Gates at every step; anomalies reported immediately |
| SSOT | All score data centralized in data/scores.json |
| Feedback Loop | Scorer → Reviewer → Re-score loop (Evaluator-Optimizer pattern) |
| Human-in-the-Loop | Commenter runs only after explicit user confirmation via Handoff |
| Transparency | Todo list shows progress; each Gate reports status |
| Idempotency | Re-scoring overwrites; safe to run multiple times |
| ISP | Each sub-agent receives only the tools and data it needs |
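Idempotency in particular reduces to keying the score store by issue_number, so a re-score overwrites the existing entry instead of appending a duplicate. A minimal sketch (function name and fields hypothetical):

```python
def save_scores(store, entry):
    """Idempotent upsert: re-scoring the same issue overwrites in place."""
    store[entry["issue_number"]] = entry
    return store

store = {}
save_scores(store, {"issue_number": 48, "weighted_total": 71.5})
save_scores(store, {"issue_number": 48, "weighted_total": 73.9})  # re-score
# store now holds exactly one entry for issue 48, with the latest total
```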

System Architecture

┌─────────────────────────────────────────────────────────┐
│  VS Code                                                 │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 🏆 @saiten-orchestrator                          │  │
│  │    ├── 📥 @saiten-collector (Worker)               │  │
│  │    ├── 📊 @saiten-scorer   (Worker)                │  │
│  │    ├── 🔍 @saiten-reviewer (Evaluator)             │  │
│  │    ├── 📋 @saiten-reporter (Worker)                │  │
│  │    └── 💬 @saiten-commenter (Handoff)              │  │
│  └──────────────┬─────────────────────────────────────┘  │
│                 │ MCP (stdio)                             │
│  ┌──────────────▼─────────────────────────────────────┐  │
│  │ ⚡ saiten-mcp (FastMCP Server / Python)             │  │
│  │  ├ list_submissions()     ← gh CLI → GitHub        │  │
│  │  ├ get_submission_detail() ← gh CLI → GitHub       │  │
│  │  ├ get_scoring_rubric()   ← YAML files             │  │
│  │  ├ save_scores()          → data/scores.json       │  │
│  │  └ generate_ranking_report() → reports/*.md        │  │
│  └────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
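The tool layer in the diagram can be pictured as a name-to-handler registry that dispatches incoming requests. The sketch below mimics that shape with the standard library only; it is not the FastMCP API, and the tool bodies and return values are invented for illustration:

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function as a callable tool (illustrative, not FastMCP)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def list_submissions(track=None):
    # Real server: shells out to gh CLI with retry + rate limiting.
    return [{"issue_number": 48, "track": track or "creative-apps"}]

@tool
def get_scoring_rubric(track):
    # Real server: loads the track's YAML rubric from data/rubrics/.
    return {"track": track, "criteria": 5}

def dispatch(request_json):
    """Route a {"tool": ..., "args": {...}} request to its handler."""
    req = json.loads(request_json)
    return TOOLS[req["tool"]](**req.get("args", {}))
```

A stdio transport would read such requests line by line and write the handler's result back as JSON.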

Setup

Prerequisites

  • Python 3.10+
  • uv (package manager)
  • gh CLI (GitHub CLI, authenticated)
  • VS Code + GitHub Copilot

Installation

# Clone the repository
git clone <repo-url>
cd FY26_techconnect_saiten

# Create Python virtual environment
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies (production)
uv pip install -e .

# Install development dependencies (includes pytest + coverage)
uv pip install -e ".[dev]"

# Verify gh CLI authentication
gh auth status

Environment Variables

No secrets are required for normal operation.

# Copy the template (optional — only needed for CI or non-VS Code environments)
cp .env.example .env
| Variable | Required | Description |
|---|---|---|
| GITHUB_TOKEN | No | gh CLI manages its own auth. Only set for CI environments |

Security: This project uses gh CLI authentication and VS Code Copilot's built-in Azure OpenAI credentials. No API keys are stored in code or config files.

VS Code Configuration

.vscode/mcp.json automatically configures the MCP server. No additional setup required.
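For reference, a minimal .vscode/mcp.json entry might look like the following. The server name, command, and module path are assumptions for illustration, not the project's actual config:

```json
{
  "servers": {
    "saiten-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "python", "-m", "saiten_mcp.server"]
    }
  }
}
```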


Usage

Type the following in the VS Code chat panel:

| Command | Description | Agents Used |
|---|---|---|
| @saiten-orchestrator score all | Score all submissions | collector → scorer → reviewer → reporter |
| @saiten-orchestrator score #48 | Score a single submission | collector → scorer → reviewer → reporter |
| @saiten-orchestrator ranking | Generate ranking report | reporter only |
| @saiten-orchestrator rescore #48 | Re-score a submission | collector → scorer → reviewer → reporter |
| @saiten-orchestrator show rubric for Creative | Display scoring rubric | Direct response (MCP) |
| @saiten-orchestrator review scores | Review score consistency | reviewer only |

Project Structure

FY26_techconnect_saiten/
├── .github/agents/
│   ├── saiten-orchestrator.agent.md  # 🏆 Orchestrator
│   ├── saiten-collector.agent.md     # 📥 Data Collection Worker
│   ├── saiten-scorer.agent.md        # 📊 Scoring Worker
│   ├── saiten-reviewer.agent.md      # 🔍 Score Reviewer (Evaluator)
│   ├── saiten-reporter.agent.md      # 📋 Report Worker
│   └── saiten-commenter.agent.md     # 💬 Feedback Commenter (Handoff)
├── src/saiten_mcp/
│   ├── server.py                     # MCP Server + rate limiter + structured logging
│   ├── models.py                     # Pydantic data models with boundary validation
│   └── tools/
│       ├── submissions.py            # list_submissions, get_submission_detail
│       ├── rubrics.py                # get_scoring_rubric
│       ├── scores.py                 # save_scores
│       └── reports.py                # generate_ranking_report
├── data/
│   ├── rubrics/                      # Track-specific scoring rubrics (YAML)
│   └── scores.json                   # Scoring results (SSOT)
├── reports/
│   └── ranking.md                    # Auto-generated ranking report
├── scripts/
│   └── run_scoring.py                # CLI scoring pipeline
├── tests/
│   ├── conftest.py                   # Shared test fixtures
│   ├── test_models.py                # Pydantic model validation tests
│   ├── test_parsers.py               # Issue body parser tests
│   ├── test_rubrics.py               # Rubric YAML integrity tests
│   ├── test_scores.py                # Score persistence & validation tests
│   ├── test_reports.py               # Report generation tests
│   ├── test_reliability.py           # Retry, rate limiting, error handling tests
│   └── test_e2e.py                   # E2E integration tests
├── .vscode/mcp.json                  # MCP server config
├── AGENTS.md                         # Agent registry
└── pyproject.toml

Testing

The project has a comprehensive test suite with 110 tests covering models, parsers, tools, reliability, and reports.

# Run all tests
python -m pytest tests/ -v

# Run with coverage report
python -m pytest tests/ --cov=saiten_mcp --cov-report=term-missing

# Run only unit tests (no network calls)
python -m pytest tests/ -m "not e2e" -v

# Run integration tests (requires gh CLI auth)
python -m pytest tests/ -m e2e -v
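For the -m filters above to work, pytest needs the e2e marker registered. A typical pyproject.toml registration (an assumption about this project's config, not a verified excerpt) looks like:

```toml
[tool.pytest.ini_options]
markers = [
    "e2e: end-to-end tests that call live GitHub via gh CLI",
]
```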

Test Structure

| Test File | Tests | What It Covers |
|---|---|---|
| test_models.py | 17 | Pydantic models, validation boundaries, evidence-anchored fields |
| test_parsers.py | 28 | Issue body parsing, track detection, URL extraction, checklists |
| test_rubrics.py | 20 | Rubric YAML integrity, weights, scoring policy, evidence signals |
| test_scores.py | 9 | Score persistence, idempotency, input validation, sorting |
| test_reports.py | 8 | Markdown report generation, empty/missing data edge cases |
| test_reliability.py | 10 | Retry logic, rate limiting, error handling, gh CLI resilience |
| test_e2e.py | 5 | End-to-end MCP tool calls with live GitHub data |
| Total | 110 | 88% code coverage |

Scoring Tracks

| Track | Criteria | Notes |
|---|---|---|
| 🎨 Creative Apps | 5 criteria | Community Vote (10%) excluded; remaining 90% prorated to 100% |
| 🧠 Reasoning Agents | 5 criteria | Uses common overall criteria |
| 💼 Enterprise Agents | 3 criteria | Custom 3-axis evaluation |
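The Creative Apps proration works by dividing each remaining weight by the retained mass (0.90), so the weights again sum to 100%. The individual weights below are hypothetical; only the 10% Community Vote share comes from the table above:

```python
# Hypothetical rubric weights including the excluded Community Vote (10%).
weights = {"Accuracy": 0.25, "Reasoning": 0.25, "Creativity": 0.20,
           "Impact": 0.20, "Community Vote": 0.10}

excluded = "Community Vote"
retained = 1.0 - weights[excluded]          # 0.90 of the original mass
prorated = {k: w / retained                 # e.g. 0.25 -> 0.2778
            for k, w in weights.items() if k != excluded}
```

After proration the four remaining criteria sum to exactly 1.0 again.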

Demo

The multi-agent workflow can be invoked directly from VS Code's chat panel:

Scoring a Single Submission

👤 User: @saiten-orchestrator score #49

🏆 @saiten-orchestrator → Routes to collector → scorer → reviewer → reporter

📥 @saiten-collector: Fetched Issue #49 (EasyExpenseAI)
   ├─ Track: Creative Apps
   ├─ Repo: github.com/chakras/Easy-Expense-AI
   ├─ README: 10,036 chars extracted
   └─ Gate: ✅ Data complete

📊 @saiten-scorer: Evidence-anchored evaluation
   ├─ Accuracy & Relevance: 8/10
   │   Evidence: "5-agent Semantic Kernel pipeline with Azure Document Intelligence"
   ├─ Reasoning: 7/10
   │   Evidence: "Linear pipeline, no self-correction loop"
   ├─ Total: 73.9/100
   └─ Gate: ✅ All criteria scored with evidence

🔍 @saiten-reviewer: Bias check passed
   ├─ Outlier check: PASS (within 2σ)
   ├─ Evidence quality: PASS (no generic phrases)
   └─ Gate: ✅ PASS

📋 @saiten-reporter: Report saved → reports/ranking.md

Scoring All Submissions

👤 User: @saiten-orchestrator score all

🏆 @saiten-orchestrator: Processing 43 submissions across 3 tracks...
   ├─ 📥 Collecting → 📊 Scoring → 🔍 Reviewing → 📋 Reporting
   ├─ Progress tracked via Todo list
   └─ Final report: reports/ranking.md

Key Differentiators

  • Evidence-anchored scoring: Each criterion requires specific evidence from the submission, not generic phrases
  • Self-correction loop: Reviewer FLAGs biased scores → Scorer re-evaluates → until PASS
  • Real-time progress: Todo list updates visible in VS Code during multi-submission scoring
  • Human-in-the-loop: Feedback comments only posted after explicit user confirmation via Handoff

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| gh command failed | gh CLI not authenticated | Run gh auth login |
| scores.json corrupted | Interrupted write | Auto-restored from .json.bak backup |
| ValueError: issue_number must be positive | Bad input to save_scores | Check score data format matches schema |
| Invalid track name | Typo in track parameter | Use: creative-apps, reasoning-agents, or enterprise-agents |
| MCP server not starting | Python env mismatch | Ensure uv pip install -e . in the .venv |
| No submissions returned | Network or auth issue | Run gh api repos/microsoft/agentsleague-techconnect/issues --jq '.[0].number' to test |

Corrupted Data Recovery

If data/scores.json becomes corrupted, the server automatically:

  1. Logs a warning with the parse error
  2. Creates a backup at data/scores.json.bak
  3. Continues with an empty score store

To restore manually:

cp data/scores.json.bak data/scores.json

Tech Stack

| Layer | Technology |
|---|---|
| Agent Framework | VS Code Copilot Custom Agent (.agent.md) — Orchestrator-Workers pattern |
| MCP Server | Python 3.10+ / FastMCP (stdio transport) |
| Package Manager | uv |
| GitHub Integration | gh CLI / GitHub REST API with exponential backoff retry and rate limiting |
| Data Models | Pydantic v2 with boundary validation (scores 1-10, weighted_total 0-100) |
| Data Storage | JSON (scores) / YAML (rubrics) / Markdown (reports) with backup & recovery |
| Testing | pytest + pytest-cov — 110 tests, 88% coverage |
| Error Handling | Retry with backoff, rate limiting, input validation, corrupted file recovery |
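The boundary validation in the data-model layer (scores 1-10 per criterion, weighted_total 0-100, positive issue numbers) can be illustrated with a stdlib-only sketch. This mirrors the constraints, not the project's actual Pydantic models:

```python
from dataclasses import dataclass, field

@dataclass
class ScoreEntry:
    """Illustrative stand-in for the Pydantic model's boundary checks."""
    issue_number: int
    criterion_scores: dict = field(default_factory=dict)
    weighted_total: float = 0.0

    def __post_init__(self):
        if self.issue_number <= 0:
            raise ValueError("issue_number must be positive")
        for name, score in self.criterion_scores.items():
            if not 1 <= score <= 10:
                raise ValueError(f"{name}: score must be 1-10, got {score}")
        if not 0 <= self.weighted_total <= 100:
            raise ValueError("weighted_total must be 0-100")
```

Rejecting bad input at the model boundary is what lets the MCP tools fail fast instead of persisting malformed scores.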

License

MIT
