# Saiten — Agents League @ TechConnect Scoring Agent

**Submission Track:** 🎨 Creative Apps — GitHub Copilot
## Overview

A multi-agent system that automatically scores all Agents League @ TechConnect hackathon submissions and generates ranking reports — just type `@saiten-orchestrator score all` in VS Code.

Built on the Orchestrator-Workers, Prompt Chaining, and Evaluator-Optimizer patterns, six Copilot custom agents autonomously collect GitHub Issue submissions, evaluate them against track-specific rubrics, validate scoring consistency, and generate reports via an MCP (Model Context Protocol) server.
## Agent Workflow

### Design Patterns

- Orchestrator-Workers: `@saiten-orchestrator` delegates to 5 specialized sub-agents
- Prompt Chaining: Collect → Score → Review → Report, with a Gate at each step
- Evaluator-Optimizer: Reviewer validates scores, triggers re-scoring on FLAG
- Handoff: Commenter posts feedback only after explicit user confirmation
- SRP (Single Responsibility Principle): 1 agent = 1 responsibility
### Reasoning Patterns

- Chain-of-Thought (CoT): Scorer evaluates each criterion sequentially, building an evidence chain before calculating the weighted total
- Evaluator-Optimizer Loop: Reviewer detects 5 bias types (central tendency, halo effect, leniency, range restriction, anchoring) → FLAGs → Scorer re-evaluates with specific guidance → max 2 cycles
- Gate-based Error Recovery: Each workflow step has a validation gate; failures trigger graceful degradation (skip + warn) rather than hard stops
- Evidence-Anchored Scoring: Rubrics define explicit `evidence_signals` (positive/negative) per criterion; scorers must cite signals from the actual submission content
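The Evaluator-Optimizer loop with its re-scoring cap can be sketched in plain Python. This is an illustrative sketch, not the project's implementation: `score_fn` and `review_fn` are hypothetical stand-ins for the Scorer and Reviewer agents.

```python
MAX_CYCLES = 2  # the Reviewer may trigger at most 2 re-scoring cycles

def evaluate_with_review(submission, score_fn, review_fn):
    """Run the Scorer, then loop while the Reviewer FLAGs the result."""
    scores = score_fn(submission, guidance=None)
    for _ in range(MAX_CYCLES):
        # review_fn returns ("PASS", None) or ("FLAG", "<specific guidance>")
        verdict, guidance = review_fn(scores)
        if verdict == "PASS":
            return scores
        # Re-evaluate with the Reviewer's specific guidance (e.g. a bias type)
        scores = score_fn(submission, guidance=guidance)
    return scores  # accept the last attempt once the cycle cap is reached
```

The cap guarantees termination even if the Reviewer keeps flagging; after two cycles the last score is accepted as-is.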
### Reliability Features

- Exponential Backoff Retry: `gh` CLI calls retry up to 3 times on rate limits (429) and server errors (5xx) with exponential delay
- Rate Limiting: Sliding-window rate limiter (30 calls/60 s per tool) prevents GitHub API abuse
- Input Validation: All MCP tool inputs are validated at the boundaries (Fail Fast) — scores 1-10, `weighted_total` 0-100, required fields checked
- Corrupted Data Recovery: `scores.json` is auto-backed up on parse failure; the server continues with an empty store
- Idempotent Operations: Re-scoring safely overwrites existing entries by `issue_number` key
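A sliding-window limiter of the kind described (30 calls per 60 s per tool) can be sketched as follows. The class name and the injectable clock are illustrative choices, not the project's actual code.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window` seconds."""

    def __init__(self, max_calls=30, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock    # injectable so tests can use a fake clock
        self.calls = deque()  # timestamps of calls still inside the window

    def allow(self):
        now = self.clock()
        # Evict timestamps that have slid out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

Unlike a fixed-bucket limiter, the sliding window never admits a burst of 2× the limit across a bucket boundary.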
### Workflow Diagram

```mermaid
flowchart TD
    User["👤 User\n@saiten-orchestrator score all"]
    subgraph Orchestrator["🏆 @saiten-orchestrator"]
        Route["Intent Routing\nUC-01~06"]
        Gate1{"Gate: MCP\nConnectivity"}
        Gate2{"Gate: Data\nCompleteness"}
        Gate3{"Gate: Score\nValidity"}
        Gate4{"Gate: Review\nPASS/FLAG?"}
        Integrate["Result Integration\n& User Report"]
        Handoff["[Handoff]\n💬 Post Feedback"]
    end
    subgraph Collector["📥 @saiten-collector"]
        C1["list_submissions()"]
        C2["get_submission_detail()"]
        C3["Data Validation"]
    end
    subgraph Scorer["📊 @saiten-scorer"]
        S1["get_scoring_rubric()"]
        S2["Rubric-based Evaluation\n1-10 score per criterion"]
        S3["Quality Self-Check"]
        S4["save_scores()"]
    end
    subgraph Reviewer["🔍 @saiten-reviewer"]
        V1["Load scores.json"]
        V2["Statistical Outlier\nDetection (2σ)"]
        V3["Rubric Consistency\nCheck"]
        V4["Bias Detection"]
    end
    subgraph Reporter["📋 @saiten-reporter"]
        R1["generate_ranking_report()"]
        R2["Trend Analysis"]
        R3["Report Validation"]
    end
    subgraph Commenter["💬 @saiten-commenter"]
        CM1["Generate Comment\nper Top N"]
        CM2["User Confirmation\n(Human-in-the-Loop)"]
        CM3["gh issue comment"]
    end
    subgraph MCP["⚡ saiten-mcp (FastMCP Server)"]
        T1["list_submissions"]
        T2["get_submission_detail"]
        T3["get_scoring_rubric"]
        T4["save_scores"]
        T5["generate_ranking_report"]
    end
    subgraph External["External"]
        GH["GitHub API\n(gh CLI)"]
        FS["Local Storage\ndata/ & reports/"]
    end
    User --> Route
    Route --> Gate1
    Gate1 -->|OK| Collector
    Gate1 -->|FAIL| User
    C1 --> C2 --> C3
    C3 --> Gate2
    Gate2 -->|OK| Scorer
    Gate2 -->|"⚠️ Skip"| Integrate
    S1 --> S2 --> S3
    S3 -->|PASS| S4
    S3 -->|"FAIL: Re-evaluate"| S2
    S4 --> Gate3
    Gate3 -->|OK| Reviewer
    V1 --> V2 --> V3 --> V4
    V4 --> Gate4
    Gate4 -->|PASS| Reporter
    Gate4 -->|"FLAG: Re-score"| Scorer
    R1 --> R2 --> R3
    R3 --> Integrate --> User
    Integrate --> Handoff
    Handoff -->|"User clicks"| Commenter
    CM1 --> CM2 --> CM3
    Collector -.->|MCP| T1 & T2
    Scorer -.->|MCP| T3 & T4
    Reporter -.->|MCP| T5
    T1 & T2 -.-> GH
    T4 & T5 -.-> FS
    CM3 -.-> GH
    style Orchestrator fill:#1a1a2e,stroke:#e94560,color:#fff
    style Collector fill:#16213e,stroke:#0f3460,color:#fff
    style Scorer fill:#16213e,stroke:#0f3460,color:#fff
    style Reviewer fill:#1a1a2e,stroke:#e94560,color:#fff
    style Reporter fill:#16213e,stroke:#0f3460,color:#fff
    style Commenter fill:#0f3460,stroke:#533483,color:#fff
    style MCP fill:#0f3460,stroke:#533483,color:#fff
```
## Agent Roster

| Agent | Role | SRP Responsibility | MCP Tools |
|---|---|---|---|
| 🏆 @saiten-orchestrator | Orchestrator | Intent routing, delegation, result integration | — (delegates all) |
| 📥 @saiten-collector | Worker | GitHub Issue data collection & validation | `list_submissions`, `get_submission_detail` |
| 📊 @saiten-scorer | Worker | Rubric-based evaluation with quality gate | `get_scoring_rubric`, `save_scores` |
| 🔍 @saiten-reviewer | Evaluator | Score consistency review & bias detection | `get_scoring_rubric`, read scores |
| 📋 @saiten-reporter | Worker | Ranking report generation & trend analysis | `generate_ranking_report` |
| 💬 @saiten-commenter | Handoff | GitHub Issue feedback comments (user-confirmed) | `gh issue comment` |
## Design Principles Applied
| Principle | How Applied |
|---|---|
| SRP | Each agent handles exactly 1 responsibility (6 agents × 1 duty) |
| Fail Fast | Gates at every step; anomalies reported immediately |
| SSOT | All score data centralized in data/scores.json |
| Feedback Loop | Scorer → Reviewer → Re-score loop (Evaluator-Optimizer pattern) |
| Human-in-the-Loop | Commenter runs only after explicit user confirmation via Handoff |
| Transparency | Todo list shows progress; each Gate reports status |
| Idempotency | Re-scoring overwrites; safe to run multiple times |
| ISP | Each sub-agent receives only the tools and data it needs |
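The idempotency principle (safe re-scoring keyed by `issue_number`) amounts to a keyed upsert into the score store. A minimal sketch, assuming the store is a flat JSON object keyed by issue number; the function name is hypothetical, not the project's `save_scores`:

```python
import json
from pathlib import Path

def upsert_score(path, entry):
    """Insert or overwrite one score entry, keyed by issue_number.

    Re-running the same scoring produces the same file contents,
    so the operation is idempotent.
    """
    path = Path(path)
    store = json.loads(path.read_text()) if path.exists() else {}
    store[str(entry["issue_number"])] = entry  # JSON object keys are strings
    path.write_text(json.dumps(store, indent=2))
    return store
```

Because the key is the issue number, a `rescore #48` simply replaces the earlier entry rather than appending a duplicate.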
## System Architecture

```text
┌─────────────────────────────────────────────────────────┐
│ VS Code                                                 │
│                                                         │
│ ┌────────────────────────────────────────────────────┐ │
│ │ 🏆 @saiten-orchestrator                            │ │
│ │ ├── 📥 @saiten-collector (Worker)                  │ │
│ │ ├── 📊 @saiten-scorer (Worker)                     │ │
│ │ ├── 🔍 @saiten-reviewer (Evaluator)                │ │
│ │ ├── 📋 @saiten-reporter (Worker)                   │ │
│ │ └── 💬 @saiten-commenter (Handoff)                 │ │
│ └──────────────┬─────────────────────────────────────┘ │
│                │ MCP (stdio)                            │
│ ┌──────────────▼─────────────────────────────────────┐ │
│ │ ⚡ saiten-mcp (FastMCP Server / Python)            │ │
│ │ ├ list_submissions()        ← gh CLI → GitHub      │ │
│ │ ├ get_submission_detail()   ← gh CLI → GitHub      │ │
│ │ ├ get_scoring_rubric()      ← YAML files           │ │
│ │ ├ save_scores()             → data/scores.json     │ │
│ │ └ generate_ranking_report() → reports/*.md         │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
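The "MCP (stdio)" link in the diagram carries JSON-RPC 2.0 messages, the MCP wire format. As an illustration of what crosses that boundary, a `tools/call` request for `list_submissions` would look roughly like this (the `track` argument is illustrative, not a confirmed parameter of the tool):

```python
import json

# A JSON-RPC 2.0 request as an MCP client would send it over stdio
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_submissions",
        "arguments": {"track": "creative-apps"},  # illustrative argument
    },
}

# Each message travels as a line of JSON on the server's stdin
wire = json.dumps(request)
```

The FastMCP server dispatches on `params.name` and returns the tool's result in a matching JSON-RPC response with the same `id`.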
## Setup

### Prerequisites

- Python 3.10+
- uv (package manager)
- GitHub CLI (`gh`), authenticated
- VS Code with GitHub Copilot

### Installation
```bash
# Clone the repository
git clone <repo-url>
cd FY26_techconnect_saiten

# Create a Python virtual environment
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies (production)
uv pip install -e .

# Install development dependencies (includes pytest + coverage)
uv pip install -e ".[dev]"

# Verify gh CLI authentication
gh auth status
```
### Environment Variables

No secrets are required for normal operation.

```bash
# Copy the template (optional — only needed for CI or non-VS Code environments)
cp .env.example .env
```

| Variable | Required | Description |
|---|---|---|
| `GITHUB_TOKEN` | No | The `gh` CLI manages its own auth; set this only for CI environments |

**Security**: This project uses `gh` CLI authentication and VS Code Copilot's built-in Azure OpenAI credentials. No API keys are stored in code or config files.
### VS Code Configuration

`.vscode/mcp.json` automatically configures the MCP server. No additional setup is required.
## Usage

Type the following in the VS Code chat panel:

| Command | Description | Agents Used |
|---|---|---|
| `@saiten-orchestrator score all` | Score all submissions | collector → scorer → reviewer → reporter |
| `@saiten-orchestrator score #48` | Score a single submission | collector → scorer → reviewer → reporter |
| `@saiten-orchestrator ranking` | Generate ranking report | reporter only |
| `@saiten-orchestrator rescore #48` | Re-score a submission | collector → scorer → reviewer → reporter |
| `@saiten-orchestrator show rubric for Creative` | Display scoring rubric | Direct response (MCP) |
| `@saiten-orchestrator review scores` | Review score consistency | reviewer only |
## Project Structure

```text
FY26_techconnect_saiten/
├── .github/agents/
│   ├── saiten-orchestrator.agent.md   # 🏆 Orchestrator
│   ├── saiten-collector.agent.md      # 📥 Data Collection Worker
│   ├── saiten-scorer.agent.md         # 📊 Scoring Worker
│   ├── saiten-reviewer.agent.md       # 🔍 Score Reviewer (Evaluator)
│   ├── saiten-reporter.agent.md       # 📋 Report Worker
│   └── saiten-commenter.agent.md      # 💬 Feedback Commenter (Handoff)
├── src/saiten_mcp/
│   ├── server.py                      # MCP Server + rate limiter + structured logging
│   ├── models.py                      # Pydantic data models with boundary validation
│   └── tools/
│       ├── submissions.py             # list_submissions, get_submission_detail
│       ├── rubrics.py                 # get_scoring_rubric
│       ├── scores.py                  # save_scores
│       └── reports.py                 # generate_ranking_report
├── data/
│   ├── rubrics/                       # Track-specific scoring rubrics (YAML)
│   └── scores.json                    # Scoring results (SSOT)
├── reports/
│   └── ranking.md                     # Auto-generated ranking report
├── scripts/
│   └── run_scoring.py                 # CLI scoring pipeline
├── tests/
│   ├── conftest.py                    # Shared test fixtures
│   ├── test_models.py                 # Pydantic model validation tests
│   ├── test_parsers.py                # Issue body parser tests
│   ├── test_rubrics.py                # Rubric YAML integrity tests
│   ├── test_scores.py                 # Score persistence & validation tests
│   ├── test_reports.py                # Report generation tests
│   ├── test_reliability.py            # Retry, rate limiting, error handling tests
│   └── test_e2e.py                    # E2E integration tests
├── .vscode/mcp.json                   # MCP server config
├── AGENTS.md                          # Agent registry
└── pyproject.toml
```
## Testing

The project has a comprehensive test suite with 110 tests covering models, parsers, tools, reliability, and reports.

```bash
# Run all tests
python -m pytest tests/ -v

# Run with coverage report
python -m pytest tests/ --cov=saiten_mcp --cov-report=term-missing

# Run only unit tests (no network calls)
python -m pytest tests/ -m "not e2e" -v

# Run integration tests (requires gh CLI auth)
python -m pytest tests/ -m e2e -v
```
### Test Structure

| Test File | Tests | What It Covers |
|---|---|---|
| `test_models.py` | 17 | Pydantic models, validation boundaries, evidence-anchored fields |
| `test_parsers.py` | 28 | Issue body parsing, track detection, URL extraction, checklists |
| `test_rubrics.py` | 20 | Rubric YAML integrity, weights, scoring policy, evidence signals |
| `test_scores.py` | 9 | Score persistence, idempotency, input validation, sorting |
| `test_reports.py` | 8 | Markdown report generation, empty/missing data edge cases |
| `test_reliability.py` | 10 | Retry logic, rate limiting, error handling, gh CLI resilience |
| `test_e2e.py` | 5 | End-to-end MCP tool calls with live GitHub data |
| **Total** | **110** | 88% code coverage |
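The boundary checks that `test_models.py` exercises can be illustrated with a stand-alone validator. This is a hypothetical helper written for illustration, not the project's Pydantic model; it enforces the documented limits (criterion scores 1-10, `weighted_total` 0-100, positive `issue_number`):

```python
def validate_score_entry(entry):
    """Fail fast on out-of-range score data (illustrative validator)."""
    if entry["issue_number"] <= 0:
        raise ValueError("issue_number must be positive")
    if not all(1 <= s <= 10 for s in entry["criteria"].values()):
        raise ValueError("criterion scores must be in 1-10")
    if not 0 <= entry["weighted_total"] <= 100:
        raise ValueError("weighted_total must be in 0-100")
    return entry
```

Validating at the tool boundary means malformed agent output is rejected with a precise error instead of silently corrupting `scores.json`.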
## Scoring Tracks
| Track | Criteria | Notes |
|---|---|---|
| 🎨 Creative Apps | 5 criteria | Community Vote (10%) excluded; remaining 90% prorated to 100% |
| 🧠 Reasoning Agents | 5 criteria | Uses common overall criteria |
| 💼 Enterprise Agents | 3 criteria | Custom 3-axis evaluation |
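The Creative Apps proration works by dropping the excluded criterion and rescaling the remaining weights so they sum to 100%. A sketch with illustrative criterion names and weights; only the 10% Community Vote exclusion comes from the table above:

```python
def prorate_weights(weights, excluded):
    """Drop excluded criteria and rescale the rest to sum to 100."""
    kept = {k: w for k, w in weights.items() if k not in excluded}
    total = sum(kept.values())  # e.g. 90 after dropping a 10% criterion
    return {k: w * 100.0 / total for k, w in kept.items()}
```

So a criterion originally weighted 20% becomes 20 × 100/90 ≈ 22.2% after the Community Vote is excluded.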
## Demo

The multi-agent workflow can be invoked directly from VS Code's chat panel:

### Scoring a Single Submission

```text
👤 User: @saiten-orchestrator score #49

🏆 @saiten-orchestrator → Routes to collector → scorer → reviewer → reporter

📥 @saiten-collector: Fetched Issue #49 (EasyExpenseAI)
   ├─ Track: Creative Apps
   ├─ Repo: github.com/chakras/Easy-Expense-AI
   ├─ README: 10,036 chars extracted
   └─ Gate: ✅ Data complete

📊 @saiten-scorer: Evidence-anchored evaluation
   ├─ Accuracy & Relevance: 8/10
   │    Evidence: "5-agent Semantic Kernel pipeline with Azure Document Intelligence"
   ├─ Reasoning: 7/10
   │    Evidence: "Linear pipeline, no self-correction loop"
   ├─ Total: 73.9/100
   └─ Gate: ✅ All criteria scored with evidence

🔍 @saiten-reviewer: Bias check passed
   ├─ Outlier check: PASS (within 2σ)
   ├─ Evidence quality: PASS (no generic phrases)
   └─ Gate: ✅ PASS

📋 @saiten-reporter: Report saved → reports/ranking.md
```

### Scoring All Submissions

```text
👤 User: @saiten-orchestrator score all

🏆 @saiten-orchestrator: Processing 43 submissions across 3 tracks...
   ├─ 📥 Collecting → 📊 Scoring → 🔍 Reviewing → 📋 Reporting
   ├─ Progress tracked via Todo list
   └─ Final report: reports/ranking.md
```
## Key Differentiators
- Evidence-anchored scoring: Each criterion requires specific evidence from the submission, not generic phrases
- Self-correction loop: Reviewer FLAGs biased scores → Scorer re-evaluates → until PASS
- Real-time progress: Todo list updates visible in VS Code during multi-submission scoring
- Human-in-the-loop: Feedback comments only posted after explicit user confirmation via Handoff
## Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| `gh` command failed | gh CLI not authenticated | Run `gh auth login` |
| `scores.json` corrupted | Interrupted write | Auto-restored from `.json.bak` backup |
| `ValueError: issue_number must be positive` | Bad input to `save_scores` | Check that the score data matches the schema |
| Invalid track name | Typo in `track` parameter | Use `creative-apps`, `reasoning-agents`, or `enterprise-agents` |
| MCP server not starting | Python env mismatch | Run `uv pip install -e .` inside the `.venv` |
| No submissions returned | Network or auth issue | Run `gh api repos/microsoft/agentsleague-techconnect/issues --jq '.[0].number'` to test |
### Corrupted Data Recovery

If `data/scores.json` becomes corrupted, the server automatically:

1. Logs a warning with the parse error
2. Creates a backup at `data/scores.json.bak`
3. Continues with an empty score store
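Those recovery steps amount to a load routine that backs up the corrupt file and falls back to an empty store. A minimal sketch, with the structured logging simplified to a `print`; file names follow this section, but the function is illustrative rather than the server's actual code:

```python
import json
import shutil
from pathlib import Path

def load_scores(path="data/scores.json"):
    """Load the score store; on parse failure, back up the corrupt
    file to <path>.bak and continue with an empty store."""
    path = Path(path)
    if not path.exists():
        return {}
    try:
        return json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        print(f"WARNING: corrupt scores file: {exc}")  # simplified logging
        shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))
        return {}
```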
To restore manually:

```bash
cp data/scores.json.bak data/scores.json
```
## Tech Stack
| Layer | Technology |
|---|---|
| Agent Framework | VS Code Copilot Custom Agent (.agent.md) — Orchestrator-Workers pattern |
| MCP Server | Python 3.10+ / FastMCP (stdio transport) |
| Package Manager | uv |
| GitHub Integration | gh CLI / GitHub REST API with exponential backoff retry and rate limiting |
| Data Models | Pydantic v2 with boundary validation (scores 1-10, weighted_total 0-100) |
| Data Storage | JSON (scores) / YAML (rubrics) / Markdown (reports) with backup & recovery |
| Testing | pytest + pytest-cov — 110 tests, 88% coverage |
| Error Handling | Retry with backoff, rate limiting, input validation, corrupted file recovery |
## License
MIT