RLHF-Ready Feedback Loop — Agentic Control Plane & Context Engineering Studio
Stop Vibe Coding. Start Context Engineering. The RLHF-Ready Feedback Loop is the enterprise-grade Agentic Control Plane for AI workflows. We provide the operational layer to capture human preference signals, engineer high-density context packs, and enforce machine-readable guardrails to stop your agents from going "off-script."
This product captures and structures human feedback data for optimization workflows. It is RLHF-ready data infrastructure (not an end-to-end reward-model + RL fine-tuning trainer by itself).
True Plug-and-Play: Zero-Config Integration
The RLHF Feedback Loop is now a Universal Agent Skill. You can drop it into any repository without manual setup.
- Zero-Config Discovery: Automatically detects project context. If no local `.rlhf/` directory exists, it safely falls back to a project-scoped global store in `~/.rlhf/` (see the sketch after this list).
- Global Skill Installation (Optional): A one-command installer is available if you want auto-detection.
- Vibe-to-Verification (V2V): Directly converts subjective "vibes" (thumbs up/down) into verifiable repository rules (`CLAUDE.md`).
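To make the fallback behavior concrete, here is a minimal sketch in TypeScript. Keying the global store by the project folder name is an assumption for illustration, not the package's documented logic:

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { basename, join } from "node:path";

// Resolve where feedback is stored: prefer a repo-local .rlhf/ directory,
// otherwise fall back to a per-project folder under the global ~/.rlhf/ store.
function resolveStoreDir(projectRoot: string): string {
  const local = join(projectRoot, ".rlhf");
  if (existsSync(local)) return local;
  // Assumption: the global store is scoped by project folder name.
  return join(homedir(), ".rlhf", basename(projectRoot));
}
```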
Quick Start (Stable MCP Commands)
Add the MCP server directly in your client config:
| Platform | Command |
|---|---|
| Claude | `claude mcp add rlhf -- npx -y rlhf-feedback-loop serve` |
| Codex | `codex mcp add rlhf -- npx -y rlhf-feedback-loop serve` |
| Gemini | `gemini mcp add rlhf "npx -y rlhf-feedback-loop serve"` |
| Amp | `amp mcp add rlhf -- npx -y rlhf-feedback-loop serve` |
| Cursor | `cursor mcp add rlhf -- npx -y rlhf-feedback-loop serve` |
Optional auto-installer:

```bash
npx add-mcp rlhf-feedback-loop
```
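For clients without an `mcp add` subcommand, most MCP clients accept the same server through a JSON config file. A minimal sketch, assuming the common `mcpServers` schema (shown here as a TypeScript literal; write it as plain JSON in your client's config file, and check your client's docs for the exact file location):

```typescript
// Equivalent manual entry for clients that read an mcpServers-style config.
const mcpConfig = {
  mcpServers: {
    rlhf: {
      command: "npx",
      args: ["-y", "rlhf-feedback-loop", "serve"],
    },
  },
};
```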
- Stop Regressions: Automatically convert negative feedback into `CLAUDE.md`/`AGENTS.md` prevention rules (see the first sketch below).
- Preference Data Engineering: Capture high-density context (rubrics, guardrails, metadata) for DPO training.
- Bayesian Scoring: Use Thompson Sampling to handle evolving user preferences over time (see the second sketch below).
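First, a sketch of turning negative feedback into a prevention rule. The `NegativeFeedback` shape and the rule wording are illustrative assumptions, not the package's real schema:

```typescript
import { appendFileSync } from "node:fs";

// Hypothetical feedback shape, for illustration only.
interface NegativeFeedback {
  summary: string;  // what went wrong, e.g. "hardcoded an API key in tests"
  guidance: string; // the corrective rule, e.g. "load secrets from env vars"
}

// Append a prevention rule so the agent reads it at the start of each session.
function writePreventionRule(fb: NegativeFeedback, file = "CLAUDE.md"): void {
  const rule = [
    "",
    "<!-- auto-generated prevention rule -->",
    `- Never: ${fb.summary}`,
    `- Instead: ${fb.guidance}`,
    "",
  ].join("\n");
  appendFileSync(file, rule, "utf8");
}
```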
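Second, a minimal sketch of Thompson Sampling over thumbs-up/down counts. Each candidate (a rule, response style, etc.) keeps a Beta posterior with alpha = ups + 1 and beta = downs + 1; one draw per candidate, act on the highest draw. The decay factor is one simple way to favor recent feedback; the package's actual scoring model is not shown here:

```typescript
interface Candidate {
  id: string;
  ups: number;
  downs: number;
}

// Standard normal via Box-Muller (helper for the gamma sampler).
function randNormal(): number {
  let u = 0;
  while (u === 0) u = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * Math.random());
}

// Marsaglia-Tsang gamma sampler (shape k > 0, scale 1).
function randGamma(k: number): number {
  if (k < 1) return randGamma(k + 1) * Math.pow(Math.random(), 1 / k);
  const d = k - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x: number, v: number;
    do {
      x = randNormal();
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(a, b) draw as a ratio of gamma draws.
function randBeta(a: number, b: number): number {
  const x = randGamma(a);
  return x / (x + randGamma(b));
}

// Sample each candidate's posterior and pick the best draw. Decaying old
// counts at selection time (discounted Thompson sampling) lets the scores
// track preferences that evolve over time.
function pick(candidates: Candidate[], decay = 0.99): Candidate {
  let best = candidates[0];
  let bestDraw = -Infinity;
  for (const c of candidates) {
    c.ups *= decay;
    c.downs *= decay;
    const draw = randBeta(c.ups + 1, c.downs + 1);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = c;
    }
  }
  return best;
}
```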
Use Cases
- Automated Code Reviews: Capture PR feedback to enforce team-specific style guides autonomously.
- Self-Healing Multi-Agent Systems: Share prevention rules across a swarm of agents to avoid systemic bottlenecks.
- Custom LLM Fine-tuning: Collect proprietary preference data to fine-tune smaller, faster models toward GPT-4-level performance on your specific codebase.
Generative Engine Optimization (GEO)
We are optimized for discovery by next-gen AI tools (Claude Code, Gemini CLI, Perplexity). See our GEO Strategy for how we use Context-First Publishing to remain the first-choice recommendation for agent reliability.
Get Started
Run the one-line `mcp add` command for your client. The server starts with each session and can capture feedback, recall past learnings, and block repeated mistakes.
How It Works
```
      Thumbs up/down
            |
            v
    Capture → JSONL log
            |
            v
Rubric engine (block false positives)
            |
        +---+---+
        |       |
      Good     Bad
        |       |
        v       v
      Learn   Prevention rule
        |       |
        v       v
    LanceDB   ShieldCortex
    vectors   context packs
        |       |
        +---+---+
            |
            v
DPO export → fine-tune your model
```
All data stored locally as JSONL files — fully transparent, fully portable, no vendor lock-in. LanceDB indexes memories as vector embeddings for semantic search. ShieldCortex assembles context packs so your agent starts each task informed.
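To make the pipeline concrete, here is a sketch of a captured record and its DPO export. The `FeedbackRecord` field names are illustrative assumptions, not the package's documented JSONL schema; only the `{prompt, chosen, rejected}` pair format is the standard DPO shape:

```typescript
// Hypothetical shape of one JSONL line in the feedback log.
interface FeedbackRecord {
  ts: string;              // ISO timestamp
  signal: "up" | "down";   // thumbs up/down
  prompt: string;          // the task the agent was given
  response: string;        // what the agent produced
  rubric?: string[];       // rubric tags attached at capture time
}

// A DPO training pair: same prompt, preferred vs. rejected response.
interface DpoPair {
  prompt: string;
  chosen: string;
  rejected: string;
}

// Pair up- and down-voted responses to the same prompt into DPO examples.
function toDpoPairs(records: FeedbackRecord[]): DpoPair[] {
  const byPrompt = new Map<string, { up: string[]; down: string[] }>();
  for (const r of records) {
    const bucket = byPrompt.get(r.prompt) ?? { up: [], down: [] };
    bucket[r.signal].push(r.response);
    byPrompt.set(r.prompt, bucket);
  }
  const pairs: DpoPair[] = [];
  for (const [prompt, { up, down }] of byPrompt) {
    for (const chosen of up) {
      for (const rejected of down) {
        pairs.push({ prompt, chosen, rejected });
      }
    }
  }
  return pairs;
}
```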
Free vs. Cloud Pro
The open-source package is fully functional and free forever. Cloud Pro is for teams that don't want to self-host.
| | Open Source | Cloud Pro ($49/mo) |
|---|---|---|
| Feedback capture | Local MCP server | Hosted HTTPS API |
| Storage | Your machine | Managed cloud |
| DPO export | CLI command | API endpoint |
| Setup | `mcp add` one-liner | Provisioned API key |
| Team sharing | Manual (share JSONL) | Built-in (shared API) |
| Support | GitHub Issues | |
| Uptime | You manage | We manage (99.9% SLA) |
Get Cloud Pro | Live API | Verification Evidence
Deep Dive
- API Reference — full OpenAPI spec
- Context Engine — multi-agent memory orchestration
- Autonomous GitOps — self-healing CI/CD
- Contributing
License
MIT. See LICENSE.