decision-pathfinder

A decision tree engine for LLM-driven workflows. Agents follow typed trees instead of freestyling every step. Over repeated runs the system records which paths succeeded, learns which are shortest, and starts skipping the LLM entirely for decisions it has confidently solved before.

The paths are the knowledge — no vector DB, no embeddings, no RAG. Just JSON and heuristics.

How it works

Run 1:  LLM decides  → 600ms, 0% confidence
Run 3:  LLM + bias   → 500ms, 30% confidence
Run 7:  Override     → 0ms LLM, 70% confidence
Run 15: Locked in    → 0ms LLM, 100% confidence

Three things make the flywheel work:

Efficiency-weighted confidence — a 3-step successful path ranks higher than a 10-step successful path. Agents naturally discover shortcuts; wasted tool calls get pruned.
Persistent history — completed sessions go to ~/.decision-pathfinder/sessions/{treeId}.jsonl. Every process restart picks up where the last one left off.
Confidence-gated override — above 60% confidence, the LLM call is skipped entirely and the historically-best edge is taken directly. The tool calls still run (they do real work) — only the decision-making LLM call disappears.

Where LLMs fit in

The LLM is called in exactly one place: at branch points in a tree. When the executor reaches a node with multiple outgoing edges, it asks "which edge do I take?" — nothing else uses an LLM.

Component	LLM used?
RecommendationEngine	No — pure heuristics
PathTracker	No — just records
TreeExecutor	No — runs tree logic
Tool handlers	No — your code runs
Conditional evaluators	No — registered functions
Decision at branch points	Yes — via `IDecisionMaker`
Overridden decisions (high confidence)	No — skipped

IDecisionMaker is an interface. Ships with:

ClaudeAdapter — Anthropic Claude (defaults to claude-haiku-4-5)
OpenAIAdapter — OpenAI (defaults to gpt-4o-mini)
GeminiAdapter — Google Gemini (defaults to gemini-2.0-flash-lite)
MockDecisionMaker — picks the first available edge (deterministic, no LLM)
Your own implementation for local models or other providers.

Install

npm install decision-pathfinder

Quick start

import {
  DecisionTree,
  ConversationNode,
  ToolCallNode,
  SuccessNode,
  TreeExecutor,
  MockDecisionMaker,
  PathTracker,
} from 'decision-pathfinder';

const tree = new DecisionTree();

tree.addNode(new ConversationNode('start', 'Choose path', {
  prompt: 'Pick the best approach for this task.',
}));
tree.addNode(new ToolCallNode('fetch', 'Fetch data', {
  toolName: 'fetchData',
  parameters: { url: 'https://api.example.com' },
}));
tree.addNode(new SuccessNode('done', 'Complete', {
  message: 'Task finished successfully',
}));

tree.addEdge({ id: 'e1', sourceId: 'start', targetId: 'fetch', metadata: {} });
tree.addEdge({ id: 'e2', sourceId: 'fetch', targetId: 'done', metadata: {} });

const tracker = new PathTracker();
const executor = new TreeExecutor(
  tree,
  new MockDecisionMaker(),
  tracker,
  {
    toolHandlers: new Map([
      ['fetchData', async (params) => ({ data: 'real result' })],
    ]),
  },
);

const result = await executor.execute('start');
// result.status === 'success'
// result.pathTaken === ['start', 'fetch', 'done']

LLM providers

Pick whichever provider matches the key you already have:

import { ClaudeAdapter, OpenAIAdapter, GeminiAdapter } from 'decision-pathfinder';

// Anthropic — cheap and fast decisions
const claude = new ClaudeAdapter({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  modelName: 'claude-haiku-4-5',  // default
});

// OpenAI
const openai = new OpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  modelName: 'gpt-4o-mini',  // default
});

// Gemini
const gemini = new GeminiAdapter({
  apiKey: process.env.GEMINI_API_KEY!,
  modelName: 'gemini-2.0-flash-lite',  // default
});

const executor = new TreeExecutor(tree, claude, tracker);

Persistent learning

Use PersistentPathTracker to get cross-session learning automatically:

import { SessionStore, PersistentPathTracker } from 'decision-pathfinder';

const store = new SessionStore();                  // ~/.decision-pathfinder/sessions/
const tracker = new PersistentPathTracker(store, 'my-task');
await tracker.initialize();                         // loads all prior sessions

const executor = new TreeExecutor(tree, adapter, tracker);
await executor.execute('start');
// Session appended to my-task.jsonl on endSession()
// Next time this script runs, history is preserved

Node types

Type	Purpose	Key fields
`ConversationNode`	LLM decision point	`prompt`, `expectedResponses`, `systemMessage`
`ToolCallNode`	Execute a tool	`toolName`, `parameters`, `timeout`, `retryCount`
`ConditionalNode`	Branch on a condition	`condition`, `evaluator`, `trueEdgeId`, `falseEdgeId`
`SuccessNode`	Terminal success	`message`, `resultData`
`FailureNode`	Terminal failure	`message`, `errorCode`, `recoverable`, `suggestedAction`

Recommendation engine

Pure heuristics — no LLM. Analyzes execution history to provide:

Edge recommendations with efficiency-weighted confidence scores
Bottleneck detection (nodes with high failure rates)
Path analysis (most common, most successful, shortest successful)

import { RecommendationEngine } from 'decision-pathfinder';

const engine = new RecommendationEngine(tree, tracker);
const rec = engine.getEdgeRecommendation('decision-node-id');
// { recommendedEdgeId: 'e2', confidence: 0.85, reasoning: '...' }

const report = engine.generateOptimizationReport();
// { analysis: { shortestSuccessfulPath, ... }, bottlenecks, edgeRecommendations }

Confidence formula:

confidence = success_rate × sample_factor × efficiency_factor
  sample_factor    = min(samples / 10, 1)
  efficiency_factor = shortest_known / this_path_avg_length

Serialization

import { TreeSerializer } from 'decision-pathfinder';

const serializer = new TreeSerializer();
const json = serializer.toJSON(tree);
const restored = serializer.fromJSON(json);

// Custom node types
serializer.registerNodeType('custom', (s) => new MyCustomNode(s.id, s.label, s.data));

MCP Server

decision-pathfinder ships with an MCP server so Claude Code, Cursor, and other MCP clients can use decision trees directly — including recording new trees as the agent works.

Setup

For most users, this is the entire config — no env block needed. The MCP server inherits env vars from its parent process, so any API key you already have exported (e.g. ANTHROPIC_API_KEY from your shell) gets picked up automatically:

{
  "mcpServers": {
    "decision-pathfinder": {
      "command": "npx",
      "args": ["decision-pathfinder-mcp"]
    }
  }
}

If you want to pin a specific provider or key:

{
  "mcpServers": {
    "decision-pathfinder": {
      "command": "npx",
      "args": ["decision-pathfinder-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ANTHROPIC_MODEL": "claude-haiku-4-5"
      }
    }
  }
}

Provider auto-detection

The server checks env vars in priority order and uses the first one it finds:

Priority	Env var	Provider	Default model
1	`ANTHROPIC_API_KEY`	Claude	`claude-haiku-4-5`
2	`OPENAI_API_KEY`	OpenAI	`gpt-4o-mini`
3	`GEMINI_API_KEY`	Gemini	`gemini-2.0-flash-lite`
4	(none)	Mock (picks first edge)	—

Override the model per-provider (ANTHROPIC_MODEL, OPENAI_MODEL, GEMINI_MODEL) or globally (DP_MODEL).

Two LLMs at once

When you use the MCP server from Claude Code:

Claude is the agent — it calls MCP tools and drives your session
The auto-detected provider is used inside dp_execute_tree for branch-point decisions

This is an economic choice — the calling agent is typically powerful/expensive, but tree traversal is a high-volume low-complexity task where a cheap/fast model is plenty. If the user is already on Anthropic, Claude Haiku handles branch decisions for pennies. Same key, right-sized model.

Future: when MCP clients (Claude Code, Cursor, Codex CLI) add sampling support, the server will use sampling/createMessage to delegate decisions to the host's LLM — eliminating the need for any env vars at all.

Available tools

Recording — capture sessions as they happen:

Tool	Description
`dp_start_recording`	Begin recording a new task
`dp_record_step`	Append a step (tool call, decision, condition check)
`dp_record_branch`	Mark a decision point with alternatives considered
`dp_finalize_recording`	End with success/failure, optionally save to file

Playback + analytics:

Tool	Description
`dp_load_tree`	Load a tree from a JSON file or inline JSON
`dp_list_trees`	List all loaded trees
`dp_get_history_summary`	Show accumulated wisdom for a tree (use BEFORE executing)
`dp_execute_tree`	Execute a tree (uses Gemini + recommendations automatically)
`dp_get_recommendation`	Get edge recommendation at a node
`dp_get_analytics`	Success rates, bottlenecks, path analysis
`dp_export_tree`	Export a tree to JSON

Example workflow

First time doing a task — the LLM records as it goes:

dp_start_recording({ taskName: "deploy-staging" }) → recordingId

[Claude does the work, calling tools normally, also calling dp_record_step after each step]

dp_record_step({ stepType: "tool_call", label: "Check git status" })
dp_record_step({ stepType: "tool_call", label: "Run tests" })
dp_record_step({ stepType: "tool_call", label: "Deploy", edgeCondition: "tests passed" })

dp_finalize_recording({
  outcome: "success",
  outcomeMessage: "Deployed to staging",
  savePath: "./trees/deploy-staging.json"
})

Next time — the LLM loads the tree and executes it:

dp_get_history_summary({ treeId: "deploy-staging" })
  → { totalSessions: 8, successRate: 1.0, shortestSuccessfulSteps: 3 }

dp_execute_tree({ treeId: "deploy-staging" })
  → follows the proven path, overrides kick in for familiar decisions

Everything runs locally. Trees, history, and recommendations stay on your machine.

Scripts

npm run build      # compile to dist/
npm run test       # 153 tests
npm run lint       # biome check
npm run demo       # tree-driven README generator using Gemini
npm run benchmark  # cross-model benchmark harness (flash-lite vs flash vs pro)

Benchmark results

The benchmark harness runs 7 scenarios designed to stress-test LLM decision-making, each executed across multiple teacher models with iterative learning:

Scenario	What it tests
Ambiguous Routing	3-way choice from subtle context
Tool Chain Failures	Unreliable tools with fallbacks
Multi-Step Reasoning	Combine 3 clues across 8 steps
Adversarial Prompts	Double negatives, inverted labels
High Branching	6-way region selection
Recovery Paths	Primary endpoint always down
Speed vs Accuracy	Fast (70% fail) vs careful (5% fail)

The flywheel is observable — e.g., on Ambiguous Routing:

Run 1: 602ms, 0% confidence (raw LLM)
Run 7: 0ms,   70% confidence (override kicks in)
Run 15: 0ms, 100% confidence (permanent)

Cross-model mode tests knowledge transfer — a smarter teacher (Pro) establishes successful paths that Flash Lite replays at override-level confidence.

License

ISC

decision-pathfinder

Quick Install

decision-pathfinder

How it works

Where LLMs fit in

Install

Quick start

LLM providers

Persistent learning

Node types

Recommendation engine

Serialization

MCP Server

Setup

Provider auto-detection

Two LLMs at once

Available tools

Example workflow

Scripts

Benchmark results

License

Reviews