Evidence details
Benchmark slices, modes, and caveats.
This page keeps the internal labels visible for readers who want to audit the evidence more closely.
Detailed results
| Slice | Result | Status |
|---|---|---|
| Fresh routing gate | Fresh publication v3 preflight passed 8/8 repos after general source and companion-routing fixes. | routing gate |
| Fresh live canary | Kairn passed 3/3 report-level quality checks while the baseline missed report-level quality; Kairn used 58% fewer median tokens. | quality rescue |
| Governed ablation | 17.85% valid pass/pass savings, with safer behavior than the no-governor variant. | clean savings |
| httpcore MCP endurance | 18.46% median valid active savings, 100% quality, actual MCP calls, useful returned files, zero scope violations. | clean MCP |
| Requests MCP post-fix | 28.05% median valid savings, 3/3 quality, 3/3 useful MCP file returns, zero scope violations. | clean MCP |
| Click endurance | Repeat-3 pass/pass endurance: session median savings 42.7%; MCP-first median savings 52.6%; zero scope violations. | pass/pass |
| Node SemVer endurance | Repeat-3 pass/pass endurance: session median savings 16.4%; guided MCP canary saved 59.2% with attributed source. | pass/pass |
| OpenLibrary SWE-Pro | Official evaluator pass/pass row: 10,048 tokens with Kairn vs 21,932 baseline. | official |
| SWE-Pro official pilot | Kairn passed 5/5 generated-patch official rows after general fixes; baseline passed 2/5. | quality/scope rescue |
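To make the savings figures easier to audit, here is a minimal sketch of how a percentage saving could be derived from raw token counts, using the OpenLibrary SWE-Pro row as input. It assumes savings are defined as `1 - kairn_tokens / baseline_tokens`; this page does not state the exact formula, and the function names are illustrative rather than part of Kairn.

```python
def token_savings_pct(baseline_tokens: int, kairn_tokens: int) -> float:
    """Percent of baseline tokens saved (assumed definition: 1 - kairn / baseline)."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (1 - kairn_tokens / baseline_tokens) * 100


def pass_rate(passed: int, total: int) -> float:
    """Fraction of rows that passed the evaluator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


# OpenLibrary SWE-Pro row: 10,048 tokens with Kairn vs 21,932 baseline.
openlibrary_savings = token_savings_pct(baseline_tokens=21_932, kairn_tokens=10_048)
print(f"OpenLibrary savings: {openlibrary_savings:.1f}%")  # prints 54.2%

# SWE-Pro official pilot row: Kairn 5/5, baseline 2/5.
print(f"Kairn pass rate: {pass_rate(5, 5):.0%}, baseline pass rate: {pass_rate(2, 5):.0%}")
```

Under that assumed definition, the single OpenLibrary row works out to roughly 54% fewer tokens. The median figures in the other rows aggregate several runs, so they cannot be reproduced from this one row.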
Claims matrix
| Claim level | Claim |
|---|---|
| Safe | Kairn can reduce token use on source-rescue, debug, MCP, and endurance workflows. |
| Safe | Kairn may stay silent when confidence is low; that is intentional suppression. |
| Safe | Codex CLI/session is the most-tested path; MCP is the portable editor path. |
| Careful | Savings vary by task and model; current evidence is strongest on controlled/internal runs. |
| Avoid | Do not claim universal 20-70% savings or public works-anywhere reliability yet. |