kairn

Evidence details

Benchmark slices, modes, and caveats.

This page keeps the internal labels visible for readers who want to audit the evidence more closely.

Detailed results

Slice | Result | Status
Fresh routing gate | Fresh publication v3 preflight passed 8/8 repos after general source and companion-routing fixes. | routing gate
Fresh live canary | Kairn passed report-level quality on 3/3 runs, while the baseline missed report-level quality; Kairn used 58% fewer tokens at the median. | quality rescue
Governed ablation | 17.85% valid pass/pass savings, with safer behavior than the no-governor variant. | clean savings
httpcore MCP endurance | 18.46% median valid active savings; 100% quality; actual MCP calls with useful returned files; zero scope violations. | clean MCP
Requests MCP post-fix | 28.05% median valid savings; 3/3 quality; 3/3 useful MCP file returns; zero scope violations. | clean MCP
Click endurance | Repeat-3 pass/pass endurance: 42.7% median session savings; 52.6% median MCP-first savings; zero scope violations. | pass/pass
Node SemVer endurance | Repeat-3 pass/pass endurance: 16.4% median session savings; the guided MCP canary saved 59.2% with attributed source. | pass/pass
OpenLibrary SWE-Pro | Official evaluator pass/pass row: 10,048 tokens with Kairn vs 21,932 for the baseline. | official
SWE-Pro official pilot | Kairn passed 5/5 generated-patch official rows after general fixes; the baseline passed 2/5. | quality/scope rescue
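The savings figures above are percentages of baseline token use. The sketch below shows one way to compute them. It assumes that savings means 1 - kairn_tokens / baseline_tokens and that "valid pass/pass" means only runs where both Kairn and the baseline passed. Both definitions are assumptions, not the page's documented method, and the function names are hypothetical. Applied to the OpenLibrary SWE-Pro row, it gives roughly 54% fewer tokens.

```python
from statistics import median


def token_savings(kairn_tokens: int, baseline_tokens: int) -> float:
    """Fractional token reduction vs. baseline (assumed definition, hypothetical helper)."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - kairn_tokens / baseline_tokens


def median_valid_savings(runs):
    """Median savings over pass/pass pairs only (hypothetical helper).

    Each run is (kairn_tokens, kairn_passed, baseline_tokens, baseline_passed).
    Runs where either side failed are excluded, which is an assumed reading
    of "valid pass/pass" rather than a documented rule.
    """
    valid = [
        token_savings(k_tokens, b_tokens)
        for k_tokens, k_passed, b_tokens, b_passed in runs
        if k_passed and b_passed
    ]
    if not valid:
        raise ValueError("no valid pass/pass pairs")
    return median(valid)


# OpenLibrary SWE-Pro row from the table: 10,048 tokens with Kairn vs 21,932 baseline.
print(f"{token_savings(10_048, 21_932):.1%}")  # prints 54.2%
```

Filtering to pass/pass pairs before taking the median keeps a cheap but failed run from counting as a saving.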

Claims matrix

Safe: Kairn can reduce token use on source-rescue, debug, MCP, and endurance workflows.
Safe: Kairn may stay silent when confidence is low; that is intentional suppression.
Safe: Codex CLI/session is the most-tested path; MCP is the portable editor path.
Careful: Savings vary by task and model; current evidence is strongest on controlled/internal runs.
Avoid: Do not claim universal 20-70% savings or public works-anywhere reliability yet.