Evidence
Controlled runs show Kairn reducing token use, keeping long sessions focused, and improving source routing. The detailed split is available for deeper review.
Token use
These rows compare successful runs against successful runs. Bars show percent reduction; labels show tokens saved.
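The two quantities on each bar can be sketched as follows. This is a minimal illustration of how "percent reduction" and "tokens saved" are conventionally computed, not Kairn's actual reporting code; the function names and the token counts are hypothetical.

```python
def tokens_saved(baseline_tokens: int, kairn_tokens: int) -> int:
    """Absolute saving: the label shown on each bar."""
    return baseline_tokens - kairn_tokens


def percent_reduction(baseline_tokens: int, kairn_tokens: int) -> float:
    """Relative saving against the baseline: the length of each bar."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - kairn_tokens) / baseline_tokens


# Hypothetical token counts, chosen only to illustrate the arithmetic.
baseline, with_kairn = 50_000, 30_000
print(tokens_saved(baseline, with_kairn))               # 20000
print(f"{percent_reduction(baseline, with_kairn):.1f}%")  # 40.0%
```

Both runs are assumed to have completed the task, matching the successful-versus-successful comparison described above.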
Validated slices: 8 (fresh, endurance, MCP, official)
Median reduction: 42.7% (clean rows shown)
Official tasks passed: 5/5 (small SWE-Pro pilot)
Endurance suites: 4 (long-session evidence)
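As a quick check of the median figure, the sketch below assumes the "clean rows shown" are the three reduction percentages listed on this page (42.7%, 16.4%, 54.2%). That mapping is an assumption; the page does not say which rows feed the median.

```python
from statistics import median

# Reduction percentages of the three rows shown on this page
# (assumed to be the "clean rows" behind the median figure).
reductions = [42.7, 16.4, 54.2]

median_reduction = median(reductions)
print(f"{median_reduction}%")  # 42.7%
```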
01: 29k saved, 42.7% less token use. 91k baseline to 62k with Kairn. 10-turn debugging session.
02: 11k saved, 16.4% less token use. 70k baseline to 58k with Kairn. Fresh long-session JavaScript package test.
03: 12k saved, 54.2% less token use. 22k baseline to 10k with Kairn. Official evaluator row.
Coverage
Kairn has been tested across clean token-savings rows, longer sessions, MCP delivery, official evaluator rows, and governor ablations.
measured: On clean rows, Kairn reduced token use by 16.4% to 54.2%, counting only rows where both the baseline and Kairn completed the task.
repeat-tested: Endurance runs show Kairn keeping source focus across repeated debugging and verification turns.
MCP: MCP-first runs recorded real tool calls and useful source guidance, so Kairn is not limited to terminal hooks.
SWE-Pro: A small SWE-Pro pilot passed 5/5 Kairn rows, compared with 2/5 baseline rows.
rescue: Several controlled rows passed with Kairn where the baseline failed the quality gates, showing the value of better source focus.
governor: Ablation showed that the governor improves safety by shrinking its output, suppressing it, or asking for evidence instead of over-assisting.
Quality
Some controlled rows passed with Kairn where the baseline failed the quality gates. Those rows are reported separately from the token-savings rows.
Official evaluator pass rate
Small generated-patch official evaluator slice; includes quality-rescue rows.
Codex alone: 2/5 tasks passed (40% pass rate)
Codex + Kairn: 5/5 tasks passed (100% pass rate)
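The pass rates follow directly from the task counts. A minimal sketch of that arithmetic, with a hypothetical helper name, using only the counts reported above:

```python
def pass_rate(passed: int, total: int) -> float:
    """Percentage of tasks that passed the evaluator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


# (passed, total) counts as reported for the official evaluator slice.
results = {
    "Codex alone": (2, 5),
    "Codex + Kairn": (5, 5),
}

for name, (passed, total) in results.items():
    print(f"{name}: {passed}/{total} = {pass_rate(passed, total):.0f}%")
# Codex alone: 2/5 = 40%
# Codex + Kairn: 5/5 = 100%
```

With only five tasks per side, each task moves the pass rate by 20 percentage points, which is why the slice is described as small.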
Interpretation
Kairn is strongest when the task would otherwise cause repeated source searching.
Kairn's silence is a feature: it avoids adding context when context is not useful.
The strongest savings rows count only after quality gates pass.
The Codex session is the most-tested path; MCP is the portable editor path.
Details
The detailed page keeps modes, MCP runs, official evaluator rows, and caveats visible for technical review.