Calibration

The engine's track record, scored against itself. These are the numbers a customer should review before trusting any verdict.

Headline metrics

Total hypotheses:             74
Graduation rate (decided):    49%
Commander override rate:      10%
Avg cost per hypothesis:      $3.61

Override rate is the percentage of graduated-or-overridden cases in which the human disagreed with the engine. A high rate means the engine is missing something the human catches; a low rate means the engine is well calibrated. The current rate is 10%, at the lower edge of the 10-25% target band.
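
The override rate definition can be checked against the counts reported on this page. The sketch below is one reading of that definition, not the engine's actual code: it assumes the denominator is the 35 graduated hypotheses plus the overridden cases, and that the two groups do not overlap. The function names `override_rate` and `in_target_band` are hypothetical. Under those assumptions the counts reproduce the reported 10%.

```python
def override_rate(graduated: int, overrides: dict[str, int]) -> float:
    """Fraction of graduated-or-overridden cases that the commander overrode.

    Assumption: graduated and overridden cases are disjoint, so the
    denominator is simply their sum.
    """
    overridden = sum(overrides.values())
    denominator = graduated + overridden
    if denominator == 0:
        raise ValueError("no graduated or overridden cases to score")
    return overridden / denominator


def in_target_band(rate: float, low: float = 0.10, high: float = 0.25) -> bool:
    """True when the rate falls inside the 10-25% target band (inclusive)."""
    return low <= rate <= high


# Counts taken from the tables on this page.
rate = override_rate(graduated=35, overrides={"DEFER": 1, "KILL": 3})
print(f"{rate:.1%}")         # 10.3%
print(in_target_band(rate))  # True
```

If overridden cases are instead counted among the 35 graduated hypotheses, the denominator shrinks to 35 and the rate rounds to 11%, so the disjoint reading is the one that matches the headline figure.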

Filter score distribution (graduated)

Distribution of the 35 graduated hypotheses by composite filter score (out of 15).

Score band    Count
9.0-9.9       10
10.0-10.9     21
11.0-11.9     3
12.0+         1

Commander overrides

Action    Count
DEFER     1
KILL      3

Why hypotheses get killed

Reason                            Count
evidence_search_exhausted         15
move_cap_reached                  3
council_verdict_unanimous_kill    1

Cost transparency

Total engine spend across all moves: $267.25 over 1,549 logged operations. Average cost per hypothesis, from admission to current state: $3.61.
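
The average follows directly from the totals above. As a minimal sketch, assuming the figure is total spend divided by the 74 total hypotheses and rounded to the nearest cent (the helper name `average_cost` is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP


def average_cost(total_spend: str, count: int) -> Decimal:
    """Divide a dollar total by a count and round to the nearest cent."""
    if count <= 0:
        raise ValueError("count must be positive")
    cents = Decimal("0.01")
    return (Decimal(total_spend) / count).quantize(cents, rounding=ROUND_HALF_UP)


# Totals taken from this page: $267.25 spent across 74 hypotheses.
per_hypothesis = average_cost("267.25", 74)
print(per_hypothesis)  # 3.61
```

`Decimal` is used here rather than floats so that the rounding to cents is exact and explicit.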

Known limitations