Derive consensus_quality axis from existing per-run spreads, fold into composite

council rejected | AXIS | reversible: simple | 4h | proposed 1 Jun 2026

What is the proposed change?

filter_score.js:154-166 already computes runSpreads per filter per run as |high - low|. Today these spreads are recorded in the moves.output JSON but never used in the composite formula at line 211.

Proposed addition: after the 3rd run, retrieve all per-axis spreads across the 3 moves and compute mean_spread = mean(all 15 values). Define consensus_quality = max(0, min(3, 3 - mean_spread)), i.e. mean spread 0 → 3.0, mean spread 1.5 → 1.5, mean spread >=3 → 0. Persist consensus_quality under a new key inside the moves.output JSON (no schema migration).

Modify the composite at line 211 from `medianTotal - 0.5*iqrTotal + 0.3*signal_count` to `medianTotal - 0.5*iqrTotal + 0.3*signal_count + 0.5*consensus_quality - 1.5`. The -1.5 offset is meant to keep the composite numerically centred so that a 'neutral' consensus of 1.5 contributes 0. As written, however, the added term `0.5*consensus_quality - 1.5` spans -1.5 to 0, so a consensus of 1.5 contributes -0.75; centring at 1.5 would require an offset of -0.75 with the 0.5 coefficient, or a coefficient of 1.0 with the -1.5 offset.

The IQR term penalises run-to-run drift; consensus_quality penalises proposer-vs-judge drift inside a single run, which is an orthogonal signal.
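A minimal sketch of the computation described above, in JavaScript to match filter_score.js. The function names (consensusQuality, composite) and the input shape (three arrays of five per-axis spreads, one array per run) are assumptions for illustration and are not taken from the actual file; only the clamp and the composite expression come from the proposal text.

```javascript
// Hypothetical helper: arithmetic mean of a non-empty array of numbers.
function mean(values) {
  if (values.length === 0) throw new Error("no spreads to average");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// runSpreads: one array per run, each holding per-axis spreads |high - low|.
// The proposal expects 3 runs x 5 axes = 15 values; this sketch accepts any count.
function consensusQuality(runSpreads) {
  const meanSpread = mean(runSpreads.flat());
  // Clamp to [0, 3] as defined in the proposal.
  return Math.max(0, Math.min(3, 3 - meanSpread));
}

// Composite exactly as written in the proposal, with the new term appended.
function composite(medianTotal, iqrTotal, signalCount, cq) {
  return medianTotal - 0.5 * iqrTotal + 0.3 * signalCount + 0.5 * cq - 1.5;
}

// Illustrative spreads only (not real engine data): 15 values with mean 1.0.
const exampleSpreads = [
  [1, 1, 1, 1, 1],
  [0, 2, 1, 1, 1],
  [2, 0, 1, 1, 1],
];
const exampleCq = consensusQuality(exampleSpreads);
console.log("consensus_quality:", exampleCq);
console.log("composite:", composite(10, 2, 3, exampleCq).toFixed(2));
```

In this sketch the value would be stored alongside the existing spreads in the moves.output JSON; the key name is left open here because the proposal does not specify one.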

Target files

hypothesis_engine/moves/filter_score.js

Expected effect

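Hypotheses where Opus (high) and Gemini (low) disagree by >=2 per axis on average are expected to see their composite drop by 1.5-3.0 points, and hypotheses where the two judges align tightly are expected to gain up to 1.5. These magnitudes do not follow from the term as specified: `0.5*consensus_quality - 1.5` ranges from -1.5 to 0, so a mean spread >=2 costs 1.0-1.5 points and tight alignment removes the penalty rather than adding points. Expected rank inversions on ~20-30% of currently-ranked hypotheses.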
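To make the size of the effect concrete, the added term `0.5*consensus_quality - 1.5` can be tabulated against mean spread. This is a small sketch; the function names are hypothetical, and only the clamp and the term itself come from the proposed change.

```javascript
// Clamp from the proposal: consensus_quality = max(0, min(3, 3 - mean_spread)).
function consensusQualityFromMeanSpread(meanSpread) {
  return Math.max(0, Math.min(3, 3 - meanSpread));
}

// Contribution of the new term to the composite, exactly as written in the proposal.
function consensusTerm(meanSpread) {
  return 0.5 * consensusQualityFromMeanSpread(meanSpread) - 1.5;
}

for (const meanSpread of [0, 1.5, 2, 3]) {
  console.log(`mean spread ${meanSpread}: term ${consensusTerm(meanSpread).toFixed(2)}`);
}
// mean spread 0: term 0.00
// mean spread 1.5: term -0.75
// mean spread 2: term -1.00
// mean spread 3: term -1.50
```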

Falsifier — what would prove this wrong?

Backfill consensus_quality on the 4 Commander-overridden KILL/DEFER cases from corpus E. If their mean spread is not at least 0.5 above the global median mean-spread across all scored hypotheses, the axis does not predict Commander disagreement and provides no orthogonal signal beyond IQR.
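The falsifier check could be sketched as follows. This assumes "their mean spread" means the average of the mean spreads of the 4 overridden cases, which is one reading of the text; the function names and all numbers are hypothetical placeholders, not values from corpus E.

```javascript
// Hypothetical helper: median of a non-empty array (does not mutate the input).
function median(values) {
  if (values.length === 0) throw new Error("no values");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns true when the axis survives the falsifier: the overridden cases' average
// mean spread is at least `margin` above the global median mean spread.
function axisSurvivesFalsifier(overrideMeanSpreads, allMeanSpreads, margin = 0.5) {
  const overrideAvg =
    overrideMeanSpreads.reduce((sum, v) => sum + v, 0) / overrideMeanSpreads.length;
  return overrideAvg >= median(allMeanSpreads) + margin;
}

// Placeholder data for illustration only.
const allMeanSpreads = [0.4, 0.6, 0.8, 1.0, 1.2, 1.6, 1.8, 2.0, 2.2];
const overriddenCases = [1.8, 2.0, 1.6, 2.2];
console.log("axis survives falsifier:", axisSurvivesFalsifier(overriddenCases, allMeanSpreads));
```

If the check returns false on the real backfill, the proposal's own criterion says the axis adds no signal beyond IQR.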

Evidence that triggered the proposal

D — hypothesis_engine/moves/filter_score.js:154-166 — existing per-axis spread computation already collected but unused
E — Engine traces Corpus E — Commander overrides on 4 KILL/DEFER cases (model-disagreement is the candidate hidden variable)

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	3
solo feasible	3
blast radius	2
composability	3
reversibility	3

Disposition

Rejected at the council verdict. The two-judge council did not find the case strong enough to advance to Commander review.

Evaluation history

When	Move
2026-06-01 04:13	meta_council_verdict
2026-06-01 04:10	meta_argument
2026-06-01 04:07	meta_filter_score
2026-06-01 04:05	meta_genesis