
Spread-triggered third-reviewer tie-break in filter_score composite

filter rejected | HARNESS | reversible: simple | 4h | proposed 5 Jun 2026
What is the proposed change?
After the per-filter spread computation loop (around lines 158-166) and before the move is recorded at line 169, add a tie-break step. If any single filter has spread ≥ 2 (i.e. hi=3,lo=1; hi=3,lo=0; or hi=2,lo=0), make one additional call to llm.callSonnet46 with a tie-break system prompt that asks only for the disputed filter(s) and supplies both the HIGH and LOW justifications inline. For each disputed filter, replace the midpoint with the tiebreaker's score, clamped to [lo,hi]. Persist tiebreak_used:true and tiebreak_filters:[...] inside the move output JSON for telemetry. Cost gate: skip the tie-break if `opts.fast === true`, which preserves the cheap path.
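The scoring rule described above can be sketched as follows. This is a minimal illustration, not the code in filter_score.js: the `filters` shape (`{ name: { hi, lo } }`), the function names, and the idea of passing the tiebreaker's scores in as a plain object are all assumptions made for the example. The call to llm.callSonnet46 is deliberately left out so the logic stays synchronous and testable; only `tiebreak_used`, `tiebreak_filters`, and `opts.fast` come from the proposal.

```javascript
// Hypothetical sketch of the spread-triggered tie-break (names are illustrative).
const SPREAD_THRESHOLD = 2; // spread >= 2 on the 0-3 scale counts as disputed

// Return the names of filters whose HIGH/LOW scores differ by >= threshold.
function findDisputedFilters(filters) {
  return Object.keys(filters).filter(
    (name) => filters[name].hi - filters[name].lo >= SPREAD_THRESHOLD
  );
}

function clamp(value, lo, hi) {
  return Math.min(hi, Math.max(lo, value));
}

// filters:        { name: { hi, lo } }  (assumed shape)
// tiebreakScores: { name: score } as returned by the third reviewer (assumed)
// opts.fast:      when true, the tie-break is skipped entirely (cost gate)
function resolveScores(filters, tiebreakScores = {}, opts = {}) {
  const disputed = opts.fast === true ? [] : findDisputedFilters(filters);
  const scores = {};
  const tiebreakFilters = [];

  for (const [name, { hi, lo }] of Object.entries(filters)) {
    const hasTiebreak =
      disputed.includes(name) && Number.isFinite(tiebreakScores[name]);
    if (hasTiebreak) {
      // Tiebreaker's score replaces the midpoint, clamped to [lo, hi].
      scores[name] = clamp(tiebreakScores[name], lo, hi);
      tiebreakFilters.push(name);
    } else {
      // Undisputed filter, fast path, or missing tiebreak score: keep midpoint.
      scores[name] = (hi + lo) / 2;
    }
  }

  return {
    scores,
    tiebreak_used: tiebreakFilters.length > 0,
    tiebreak_filters: tiebreakFilters,
  };
}
```

In this sketch a disputed filter falls back to the midpoint if the tiebreaker returns no usable score for it, so a failed third call degrades to the current behaviour rather than blocking the move.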
Target files
hypothesis_engine/moves/filter_score.js
Expected effect
On the next 20 hypotheses to run filter_score, the standard deviation of run_total across the 3 runs of each hypothesis decreases by ≥30% vs. the previous 20-hypothesis baseline measured from `engine.db` move history. The composite IQR penalty (-0.5×IQR) should also be materially smaller for hypotheses where the disagreement was confined to one filter rather than systemic.
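One way the ≥30% criterion could be checked is sketched below. The proposal does not say how the per-hypothesis standard deviations are aggregated, so averaging them across the 20 hypotheses is an assumption, as is the use of the sample (n-1) standard deviation. The function names and the input shape (one array of three run_total values per hypothesis) are hypothetical, and reading the values out of `engine.db` is not shown.

```javascript
// Sample standard deviation (n - 1 denominator) of one hypothesis's run_totals.
function sampleStdDev(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance =
    xs.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance);
}

// Assumed aggregation: mean of the per-hypothesis standard deviations.
function meanRunStdDev(hypotheses) {
  const total = hypotheses.map(sampleStdDev).reduce((a, b) => a + b, 0);
  return total / hypotheses.length;
}

// Fractional reduction of post-deploy spread relative to the baseline.
// Note: undefined (NaN or Infinity) if the baseline spread is exactly zero.
function relativeReduction(baseline, post) {
  const base = meanRunStdDev(baseline);
  return (base - meanRunStdDev(post)) / base;
}

// Usage: each inner array holds the 3 run_total values of one hypothesis.
const baselineRuns = [[1, 2, 3], [2, 4, 6]];
const postRuns = [[1, 2, 3], [1, 2, 3]];
const meetsTarget = relativeReduction(baselineRuns, postRuns) >= 0.3;
```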
Falsifier — what would prove this wrong?
Compute run_total IQR baseline over the last 20 hypotheses that completed all 3 runs. After deploy, observe the next 20. If post-deploy IQR is NOT lower (Mann-Whitney U, one-sided p<0.1), the tiebreak is not improving consistency and should be reverted. Also: if tiebreak fires on <10% of runs, the trigger threshold is too high.
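The one-sided Mann-Whitney U comparison named in the falsifier could look roughly like this. It is a sketch under stated assumptions: it uses the normal approximation with a continuity correction and no tie correction (which makes it slightly conservative when ties are present), and an exact test may be preferable for small samples. The function names are hypothetical, and the inputs are assumed to be plain arrays of per-hypothesis run_total IQR values.

```javascript
// Assign 1-based ranks, giving tied values the average of their ranks.
function rankWithTies(values) {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks = new Array(values.length);
  let k = 0;
  while (k < order.length) {
    let j = k;
    while (j + 1 < order.length && order[j + 1].v === order[k].v) j++;
    const avg = (k + j) / 2 + 1;
    for (let m = k; m <= j; m++) ranks[order[m].i] = avg;
    k = j + 1;
  }
  return ranks;
}

// Standard normal CDF via the Abramowitz & Stegun 26.2.17 polynomial.
function normalCdf(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const poly =
    t * (0.31938153 + t * (-0.356563782 +
    t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  const tail = (Math.exp((-z * z) / 2) / Math.sqrt(2 * Math.PI)) * poly;
  return z >= 0 ? 1 - tail : tail;
}

// One-sided test. H1: post-deploy values tend to be LOWER than baseline.
function mannWhitneyLess(post, baseline) {
  const n1 = post.length;
  const n2 = baseline.length;
  const ranks = rankWithTies([...post, ...baseline]);
  const rankSumPost = ranks.slice(0, n1).reduce((a, b) => a + b, 0);
  // U for the post sample: small when post values rank below baseline values.
  const u = rankSumPost - (n1 * (n1 + 1)) / 2;
  const mean = (n1 * n2) / 2;
  const sd = Math.sqrt((n1 * n2 * (n1 + n2 + 1)) / 12); // no tie correction
  const z = (u - mean + 0.5) / sd; // continuity correction
  return { u, p: normalCdf(z) };
}

// Revert rule from the falsifier: keep the change only if p < 0.1.
function shouldRevert(postIqrs, baselineIqrs) {
  return !(mannWhitneyLess(postIqrs, baselineIqrs).p < 0.1);
}
```

The second falsifier condition (tie-break firing on fewer than 10% of runs) is a simple proportion over the persisted `tiebreak_used` flags and needs no test statistic.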
Evidence that triggered the proposal
  • E — hypothesis_engine/moves/filter_score.js lines 158-166 — current midpoint masks wide HIGH/LOW disagreement; -0.5*IQR composite penalty already implies the team knows spread is signal but discards which axis caused it
  • D — S179 handoff contracts: runTransition wrapper precedent for adding atomic post-move conditionals without state-machine changes

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   3
blast radius    2
composability   2
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When                Move
2026-06-12 04:16    meta_filter_score
2026-06-05 04:04    meta_genesis