
Add rubric-stability harness wrapping filter_score with dual phrasing

filter rejected | HARNESS | reversible: medium | 6h | proposed 10 Jun 2026
What is the proposed change?
A new harness runs filter_score twice per candidate: once with the current v2.3 rubric prose, and once with a semantically equivalent paraphrase of that rubric (same axis definitions, different surface wording), stored as v2.3_paraphrase.md. The harness computes the variance between the two runs for each axis and for the composite score. If any axis variance exceeds 1.0 or the composite variance exceeds 2.0, it sets rubric_unstable=true on the candidate. Status flow change: rubric_unstable candidates are blocked from filter_kept and routed to council_verdict for human re-grading.
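The dual-phrasing check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of rubric_stability.js: `checkRubricStability`, `sampleVariance`, and the `scoreFn(candidate, rubric)` signature are hypothetical names, scoring is treated as synchronous, the composite is assumed to be the plain sum of axis scores, and variance is assumed to be the sample variance (n - 1 denominator). The proposal itself specifies none of these.

```javascript
// Hypothetical sketch of the stability check. scoreFn stands in for
// filter_score, whose real signature is not given in the proposal.

// Sample variance (n - 1 denominator); this choice is an assumption.
function sampleVariance(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  return values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
}

function checkRubricStability(
  candidate,
  scoreFn,
  rubrics,
  limits = { axis: 1.0, composite: 2.0 } // thresholds from the proposal
) {
  // One scoring pass per rubric phrasing; each pass is assumed to
  // return an object of the form { axisName: score }.
  const runs = rubrics.map((rubric) => scoreFn(candidate, rubric));
  const axes = Object.keys(runs[0]);

  const axisVariance = {};
  for (const axis of axes) {
    axisVariance[axis] = sampleVariance(runs.map((run) => run[axis]));
  }

  // Assumption: the composite score is the sum of the axis scores.
  const composites = runs.map((run) =>
    axes.reduce((sum, axis) => sum + run[axis], 0)
  );
  const compositeVariance = sampleVariance(composites);

  const rubricUnstable =
    Object.values(axisVariance).some((v) => v > limits.axis) ||
    compositeVariance > limits.composite;

  return { axisVariance, compositeVariance, rubricUnstable };
}
```

With only two runs, the sample variance of scores a and b is (a - b)^2 / 2, so on integer scores the 1.0 axis threshold trips at a difference of 2 or more. Under population variance, (a - b)^2 / 4, it would trip only at a difference of 3, so the variance definition is worth fixing explicitly before the thresholds are evaluated.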
Target files
hypothesis_engine/harnesses/rubric_stability.js
hypothesis_engine/moves/filter_score.js
Expected effect
On a 40-item sample of meta hypotheses, 4-8 candidates are flagged rubric_unstable, and the flagged set includes at least 2 of the 4 Commander overrides. That overlap would indicate that the rubric-gaming signal predicts override risk.
Falsifier — what would prove this wrong?
Remove the harness if fewer than 2 of the 40 candidates are flagged (the threshold is too strict, or the S158/S159 finding does not generalize) OR if the flagged set has no overlap with the override set (the signal is uncorrelated with override-worthy candidates).
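The expected-effect and falsifier criteria can be written as a small check over candidate IDs. This is a minimal sketch: `evaluateHarness` and its inputs (arrays of flagged and overridden candidate IDs) are hypothetical, and only the numeric thresholds come from the proposal.

```javascript
// Hypothetical helper: compares the flagged set against the override set
// using the thresholds stated in the proposal.
function evaluateHarness(flaggedIds, overrideIds) {
  const flagged = new Set(flaggedIds);
  const overrides = new Set(overrideIds);

  // Number of flagged candidates that were also overridden.
  const overlap = [...flagged].filter((id) => overrides.has(id)).length;

  // Falsifier: fewer than 2 flagged, or zero overlap with overrides.
  const falsified = flagged.size < 2 || overlap === 0;

  // Expected effect: 4-8 flagged and at least 2 overrides among them.
  const expectedMet =
    flagged.size >= 4 && flagged.size <= 8 && overlap >= 2;

  return { flaggedCount: flagged.size, overlap, falsified, expectedMet };
}
```

As written, the two criteria leave a gap: an outcome such as 3 flagged candidates, or an overlap of exactly 1, neither falsifies the harness nor meets the expected effect, so the sketch reports both flags as false in that case.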
Evidence that triggered the proposal
  • D — brain/S158_PAPER_RUBRIC_FINDING.md
  • D — brain/S159_FOLLOWUP.md

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   2
blast radius    2
composability   3
reversibility   2
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When               Move
2026-06-12 04:33   meta_filter_score
2026-06-10 04:03   meta_genesis