Add rubric-stability harness wrapping filter_score with dual phrasing

filter rejected | HARNESS | reversible: medium | 6h | proposed 10 Jun 2026

What is the proposed change?

The new harness runs filter_score twice per candidate: once with the current v2.3 rubric prose, and once with a semantically equivalent paraphrase of the rubric (same axis definitions, different surface wording, stored as v2.3_paraphrase.md). It then computes the per-axis variance and the composite variance across the two runs. If any axis variance exceeds 1.0 or the composite variance exceeds 2.0, the harness sets rubric_unstable=true on the candidate. Status flow change: rubric_unstable candidates are blocked from filter_kept and routed to council_verdict for human re-grading.
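The flagging rule can be sketched as follows. This is a minimal illustration, not the contents of rubric_stability.js: the function names are hypothetical, the composite is assumed to be the plain sum of axis scores, and the variance is assumed to be the sample variance (n - 1 denominator), since the proposal does not specify either.

```javascript
// Sketch only. Every name except rubric_unstable is hypothetical.
const AXIS_VARIANCE_LIMIT = 1.0;      // threshold from the proposal
const COMPOSITE_VARIANCE_LIMIT = 2.0; // threshold from the proposal

// Sample variance (n - 1 denominator). Assumption: the proposal does not
// say whether sample or population variance is intended.
function sampleVariance(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const squared = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return squared / (values.length - 1);
}

// Assumption: the composite score is the plain sum of the axis scores.
function composite(scores) {
  return Object.values(scores).reduce((sum, v) => sum + v, 0);
}

// `runs` holds one { axisName: score } object per rubric phrasing,
// e.g. [scoresFromV23, scoresFromParaphrase].
function assessStability(runs) {
  const axisVariance = {};
  for (const axis of Object.keys(runs[0])) {
    axisVariance[axis] = sampleVariance(runs.map((run) => run[axis]));
  }
  const compositeVariance = sampleVariance(runs.map(composite));
  const rubric_unstable =
    Object.values(axisVariance).some((v) => v > AXIS_VARIANCE_LIMIT) ||
    compositeVariance > COMPOSITE_VARIANCE_LIMIT;
  return { axisVariance, compositeVariance, rubric_unstable };
}
```

Under these assumptions, two runs on a 0-3 axis trip the axis threshold once they differ by 2 or more points. With population variance instead, only a full 3-point swing would, so the choice of estimator effectively sets the strictness of the harness.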

Target files

hypothesis_engine/harnesses/rubric_stability.js
hypothesis_engine/moves/filter_score.js

Expected effect

On the 40-item meta hypothesis sample, 4-8 candidates are flagged rubric_unstable, and the flagged set overlaps with at least 2 of the 4 Commander overrides. That overlap would indicate that the rubric-gaming signal predicts override risk.

Falsifier — what would prove this wrong?

Remove the harness if fewer than 2 candidates are flagged across the 40 (threshold too strict, or the S158/S159 finding doesn't generalize), or if the overlap with the override set is 0 (signal uncorrelated with override-worthy candidates).
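The falsifier and the expected-effect band reduce to two counts, which could be checked mechanically. The sketch below is illustrative: evaluateFalsifier and its return fields are hypothetical names, and the candidate ids in the usage lines are made up, not results from the sample.

```javascript
// Hypothetical helper: applies the falsifier and the expected-effect band
// to a set of flagged candidate ids and the set of Commander override ids.
function evaluateFalsifier(flaggedIds, overrideIds) {
  const flagged = new Set(flaggedIds);
  const overrides = new Set(overrideIds);
  const overlap = [...flagged].filter((id) => overrides.has(id)).length;

  return {
    flagged: flagged.size,
    overlap,
    // Falsifier: fewer than 2 flagged, or zero overlap with the overrides.
    removeHarness: flagged.size < 2 || overlap === 0,
    // Expected effect: 4-8 flagged and at least 2 overrides in the flagged set.
    meetsExpectedEffect:
      flagged.size >= 4 && flagged.size <= 8 && overlap >= 2,
  };
}

// Usage with made-up ids (illustrative only).
const verdict = evaluateFalsifier(
  ["h03", "h11", "h17", "h25", "h31"],
  ["h02", "h11", "h25", "h38"]
);
console.log(verdict);
```

The two conditions leave a gap: a run that flags 3 candidates with an overlap of 1 neither triggers removal nor meets the expected effect, so the harness would be kept without its prediction being confirmed.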

Evidence that triggered the proposal

D — brain/S158_PAPER_RUBRIC_FINDING.md
D — brain/S159_FOLLOWUP.md

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	3
solo feasible	2
blast radius	2
composability	3
reversibility	2

Disposition

Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When	Move
2026-06-12 04:33	meta_filter_score
2026-06-10 04:03	meta_genesis