
Add cross-vendor judge-disagreement retry harness

filter rejected | HARNESS | reversible: simple | 6h | proposed 6 Jun 2026
What is the proposed change?
Wrap council_verdict in judge_disagreement_retry.js. If the Sonnet self-score (composite_v2) differs from the Codex judge verdict by >=2 normalized points (Sonnet self-scores toward APPROVE while Codex returns SUGGEST_REVISION or REJECT), re-run council_verdict once, with the disagreement reason appended to the judge prompt as 'Prior judge said: X. Re-evaluate addressing this objection or confirm.' Retries are capped at 1 per proposal per cycle. Log {first_verdict, retry_verdict, agreement_delta} to meta_engine telemetry.
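A minimal sketch of how the wrapper described above might look. Only the name council_verdict, the prompt wording, the >=2 threshold, the one-retry cap, and the three telemetry field names come from the proposal. The verdict-to-points mapping, the shape of the verdict object ({ verdict, reason }), the proposal fields (id, selfScore), the logTelemetry sink, and the reading of agreement_delta as "how far the retry moved toward the self-score" are all assumptions made for illustration.

```javascript
// Sketch only: everything not named in the proposal is hypothetical.
const DISAGREEMENT_THRESHOLD = 2; // ">=2 normalized points" from the proposal

// Assumed normalization of judge verdicts onto the same 0-3 scale as the self-score.
const VERDICT_POINTS = { APPROVE: 3, SUGGEST_REVISION: 1, REJECT: 0 };

// Absolute gap between the proposer's self-score and the judge's verdict.
function disagreementGap(selfScore, verdict) {
  return Math.abs(selfScore - VERDICT_POINTS[verdict]);
}

// Appends the prior judge's objection using the wording given in the proposal.
function buildRetryPrompt(judgePrompt, reason) {
  return `${judgePrompt}\nPrior judge said: ${reason}. Re-evaluate addressing this objection or confirm.`;
}

// councilVerdict(proposal, prompt) -> Promise<{ verdict, reason }> (assumed shape).
// logTelemetry(record) is a hypothetical sink for meta_engine telemetry.
function makeJudgeDisagreementRetry(councilVerdict, logTelemetry) {
  const retried = new Set(); // enforces at most 1 retry per proposal per cycle

  return async function judgeWithRetry(proposal, cycle, judgePrompt) {
    const first = await councilVerdict(proposal, judgePrompt);
    const firstGap = disagreementGap(proposal.selfScore, first.verdict);
    const key = `${cycle}:${proposal.id}`;

    // No retry when the judges roughly agree or the cap is already spent.
    if (firstGap < DISAGREEMENT_THRESHOLD || retried.has(key)) return first;
    retried.add(key);

    const retry = await councilVerdict(
      proposal,
      buildRetryPrompt(judgePrompt, first.reason)
    );
    logTelemetry({
      first_verdict: first.verdict,
      retry_verdict: retry.verdict,
      // Assumed meaning: reduction in the gap after the retry.
      agreement_delta: firstGap - disagreementGap(proposal.selfScore, retry.verdict),
    });
    return retry;
  };
}
```

Injecting councilVerdict and logTelemetry keeps the wrapper testable with stubs and leaves council_verdict.js itself unchanged, which is consistent with the proposal being tagged as simply reversible.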
Target files
meta_engine/moves/council_verdict.js
meta_engine/harness/judge_disagreement_retry.js
Expected effect
Over a 7-cycle window, roughly 15-25% of proposals trigger a retry. Of the retried proposals, >=40% receive a changed verdict (judge confirms or reverses with cited reason). Net effect: APPROVE precision on graduated proposals (post-deploy non-revert rate) rises by >=10 points over the pre-harness baseline.
Falsifier — what would prove this wrong?
If the retry verdict matches the first verdict in >90% of triggered cases AND graduated APPROVE precision is unchanged, the retry is wasted tokens and the harness is reverted.
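The expected-effect thresholds and the falsifier can be checked mechanically from the logged telemetry. The following is a sketch under stated assumptions: the function name, the input shape, and the precision values (taken here as percentages supplied by the caller) are hypothetical. A "changed verdict" is read as retry_verdict differing from first_verdict, and "unchanged" precision is read as "no gain"; neither reading is confirmed by the proposal.

```javascript
// Sketch only: input shape and function name are assumptions; the numeric
// thresholds (15-25%, >=40%, >=10 points, >90%) are those stated in the proposal.
function evaluateRetryHarness({ totalProposals, retryRecords, precisionBefore, precisionAfter }) {
  const triggered = retryRecords.length;
  const matched = retryRecords.filter((r) => r.retry_verdict === r.first_verdict).length;
  const changed = triggered - matched;

  const retryRate = totalProposals > 0 ? triggered / totalProposals : 0;
  const changedRate = triggered > 0 ? changed / triggered : 0;
  const matchedRate = triggered > 0 ? matched / triggered : 0;
  const precisionGain = precisionAfter - precisionBefore; // in points

  return {
    retryRate,
    changedRate,
    precisionGain,
    // Expected effect: 15-25% retried, >=40% of those changed, >=10 points gained.
    meetsExpectedEffect:
      retryRate >= 0.15 && retryRate <= 0.25 && changedRate >= 0.4 && precisionGain >= 10,
    // Falsifier: >90% of retries repeat the first verdict AND precision shows no gain.
    falsified: triggered > 0 && matchedRate > 0.9 && precisionGain <= 0,
  };
}
```

Both conditions of the falsifier must hold for a revert, so a harness that rarely changes verdicts but still coincides with a precision gain would not be reverted by this check.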
Evidence that triggered the proposal
  • D — brain/S179_HANDOFF_CONTRACTS_BUILD_COMPLETE.md (cross-vendor judging)
  • D — brain/code_reviews/* (codex SUGGEST_REVISION patterns)
  • E — commander_overrides: 3_KILL_1_DEFER on graduated proposals

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis             Score
specificity      3
falsifier        3
solo feasible    3
blast radius     2
composability    3
reversibility    3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When                Move
2026-06-12 04:19    meta_filter_score
2026-06-06 04:03    meta_genesis