Add codex-judge harness for borderline council_verdict

filter rejected · HARNESS · reversible: simple · 6h · proposed 20 Jun 2026
What is the proposed change?
Wrap council_verdict so that any proposal whose composite score is within 0.5 of the graduation threshold is dispatched to the gpt-5.5-codex CLI (already available per rule #30) with a fixed judge prompt. The judge returns one of {GRADUATE, KILL, DEFER} plus a one-sentence reason. The codex verdict breaks ties on borderline cases only; non-borderline cases are unchanged. Log the codex verdict and reason to engine.db.
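A minimal sketch of the wrapper described above, under stated assumptions. The function names, the injected `judge` callback (standing in for the gpt-5.5-codex CLI call), the `VERDICT: reason` reply format, and the inclusive reading of "within 0.5" are all hypothetical; the proposal does not specify them. Logging to engine.db is omitted.

```javascript
// Hypothetical sketch: none of these names come from the existing codebase.
const BORDERLINE_MARGIN = 0.5; // margin stated in the proposal

// Assumption: "within 0.5" is inclusive of the boundary.
function isBorderline(composite, threshold, margin = BORDERLINE_MARGIN) {
  return Math.abs(composite - threshold) <= margin;
}

// Assumption: the fixed judge prompt makes the judge reply as
// "<VERDICT>: <one-sentence reason>". Returns null if the reply does not fit.
function parseJudgeReply(reply) {
  const match = /^\s*(GRADUATE|KILL|DEFER)\b[\s:.-]*(.*)$/s.exec(reply);
  if (!match) return null;
  return { verdict: match[1], reason: match[2].trim() };
}

// `judge` is an injected async function (proposal) => string, so the CLI
// invocation itself stays outside this sketch.
async function councilVerdictWithTiebreak(proposal, threshold, baseVerdict, judge) {
  // Non-borderline cases pass through unchanged.
  if (!isBorderline(proposal.composite, threshold)) {
    return { verdict: baseVerdict, source: "council" };
  }
  const parsed = parseJudgeReply(await judge(proposal));
  // Fall back to the original verdict if the judge reply cannot be parsed.
  if (!parsed) {
    return { verdict: baseVerdict, source: "council", note: "unparseable judge reply" };
  }
  return { verdict: parsed.verdict, reason: parsed.reason, source: "codex" };
}
```

Injecting `judge` keeps the borderline logic testable without the CLI and keeps the harness easy to remove, in line with the "reversible: simple" tag.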
Target files
hypothesis_engine/moves/council_verdict.js
hypothesis_engine/harness/codex_tiebreak.js
Expected effect
Borderline decisions (composite within 0.5 of the threshold) become decorrelated from the Sonnet judge that produced the composite. Expect the codex tiebreaker to disagree with Sonnet on 30-50% of borderline cases, which would validate the decorrelation hypothesis from S202 design hygiene.
Falsifier — what would prove this wrong?
Run 30 borderline cases through both judges. If codex agrees with Sonnet more than 85% of the time, the judges are correlated and the harness adds no signal: remove it. If agreement is below 50%, the judges are too noisy and the threshold logic should be revisited instead.
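The falsifier check can be sketched as follows. This is an illustration only: the `{ sonnet, codex }` pair shape, the function names, and the outcome labels are hypothetical, and it assumes the Sonnet side has already been reduced to a verdict comparable with the codex one.

```javascript
// Hypothetical sketch of the falsifier check; names and labels are illustrative.

// Fraction of cases where both judges returned the same verdict.
function agreementRate(pairs) {
  if (pairs.length === 0) throw new Error("no cases to compare");
  const agreed = pairs.filter((p) => p.sonnet === p.codex).length;
  return agreed / pairs.length;
}

// Thresholds taken from the falsifier text: >85% agreement means the judges
// are correlated, <50% means they are too noisy.
function falsifierOutcome(rate) {
  if (rate > 0.85) return "REMOVE_HARNESS";
  if (rate < 0.5) return "REVISIT_THRESHOLD";
  return "KEEP_HARNESS";
}

// Usage: 30 borderline cases, 18 of which agree.
const pairs = Array.from({ length: 30 }, (_, i) => ({
  sonnet: "GRADUATE",
  codex: i < 18 ? "GRADUATE" : "KILL",
}));
console.log(falsifierOutcome(agreementRate(pairs)));
```

The expected 30-50% disagreement corresponds to 50-70% agreement, which sits inside the band where neither falsifier condition fires.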
Evidence that triggered the proposal
  • D — rule #30 — non-trivial code dispatched to gpt-5.5-codex (same principle applies to non-trivial verdicts)
  • D — S202 cross-vendor judging design hygiene (decorrelated errors)

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   3
blast radius    1
composability   3
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When               Move
2026-06-20 04:07   meta_filter_score
2026-06-20 04:03   meta_genesis