Add codex-judge harness for borderline council_verdict

filter rejected · HARNESS · reversible: simple · 6h · proposed 20 Jun 2026

What is the proposed change?

Wrap council_verdict so that any proposal whose composite is within 0.5 of the graduation threshold is dispatched to the gpt-5.5-codex CLI (already available per rule #30) with a fixed judge prompt. The judge returns one of {GRADUATE, KILL, DEFER} plus a one-sentence reason. The codex verdict breaks ties for borderline cases only; non-borderline cases are unchanged. Log the codex verdict and reason to engine.db.
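The wrapping logic described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual harness: the function names, the shape of the proposal object, the inclusive reading of "within 0.5", and the fallback to the council verdict on an unparseable reply are all hypothetical. The codex CLI call and the engine.db write are injected as callbacks (askJudge, log) because the proposal does not specify their interfaces.

```javascript
// Hypothetical sketch of hypothesis_engine/harness/codex_tiebreak.js.
// All names below are illustrative assumptions, not existing engine code.

const BORDERLINE_MARGIN = 0.5; // margin stated in the proposal

// Assumption: "within 0.5" is read as inclusive of exactly 0.5.
function isBorderline(composite, threshold, margin = BORDERLINE_MARGIN) {
  return Math.abs(composite - threshold) <= margin;
}

// Expect the judge reply to start with one of the three verdicts,
// followed by a one-sentence reason. Returns null if it does not.
function parseJudgeReply(raw) {
  const match = /^\s*(GRADUATE|KILL|DEFER)\b[\s:.-]*(.*)$/s.exec(raw);
  if (!match) return null;
  return { verdict: match[1], reason: match[2].trim() };
}

// askJudge(proposal) -> Promise<string>  (assumed wrapper around the codex CLI)
// log(entry)         -> Promise<void>    (assumed wrapper around engine.db)
async function withCodexTiebreak(proposal, baseVerdict, { threshold, askJudge, log }) {
  // Non-borderline cases pass through unchanged.
  if (!isBorderline(proposal.composite, threshold)) {
    return { verdict: baseVerdict, source: 'council' };
  }
  const parsed = parseJudgeReply(await askJudge(proposal));
  if (!parsed) {
    // Assumption: an unparseable reply leaves the council verdict in place.
    return { verdict: baseVerdict, source: 'council', note: 'unparseable judge reply' };
  }
  await log({ proposalId: proposal.id, verdict: parsed.verdict, reason: parsed.reason });
  return { verdict: parsed.verdict, reason: parsed.reason, source: 'codex' };
}
```

Injecting askJudge and log keeps the tiebreak logic testable without the CLI or the database, which also keeps the change easy to revert.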

Target files

hypothesis_engine/moves/council_verdict.js
hypothesis_engine/harness/codex_tiebreak.js

Expected effect

Borderline decisions (composite within 0.5 of the threshold) become decorrelated from the Sonnet judge that produced the composite. Expect the codex tiebreaker to disagree with Sonnet on 30-50% of borderline cases, which would validate the decorrelation hypothesis from S202 design hygiene.

Falsifier — what would prove this wrong?

Run 30 borderline cases through both judges. If codex agrees with Sonnet more than 85% of the time, the judges are correlated and the harness adds no signal: remove it. If agreement is below 50%, the judges are too noisy and the threshold logic should be revisited instead.
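The falsifier check could be scored with something like the sketch below. It assumes each borderline case is recorded as a pair of verdict strings, one derived from the Sonnet composite and one from codex; the field names and outcome labels are hypothetical. The proposal does not say how to treat agreement between 70% and 85%, which sits above the expected range but below the removal cutoff, so the sketch reports that band as not falsified.

```javascript
// Hypothetical scoring for the falsifier; field names are assumptions.

// pairs: array of { sonnet: string, codex: string } verdicts, one per case.
function agreementRate(pairs) {
  if (pairs.length === 0) throw new Error('no borderline cases to score');
  const agreed = pairs.filter((p) => p.sonnet === p.codex).length;
  return agreed / pairs.length;
}

// Cutoffs taken from the proposal: >85% agreement or <50% agreement.
function falsifierOutcome(rate) {
  if (rate > 0.85) return 'correlated: remove harness';
  if (rate < 0.5) return 'too noisy: revisit threshold logic';
  return 'not falsified';
}
```

With 30 cases, 26 or more agreements trips the removal cutoff (26/30 is about 0.87), and 14 or fewer trips the noise cutoff (14/30 is about 0.47).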

Evidence that triggered the proposal

D — rule #30 — non-trivial code dispatched to gpt-5.5-codex (same principle applies to non-trivial verdicts)
D — S202 cross-vendor judging design hygiene (decorrelated errors)

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	3
solo feasible	3
blast radius	1
composability	3
reversibility	3

Disposition

Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When	Move
2026-06-20 04:07	meta_filter_score
2026-06-20 04:03	meta_genesis