Add cross-vendor judge-disagreement logger

filter rejected · TOOL · reversible: simple · 3h · proposed 8 Jun 2026
What is the proposed change?
Cross-vendor judging (Sonnet proposes, Codex judges) is a load-bearing principle, but the disagreement signal is not captured structurally. Add a logger that, on each council_verdict, records (a) Sonnet's self-stated confidence (already in the argument output JSON), (b) Codex's verdict bucket {APPROVE, SUGGEST_REVISION, REJECT}, and (c) the derived disagreement type {sonnet_confident_codex_rejected, sonnet_hedged_codex_approved, agree_high, agree_low}. Persist each record to a flat, append-only JSONL log. No other move reads the log yet; the first pass is observability only.
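A minimal sketch of what such a logger could look like is given here, to make the record shape concrete. Several things in it are assumptions, not part of the proposal: that Sonnet's confidence is a number in [0, 1], the 0.7 cutoff for "confident", the field names (candidate_id, sonnet_confidence, and so on), and the handling of SUGGEST_REVISION, which the proposal does not map onto the four disagreement types. Here it is treated as a non-approval.

```javascript
const fs = require("node:fs");

// Assumption: confidence is numeric in [0, 1]; 0.7 is an illustrative cutoff.
const CONFIDENT_THRESHOLD = 0.7;

// Derive the disagreement type from Sonnet's confidence and Codex's verdict.
function classifyDisagreement(sonnetConfidence, codexVerdict) {
  if (typeof sonnetConfidence !== "number" || !Number.isFinite(sonnetConfidence)) {
    throw new TypeError("sonnetConfidence must be a finite number");
  }
  const confident = sonnetConfidence >= CONFIDENT_THRESHOLD;
  switch (codexVerdict) {
    case "APPROVE":
      return confident ? "agree_high" : "sonnet_hedged_codex_approved";
    case "REJECT":
    case "SUGGEST_REVISION":
      // Assumption: SUGGEST_REVISION counts as a non-approval, like REJECT.
      return confident ? "sonnet_confident_codex_rejected" : "agree_low";
    default:
      throw new RangeError(`unknown verdict bucket: ${codexVerdict}`);
  }
}

// Append one JSON object per line; nothing reads the file back here.
function logJudgeDisagreement(logPath, { candidateId, sonnetConfidence, codexVerdict }) {
  const record = {
    ts: new Date().toISOString(),
    candidate_id: candidateId, // hypothetical field name
    sonnet_confidence: sonnetConfidence,
    codex_verdict: codexVerdict,
    disagreement_type: classifyDisagreement(sonnetConfidence, codexVerdict),
  };
  fs.appendFileSync(logPath, JSON.stringify(record) + "\n", "utf8");
  return record;
}
```

Under these assumptions, council_verdict.js would call logJudgeDisagreement once per verdict. Because the log is append-only and nothing else reads it, the change can be reverted by removing that call.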
Target files
hypothesis_engine/tools/log_judge_disagreement.js
hypothesis_engine/moves/council_verdict.js
Expected effect
After 50 council_verdict runs, the disagreement-type distribution is non-uniform (≥40% in one type), making it usable as a meta signal. The sonnet_confident_codex_rejected bucket — if nonzero — names the candidates most worth post-mortem review.
Falsifier — what would prove this wrong?
Distribution is uniform (no type exceeds 30%) across 50 verdicts. That means cross-vendor disagreement is noise, not signal, and the standing-rule-#30 Codex-review premise needs re-examination.
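The expected effect and the falsifier can be checked mechanically once the log has 50 entries. The sketch below is one way to do that, assuming the disagreement types have already been read from the log into an array. The function name and the outcome labels are hypothetical. The stated thresholds also leave a gap: a largest share above 30% but below 40% neither meets the expected effect nor triggers the falsifier, so the sketch reports that case as "inconclusive", which is an interpretation rather than something the proposal specifies.

```javascript
const DISAGREEMENT_TYPES = [
  "sonnet_confident_codex_rejected",
  "sonnet_hedged_codex_approved",
  "agree_high",
  "agree_low",
];

// Tally disagreement types and compare the largest share against the
// proposal's thresholds (>= 40% supports, <= 30% falsifies).
function evaluateDistribution(types, minRuns = 50) {
  const counts = Object.fromEntries(DISAGREEMENT_TYPES.map((t) => [t, 0]));
  for (const t of types) {
    if (!DISAGREEMENT_TYPES.includes(t)) {
      throw new RangeError(`unknown disagreement type: ${t}`);
    }
    counts[t] += 1;
  }
  const total = types.length;
  const maxCount = Math.max(...Object.values(counts));

  let outcome;
  if (total < minRuns) {
    outcome = "insufficient_data";
  } else if (maxCount * 100 >= 40 * total) {
    outcome = "supports_expected_effect";
  } else if (maxCount * 100 <= 30 * total) {
    outcome = "falsified";
  } else {
    // Assumption: the 30-40% band is not covered by the proposal.
    outcome = "inconclusive";
  }
  return { total, counts, maxShare: total === 0 ? 0 : maxCount / total, outcome };
}
```

The comparisons use integer counts rather than floating-point shares so that the 30% and 40% boundaries are exact. With four types, a perfectly uniform split is 25% each, which falls on the falsified side.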
Evidence that triggered the proposal
  • D — brain/DESIGN_PRINCIPLES.md P? (standing rule #30: Codex review)
  • E — move_cost_rollup_7d.meta_council_verdict

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   3
blast radius    3
composability   3
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When              Move
2026-06-12 04:29  meta_filter_score
2026-06-08 04:03  meta_genesis