
Add judge-proposer agreement-rate sentinel tool

filter rejected · TOOL · reversible: simple · 5h · proposed 17 Jun 2026
What is the proposed change?
New telemetry module agreement_sentinel.js. On each council_verdict, log a row {run_id, proposer_lean, judge_verdict, agreement_bool}. The sentinel computes a rolling 50-run agreement rate per (proposer_model, judge_model) pair. When agreement on KEEP verdicts exceeds 0.85 over 50 runs, it writes a flag to the outbox section of brain/INDEX.md and emits a warning to Commander. This implements the warranted-disagreement check from the S160 cross-vendor principle. Wire a single call to sentinel.record() at the end of council_verdict.js.
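A minimal sketch of the sentinel logic described above, in JavaScript to match the target file. It is not the proposal's implementation. Assumptions: each call also carries proposer_model and judge_model (the proposed row omits them, but a per-pair rate needs them); "agreement on KEEP verdicts" is read as the share of runs where the proposer leaned KEEP and the judge also returned KEEP; writing the brain/INDEX.md flag and warning Commander are left to a caller-supplied onFlag callback. AgreementSentinel, agreementRate(), onFlag and the "DROP" label are hypothetical.

```javascript
// Hypothetical sketch of hypothesis_engine/telemetry/agreement_sentinel.js.
// Assumption: proposer_model / judge_model are passed alongside the proposed row fields.
const WINDOW = 50;      // rolling window from the proposal
const THRESHOLD = 0.85; // KEEP-agreement threshold from the proposal

class AgreementSentinel {
  constructor({ windowSize = WINDOW, threshold = THRESHOLD, onFlag = () => {} } = {}) {
    this.windowSize = windowSize;
    this.threshold = threshold;
    this.onFlag = onFlag;     // assumed hook: write the INDEX.md outbox flag, warn Commander
    this.log = [];            // every logged row, one per council verdict
    this.windows = new Map(); // "proposer|judge" -> most recent KEEP-lean rows
  }

  // Log one council verdict; returns a flag object if the threshold is exceeded, else null.
  record({ run_id, proposer_model, judge_model, proposer_lean, judge_verdict }) {
    const row = {
      run_id,
      proposer_lean,
      judge_verdict,
      agreement_bool: proposer_lean === judge_verdict,
    };
    this.log.push(row);

    // Assumption: only runs where the proposer leaned KEEP feed the KEEP-agreement rate.
    if (proposer_lean !== "KEEP") return null;

    const key = `${proposer_model}|${judge_model}`;
    const win = this.windows.get(key) ?? [];
    win.push(row);
    if (win.length > this.windowSize) win.shift(); // drop the oldest row
    this.windows.set(key, win);

    // Only flag once the window is full, i.e. "over 50 runs".
    const rate = this.agreementRate(proposer_model, judge_model);
    if (win.length === this.windowSize && rate > this.threshold) {
      const flag = { proposer_model, judge_model, rate, run_id };
      this.onFlag(flag);
      return flag;
    }
    return null;
  }

  // Share of rows in the pair's current window where lean and verdict agree (NaN if empty).
  agreementRate(proposer_model, judge_model) {
    const win = this.windows.get(`${proposer_model}|${judge_model}`) ?? [];
    if (win.length === 0) return NaN;
    return win.filter((r) => r.agreement_bool).length / win.length;
  }
}
```

The single call at the end of council_verdict.js would then be one `sentinel.record({...})` with the verdict's fields. As written, the flag fires on every record() while the window stays above threshold; a production version would probably debounce it so Commander is warned once per crossing.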
Target files
  • hypothesis_engine/telemetry/agreement_sentinel.js
  • hypothesis_engine/moves/council_verdict.js
Expected effect
Run against current history (the most recent 50 verdicts), the sentinel will report a baseline agreement rate. For the current pairing (proposer = Sonnet 4.6, judge = gpt-5.5-codex), the expected baseline is below 0.75; a rate climbing toward 0.85 across future runs would surface sycophancy-driven calibration drift that Commander would otherwise miss.
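One way to read the expected effect: compute the rolling 50-verdict agreement series, take the first full window as the baseline, check it against the expected 0.75 bound, and report the first verdict at which the rate exceeds 0.85. This sketch assumes `history` is already filtered to one (proposer_model, judge_model) pair, ordered oldest first, with each entry carrying the proposed agreement_bool field; rollingAgreement and summarizeDrift are hypothetical helpers.

```javascript
// Hypothetical helper: rolling agreement-rate series over one pair's verdict history.
// Each entry is assumed to carry the proposed agreement_bool field.
function rollingAgreement(history, windowSize = 50) {
  const series = [];
  let agreeCount = 0;
  for (let i = 0; i < history.length; i++) {
    if (history[i].agreement_bool) agreeCount++;                            // row entering the window
    if (i >= windowSize && history[i - windowSize].agreement_bool) agreeCount--; // row leaving it
    if (i >= windowSize - 1) series.push(agreeCount / windowSize);          // window is full
  }
  return series;
}

// Baseline = first full window; drift = first verdict whose window rate exceeds flagAt.
function summarizeDrift(history, { windowSize = 50, expectedBaseline = 0.75, flagAt = 0.85 } = {}) {
  const series = rollingAgreement(history, windowSize);
  if (series.length === 0) return null; // fewer than windowSize verdicts: no baseline yet
  const baseline = series[0];
  const driftIndex = series.findIndex((rate) => rate > flagAt);
  return {
    baseline,
    baselineAsExpected: baseline < expectedBaseline,
    latest: series[series.length - 1],
    // Series index s covers history[s .. s + windowSize - 1]; report the history index.
    driftDetectedAt: driftIndex === -1 ? null : driftIndex + windowSize - 1,
  };
}
```

A steady climb from a sub-0.75 baseline to a non-null driftDetectedAt is the pattern the proposal expects the sentinel to surface.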
Falsifier — what would prove this wrong?
Replay the last 100 verdicts and compute the agreement rate. If the rate already exceeds 0.85 with no observed sycophancy problem, then either the threshold is wrong or the signal does not discriminate, and the tool should be retuned or removed.
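The falsifier can be run as a one-off replay. A minimal sketch, assuming each logged row carries the proposed agreement_bool field and that "no observed sycophancy problem" is supplied by a human reviewer as a boolean; falsifierCheck and sycophancyObserved are hypothetical names.

```javascript
// Hypothetical one-off falsifier replay for the agreement sentinel.
// verdicts: logged rows, oldest first, each with an agreement_bool field (assumed shape).
function falsifierCheck(verdicts, { n = 100, threshold = 0.85, sycophancyObserved = false } = {}) {
  const replay = verdicts.slice(-n); // last n verdicts, or all of them if fewer exist
  if (replay.length === 0) return { rate: NaN, falsified: false, action: "insufficient data" };

  const rate = replay.filter((v) => v.agreement_bool).length / replay.length;
  // High agreement with no observed sycophancy means the signal does not discriminate.
  const falsified = rate > threshold && !sycophancyObserved;
  return { rate, falsified, action: falsified ? "retune or remove" : "keep monitoring" };
}
```

This replay uses a single 100-verdict window rather than the rolling 50; checking the maximum of the rolling-50 series over the same 100 verdicts would be a stricter variant.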
Evidence that triggered the proposal
  • D — brain/S160_CROSS_VENDOR_JUDGING.md — warranted-disagreement principle
  • T — LongJudgeBench finding: LLM judges unstable on long-form, sycophancy drift over time

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       2
solo feasible   3
blast radius    3
composability   3
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When              Move
2026-06-17 04:05  meta_filter_score
2026-06-17 04:03  meta_genesis