Council coin-flip baseline tracker for verdict-vs-chance signal audit

filter rejected · TOOL · reversible: simple · 5h · proposed 9 Jun 2026
What is the proposed change?
Add a new tool, tools/council_baseline.js, that computes a deterministic 'coin-flip baseline' verdict for every recorded council verdict. The baseline applies a fixed hash to the hypothesis title and maps the result onto ACTION_BUCKETS, with weights matching the empirical prior (e.g. KILL=0.55, DEFER=0.20, GATHER=0.15, WEAK_BUILD=0.08, STRONG_BUILD=0.02 from existing council data). Persist baseline_verdict and baseline_action alongside the real verdict in the moves table's output JSON; because both fields are embedded inside output, no schema change is needed. Provide a CLI, `node tools/council_baseline.js report`, that prints the rate at which the real council agrees with the coin-flip baseline. This mirrors the forecaster's frozen-baseline discipline (most_mentioned/momentum/random); the council currently has no chance baseline.
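The title-hash mapping described above could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the proposal does not name a hash function, so SHA-256 is an assumption, and the helper names `titleToUnit`, `bucketForUnit`, and `baselineAction` are hypothetical. Only the bucket names and weights come from the proposal text.

```javascript
// Sketch only: SHA-256 and all helper names are assumptions, not part of the proposal.
const crypto = require('crypto');

// Bucket weights as quoted in the proposal (they sum to 1.00).
const BASELINE_WEIGHTS = [
  ['KILL', 0.55],
  ['DEFER', 0.20],
  ['GATHER', 0.15],
  ['WEAK_BUILD', 0.08],
  ['STRONG_BUILD', 0.02],
];

// Map a title to a stable number in [0, 1) from the first 4 bytes of its digest.
function titleToUnit(title) {
  const digest = crypto.createHash('sha256').update(title, 'utf8').digest();
  return digest.readUInt32BE(0) / 0x100000000;
}

// Walk the cumulative weights and return the first bucket whose upper bound exceeds u.
function bucketForUnit(u, weights = BASELINE_WEIGHTS) {
  let cumulative = 0;
  for (const [action, weight] of weights) {
    cumulative += weight;
    if (u < cumulative) return action;
  }
  // Guard against floating-point rounding leaving the total slightly under 1.
  return weights[weights.length - 1][0];
}

// Same title in, same baseline action out, on every run.
function baselineAction(title) {
  return bucketForUnit(titleToUnit(title));
}
```

Because the result depends only on the title, re-running the tool over old verdicts reproduces the same baseline, which is what makes it "frozen" in the forecaster's sense.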
Target files
hypothesis_engine/moves/council_verdict.js
hypothesis_engine/tools/council_baseline.js
Expected effect
After 50+ council runs, council-vs-baseline agreement above 70% on KILL actions would indicate that the council is mostly recapitulating priors and adding little signal; agreement below 40% would suggest genuine discrimination. Either result is informative; without a baseline, how much signal the council adds cannot be assessed at all.
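The report's KILL agreement check could look something like the sketch below. It assumes "agreement on KILL actions" means: among runs where the council chose KILL, the fraction where the baseline also chose KILL. The row shape, the `action` field name, and both function names are hypothetical; `baseline_action` and the 50-run, 70%, and 40% figures are taken from the proposal.

```javascript
// Sketch only: row shape, `action` field, and function names are assumptions.
const MIN_RUNS = 50; // the proposal asks for 50+ council runs before reading the result

// Fraction of runs with the given council action where the baseline picked the same action.
function agreementRate(rows, action) {
  const relevant = rows.filter((row) => row.action === action);
  if (relevant.length === 0) return null;
  const agreeing = relevant.filter((row) => row.baseline_action === row.action);
  return agreeing.length / relevant.length;
}

// Apply the proposal's thresholds: >70% recapitulates priors, <40% suggests discrimination.
function interpretKillAgreement(rows) {
  if (rows.length < MIN_RUNS) return 'not enough runs';
  const rate = agreementRate(rows, 'KILL');
  if (rate === null) return 'no KILL verdicts';
  if (rate > 0.70) return 'mostly recapitulating priors';
  if (rate < 0.40) return 'suggests genuine discrimination';
  return 'inconclusive';
}
```

The band between 40% and 70% is reported as inconclusive here because the proposal does not say how to read it.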
Falsifier — what would prove this wrong?
If, after 100 council runs, the baseline agreement rate is statistically indistinguishable from 1/N (uniform chance across the 5 buckets, i.e. 20%), then the title-hash baseline itself is broken, not the council. Likewise, if the agreement rate is uniform across all verdict_score bands, the baseline is misdesigned.
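One way to make "statistically indistinguishable from 1/N" concrete is a two-sided z-test on the observed agreement proportion. The proposal does not specify a test, so the normal approximation, the 1.96 critical value (roughly a 5% significance level), and the function name `agreementVsChance` are all assumptions made for this sketch.

```javascript
// Sketch only: the test choice, critical value, and function name are assumptions.
function agreementVsChance(agreeCount, totalRuns, bucketCount = 5, zCritical = 1.96) {
  const p0 = 1 / bucketCount; // chance level named in the falsifier (1/N)
  const observed = agreeCount / totalRuns;
  // Standard error of a proportion under the null hypothesis p = p0.
  const standardError = Math.sqrt((p0 * (1 - p0)) / totalRuns);
  const z = (observed - p0) / standardError;
  return { observed, p0, z, indistinguishable: Math.abs(z) < zCritical };
}

// Example: 22 agreements in 100 runs sits close to the 20% chance level.
const result = agreementVsChance(22, 100);
console.log(result.indistinguishable); // true
```

The chance level is derived from `bucketCount`, so a different reference rate can be tested by changing that parameter or adapting `p0`.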
Evidence that triggered the proposal
  • D — brain/S186_FORWARD_CLOCK.md — frozen baseline discipline (most_mentioned/momentum/random) as forecaster-only norm; council lacks analogue
  • E — hypothesis_engine/moves/council_verdict.js:19 — ACTION_BUCKETS array; no baseline machinery exists

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   3
blast radius    3
composability   3
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When              Move
2026-06-12 04:30  meta_filter_score
2026-06-09 04:05  meta_genesis