
Add meta_calibration.js: Spearman ρ between proposer self-score and council verdict_score

Status: council rejected · TOOL · reversible: simple · 6h · proposed 13 Jun 2026
What is the proposed change?
New file meta_engine/lib/meta_calibration.js exporting runCalibration(), which:
  • queries lane='meta' hypotheses with verdict_score IS NOT NULL, JOINed to moves WHERE move_type='meta_genesis' AND the move output contains m_a_self_score JSON;
  • for each row, parses the six self-score axes, sums them (0-18) and rescales the sum to 0-100 as self_score_norm;
  • computes Spearman ρ between self_score_norm and verdict_score across all rows;
  • computes per-axis ρ (each axis vs verdict_score) to surface which axes the proposer is calibrated on;
  • returns {n_rows, overall_rho, per_axis_rho, interpretation: 'calibrated'|'uncalibrated'|'insufficient_data'}.
Wire runCalibration() into the cycle.js post-council report step (after the council runs, before exit) so that each cycle reports calibration drift, and append its result to the cycle.js report JSON under the key 'self_score_calibration'.
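The statistics in the steps above can be sketched in plain JavaScript. This is a minimal, hypothetical sketch, not the proposed meta_calibration.js: it assumes the query and JOIN have already produced rows shaped like { selfScore: { <axis>: 0..3 }, verdictScore: number }, and the function name calibrate() and that row shape are illustrative assumptions. Only the 20-row and ρ > 0.6 thresholds come from the proposal.

```javascript
// Hypothetical sketch of the math behind runCalibration(); not the shipped file.
// Assumption: rows are already joined and look like
//   { selfScore: { <axis>: 0..3, ... six axes }, verdictScore: number }.
const MIN_ROWS = 20; // proposal: ">= 20 verdicted rows" before rho is interpretable
const CALIBRATED_RHO = 0.6; // proposal: "rho > 0.6"
const MAX_SELF_SCORE = 18; // six axes x 3 points

// 1-based ranks; tied values share the average of the positions they occupy.
function averageRanks(values) {
  const order = values.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const ranks = new Array(values.length);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const avg = (i + j) / 2 + 1; // mean of 1-based positions i+1..j+1
    for (let k = i; k <= j; k++) ranks[order[k][1]] = avg;
    i = j + 1;
  }
  return ranks;
}

// Pearson correlation; null when either input has zero variance.
function pearson(x, y) {
  const n = x.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxx === 0 || syy === 0 ? null : sxy / Math.sqrt(sxx * syy);
}

// Spearman rho: Pearson correlation computed on tie-averaged ranks.
function spearman(x, y) {
  return x.length < 2 ? null : pearson(averageRanks(x), averageRanks(y));
}

// Sum the six 0-3 axes (0-18) and rescale to 0-100.
function selfScoreNorm(selfScore) {
  const sum = Object.values(selfScore).reduce((s, v) => s + v, 0);
  return (sum / MAX_SELF_SCORE) * 100;
}

function calibrate(rows) {
  const verdicts = rows.map((r) => r.verdictScore);
  const overall = spearman(rows.map((r) => selfScoreNorm(r.selfScore)), verdicts);
  const perAxis = {};
  const axes = rows.length > 0 ? Object.keys(rows[0].selfScore) : [];
  for (const axis of axes) {
    perAxis[axis] = spearman(rows.map((r) => r.selfScore[axis]), verdicts);
  }
  // Assumption: the proposal does not say how to label rho <= 0.6 or an
  // undefined rho; this sketch uses 'uncalibrated' and 'insufficient_data'.
  let interpretation;
  if (rows.length < MIN_ROWS || overall === null) interpretation = 'insufficient_data';
  else interpretation = overall > CALIBRATED_RHO ? 'calibrated' : 'uncalibrated';
  return { n_rows: rows.length, overall_rho: overall, per_axis_rho: perAxis, interpretation };
}
```

Average ranks matter here because the summed self-score can take only 19 distinct values (0-18) and each axis only 4, so ties are the norm rather than the exception; ranking tied values arbitrarily would make ρ depend on row order.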
Target files
  • meta_engine/lib/meta_calibration.js
  • meta_engine/cycle.js
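On the cycle.js side, the change amounts to one extra step in the post-council report. The sketch below is hypothetical: cycle.js is not shown in the proposal, so the helper name attachCalibration(), the plain-object report, and the assumption that runCalibration() is async (it queries the database) are all illustrative.

```javascript
// Hypothetical sketch of the cycle.js hook; the real report object and call
// site are assumptions. runCalibration is assumed to return a Promise.
async function attachCalibration(report, runCalibration) {
  try {
    report.self_score_calibration = await runCalibration();
  } catch (err) {
    // Design choice (assumption): record a calibration failure in the report
    // instead of failing the whole cycle.
    report.self_score_calibration = { error: err instanceof Error ? err.message : String(err) };
  }
  return report;
}
```

At the post-council step, before exit, a call such as `await attachCalibration(report, runCalibration);` would place the result under 'self_score_calibration' before the report is serialized. Catching errors keeps a diagnostic step from blocking the cycle it is measuring; whether that is the right trade-off for this engine is a judgment call.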
Expected effect
Builds the dataset needed to test whether the m_a_self_score field carries any value. Once ≥20 verdicted rows have accumulated, ρ becomes interpretable. If ρ > 0.6, the self-score can be promoted into the filter_score axes as a free pre-LLM signal. If |ρ| < 0.2 over 20 rows, the self-score is noise and the genesis prompt can drop it, saving tokens.
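The decision rule above can be written as a small lookup from (row count, ρ) to the follow-up action. This is a hypothetical helper, not part of the proposed change; the action labels are illustrative, and two choices are assumptions: |ρ| ≤ 0.2 follows the falsifier's closed interval [-0.2, 0.2], and the band the proposal leaves unspecified (0.2 < |ρ| with ρ ≤ 0.6, including strongly negative ρ) is reported as 'inconclusive'.

```javascript
// Hypothetical helper mapping the proposal's thresholds onto follow-up actions.
const MIN_ROWS_FOR_DECISION = 20;

function nextAction(nRows, rho) {
  if (nRows < MIN_ROWS_FOR_DECISION || rho === null) return 'wait'; // not interpretable yet
  if (rho > 0.6) return 'promote_to_filter_score'; // free pre-LLM signal
  if (Math.abs(rho) <= 0.2) return 'drop_from_genesis_prompt'; // noise: save tokens
  return 'inconclusive'; // band the proposal does not assign an action to
}
```

For example, nextAction(20, 0.4) returns 'inconclusive', which makes the unassigned middle band visible in each cycle's report rather than silently folding it into one of the two outcomes.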
Falsifier — what would prove this wrong?
After 20 rows accumulate (target ~3-4 cycles at the current rate), if the overall ρ ∈ [-0.2, 0.2], the proposer self-score is effectively uncorrelated with the adversarial council outcome: the m_a_self_score field is informational noise and should be removed from the genesis schema. If ρ > 0.6, the field is useful, and a separate proposal will promote it into filter_score.
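As a rough check on the 20-row threshold (the proposal itself does not specify a significance test), the standard t approximation for testing a rank correlation against zero can be evaluated at n = 20, that is, 18 degrees of freedom:

```latex
t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}}, \qquad
\rho = 0.6:\; t = 0.6\sqrt{\frac{18}{0.64}} \approx 3.18, \qquad
\rho = 0.2:\; t = 0.2\sqrt{\frac{18}{0.96}} \approx 0.87
```

The two-sided 5% critical value for 18 degrees of freedom is about 2.101, so at n = 20 the promotion threshold ρ > 0.6 clears it, while |ρ| ≤ 0.2 falls well short; the cutoff sits near |ρ| ≈ 2.101 / √(2.101² + 18) ≈ 0.44. The approximation is coarse when ranks are heavily tied, as self-scores on a 0-3 scale tend to be.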
Evidence that triggered the proposal
  • D — META_ENGINE_PHASE_1_SPEC.md — m_a_self_score collected from proposer but never validated against downstream outcome
  • E — Engine traces: 19 meta runs with verdict_action+verdict_score persisted in hypotheses table, plus m_a_self_score in meta_genesis move output
  • D — P95 model-capability doctrine — measure before trusting model self-reports

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis           | Score
specificity    | 3
falsifier      | 3
solo feasible  | 3
blast radius   | 3
composability  | 3
reversibility  | 3
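Applying the normalization from the proposed change (sum the six axes to 0-18, rescale to 0-100) to the scores above:

```latex
\text{self\_score\_norm} = \frac{3+3+3+3+3+3}{18} \times 100 = \frac{18}{18} \times 100 = 100
```

This draft therefore enters the dataset at the top of the self-score scale. If every draft scored itself 18/18, self_score_norm would have zero variance and ρ would be undefined.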
Disposition
Rejected at the council-verdict stage. The two-judge council did not find the case strong enough to advance it to Commander review.

Evaluation history

When             | Move
2026-06-13 04:22 | meta_council_verdict
2026-06-13 04:15 | meta_argument
2026-06-13 04:08 | meta_filter_score
2026-06-13 04:04 | meta_genesis
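The history above shows meta_genesis as the first move of a proposal; it is the move runCalibration() would JOIN against to recover the self-score. A hypothetical sketch of that per-hypothesis extraction follows. It assumes, without confirmation from the proposal, that each move is { move_type, output } and that the genesis output is JSON text (or an already-parsed object) with a top-level m_a_self_score object mapping axis name to a 0-3 integer; extractSelfScore() is an illustrative name.

```javascript
// Hypothetical sketch: pull the proposer's self-score out of a hypothesis's moves.
function extractSelfScore(moves) {
  const genesis = moves.find((m) => m.move_type === 'meta_genesis');
  if (!genesis) return null; // no genesis move: the row cannot be calibrated
  let output = genesis.output;
  if (typeof output === 'string') {
    try {
      output = JSON.parse(output);
    } catch {
      return null; // output is not JSON: skip the row
    }
  }
  const score = output && output.m_a_self_score;
  if (!score || typeof score !== 'object') return null;
  // Accept only six integer axes in the expected 0-3 range.
  const axes = Object.values(score);
  const valid = axes.length === 6 && axes.every((v) => Number.isInteger(v) && v >= 0 && v <= 3);
  return valid ? score : null;
}
```

Rejecting malformed scores instead of coercing them keeps n_rows honest: a row only counts toward the 20-row threshold if its self-score can actually be normalized.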