Add meta_calibration.js: Spearman ρ between proposer self-score and council verdict_score

council rejected | TOOL | reversible: simple | 6h | proposed 13 Jun 2026

What is the proposed change?

Add a new file, meta_engine/lib/meta_calibration.js, that exports runCalibration(). The function queries hypotheses with lane='meta' and verdict_score IS NOT NULL, joined to moves where move_type='meta_genesis' and the move output contains the m_a_self_score JSON. For each row it parses the six self-score axes (0-3 each), sums them to a 0-18 total, and rescales that total to 0-100 as self_score_norm. It then computes Spearman ρ between self_score_norm and verdict_score across all rows, plus a per-axis ρ (each axis against verdict_score) to show which axes the proposer is calibrated on. The output is {n_rows, overall_rho, per_axis_rho, interpretation: 'calibrated'|'uncalibrated'|'insufficient_data'}. runCalibration() is wired into the post-council report step of cycle.js (after the council runs, before exit), and its result is appended to the cycle.js report JSON under the key 'self_score_calibration', so each cycle reports calibration drift.
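The computation described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the row shape ({ self_score, verdict_score }), the axis key spellings, and the helper names normalizeSelfScore, rank, pearson, spearman and summarize are all assumptions, and the database query is left out. Spearman ρ is computed here as the Pearson correlation of average ranks, which handles ties. A series with no variance (for example, every proposal scoring itself 3 on an axis) has no defined ρ, so the sketch returns null in that case.

```javascript
// Axis names follow the proposer self-score table; the JSON key spelling is an assumption.
const AXES = ['specificity', 'falsifier', 'solo_feasible', 'blast_radius', 'composability', 'reversibility'];

// Sum six 0-3 axis scores (0-18 total) and rescale to 0-100.
function normalizeSelfScore(axes) {
  const total = AXES.reduce((sum, axis) => sum + Number(axes[axis]), 0);
  return (total / 18) * 100;
}

// Assign 1-based ranks; tied values share the average of their positions.
function rank(values) {
  const order = values.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const ranks = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1][0] === order[start][0]) end++;
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) ranks[order[k][1]] = avg;
    start = end + 1;
  }
  return ranks;
}

// Pearson correlation; null when either series has zero variance.
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  if (sxx === 0 || syy === 0) return null;
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman rho: Pearson correlation applied to the ranks of each series.
function spearman(xs, ys) {
  if (xs.length < 2 || xs.length !== ys.length) return null;
  return pearson(rank(xs), rank(ys));
}

// rows: [{ self_score: { <axis>: 0..3 }, verdict_score: number }] (hypothetical shape)
function summarize(rows) {
  const verdicts = rows.map((r) => r.verdict_score);
  const per_axis_rho = {};
  for (const axis of AXES) {
    per_axis_rho[axis] = spearman(rows.map((r) => r.self_score[axis]), verdicts);
  }
  return {
    n_rows: rows.length,
    overall_rho: spearman(rows.map((r) => normalizeSelfScore(r.self_score)), verdicts),
    per_axis_rho,
  };
}
```

The interpretation field is omitted from summarize() because it depends on the row-count and ρ thresholds rather than on the correlation itself.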

Target files

meta_engine/lib/meta_calibration.js meta_engine/cycle.js

Expected effect

Builds the dataset needed to test whether the m_a_self_score field carries any signal. ρ becomes interpretable once at least 20 verdicted rows have accumulated. If ρ > 0.6, the self-score can be promoted into the filter_score axes as a free pre-LLM signal. If |ρ| < 0.2 over 20 rows, the self-score is noise and the genesis prompt can drop it, saving tokens.

Falsifier — what would prove this wrong?

After 20 rows accumulate (target ~3-4 cycles at current rate), if overall ρ ∈ [-0.2, 0.2], the proposer self-score is uncorrelated with the adversarial council outcome: the m_a_self_score field is informational noise and should be removed from the genesis schema. If ρ > 0.6, the field is useful and a separate proposal will promote it into filter_score.
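The decision rule in the falsifier can be written as a small pure function. This is a sketch under assumptions: the function name calibrationDecision and every returned label except 'insufficient_data' are hypothetical, and the proposal does not say what happens when ρ falls outside both stated ranges (for example ρ = 0.4 or ρ = -0.5), so the sketch reports that case as 'unspecified' rather than guessing.

```javascript
const MIN_ROWS = 20;        // rows required before rho is treated as interpretable
const PROMOTE_ABOVE = 0.6;  // rho strictly above this: the field is useful
const NOISE_BAND = 0.2;     // |rho| at or below this: the field is noise

// Returns a label for the follow-up action; all labels except
// 'insufficient_data' are hypothetical names chosen for this sketch.
function calibrationDecision(nRows, rho) {
  if (nRows < MIN_ROWS || rho === null || Number.isNaN(rho)) {
    return 'insufficient_data';
  }
  if (rho > PROMOTE_ABOVE) return 'promote_to_filter_score';
  if (Math.abs(rho) <= NOISE_BAND) return 'drop_from_genesis_schema';
  return 'unspecified'; // the proposal defines no action for this range
}
```

Keeping the thresholds as named constants makes the falsifier auditable: changing the bar for "useful" or "noise" is a one-line diff rather than a change buried in reporting code.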

Evidence that triggered the proposal

D — META_ENGINE_PHASE_1_SPEC.md — m_a_self_score collected from proposer but never validated against downstream outcome
E — Engine traces: 19 meta runs with verdict_action+verdict_score persisted in hypotheses table, plus m_a_self_score in meta_genesis move output
D — P95 model-capability doctrine — measure before trusting model self-reports

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	3
solo feasible	3
blast radius	3
composability	3
reversibility	3
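
As a worked example of the normalization this proposal describes (sum the six axes to 0-18, rescale to 0-100), the scores in the table above give a total of 18 and a self_score_norm of 100. The object keys below mirror the table labels and are an assumption about how the scores would be stored.

```javascript
// Scores copied from the table above; key names are illustrative.
const selfScore = {
  'specificity': 3,
  'falsifier': 3,
  'solo feasible': 3,
  'blast radius': 3,
  'composability': 3,
  'reversibility': 3,
};

const MAX_PER_AXIS = 3;
const axisCount = Object.keys(selfScore).length;                        // 6
const total = Object.values(selfScore).reduce((sum, v) => sum + v, 0);  // 18
const selfScoreNorm = (total / (MAX_PER_AXIS * axisCount)) * 100;       // 100
```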

Disposition

Rejected at the council verdict. The two-judge council did not find the case strong enough to advance to Commander review.

Evaluation history

When	Move
2026-06-13 04:22	meta_council_verdict
2026-06-13 04:15	meta_argument
2026-06-13 04:08	meta_filter_score
2026-06-13 04:04	meta_genesis