Filter-score calibration report tool

council rejected | TOOL | reversible: simple | 3h | proposed 15 Jun 2026

What is the proposed change?

A new standalone Node script that reads engine.db, joins hypotheses with their council_verdict outcomes over the last 90 days, and emits a per-axis report: the mean axis score for kept vs. rejected hypotheses, the point-biserial correlation between each axis score and the kept/rejected outcome, and a flag for axes with |r| < 0.1 (signal-free axes). Output goes to stdout as a markdown table plus JSON. The script runs on demand (cron is optional, not built in) and makes no writes to engine.db.
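The per-axis computation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed script: the row shape (`kept`, `scores`), the function names, and the toy data are all hypothetical. Reading from engine.db is left out because its schema is not given here. Reporting an undefined correlation as `null` without flagging it is also an assumption.

```javascript
// Sketch only: row shape and field names are assumptions, not the engine.db schema.
// rows: [{ kept: boolean, scores: { v2_a1: number, ... } }]

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Point-biserial r between a numeric score and a binary kept/rejected outcome:
// r = (meanKept - meanRejected) / sd * sqrt(p * (1 - p)), with sd the population SD
// of all scores and p the fraction kept.
// Returns null when r is undefined (one group empty, or zero score variance).
function pointBiserial(pairs) {
  const kept = pairs.filter((p) => p.kept).map((p) => p.score);
  const rejected = pairs.filter((p) => !p.kept).map((p) => p.score);
  if (kept.length === 0 || rejected.length === 0) return null;
  const all = pairs.map((p) => p.score);
  const m = mean(all);
  const sd = Math.sqrt(mean(all.map((x) => (x - m) ** 2)));
  if (sd === 0) return null;
  const pKept = kept.length / pairs.length;
  return ((mean(kept) - mean(rejected)) / sd) * Math.sqrt(pKept * (1 - pKept));
}

// One report entry per axis; flagBelow = 0.1 is the signal-free threshold from the proposal.
function axisReport(rows, axes, flagBelow = 0.1) {
  return axes.map((axis) => {
    const pairs = rows.map((row) => ({ kept: row.kept, score: row.scores[axis] }));
    const kept = pairs.filter((p) => p.kept).map((p) => p.score);
    const rejected = pairs.filter((p) => !p.kept).map((p) => p.score);
    const r = pointBiserial(pairs);
    return {
      axis,
      meanKept: kept.length ? mean(kept) : null,
      meanRejected: rejected.length ? mean(rejected) : null,
      r,
      signalFree: r !== null && Math.abs(r) < flagBelow,
    };
  });
}

// Render the report as a markdown table.
function toMarkdown(report) {
  const fmt = (x) => (x === null ? "n/a" : x.toFixed(2));
  const lines = [
    "| axis | mean kept | mean rejected | r | signal-free |",
    "|---|---|---|---|---|",
  ];
  for (const e of report) {
    lines.push(
      `| ${e.axis} | ${fmt(e.meanKept)} | ${fmt(e.meanRejected)} | ${fmt(e.r)} | ${e.signalFree ? "yes" : "no"} |`
    );
  }
  return lines.join("\n");
}

// Toy data, invented for illustration only.
const rows = [
  { kept: true, scores: { v2_a1: 3, v2_a2: 1 } },
  { kept: true, scores: { v2_a1: 3, v2_a2: 2 } },
  { kept: false, scores: { v2_a1: 0, v2_a2: 1 } },
  { kept: false, scores: { v2_a1: 0, v2_a2: 2 } },
];
const report = axisReport(rows, ["v2_a1", "v2_a2"]);
console.log(toMarkdown(report));
console.log(JSON.stringify(report, null, 2));
```

In the toy data, v2_a1 separates kept from rejected perfectly, while v2_a2 has identical means in both groups and is flagged as signal-free.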

Target files

hypothesis_engine/tools/filter_score_calibration.js

Expected effect

Surfaces which of the v2_a1..v2_a10 axes are pulling weight and which are pure noise. The Architect uses the report to retire or reweight low-signal axes in a follow-up proposal.

Falsifier — what would prove this wrong?

If every axis shows |r| > 0.2 (all axes signal-bearing), the report adds no actionable information and the tool is shelf-ware. This is an acceptable failure mode because the tool is cheap.
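The falsifier condition can be written as a small check over the report. This is a minimal sketch: the entry shape (`axis`, `r`), the function name, and the sample values are hypothetical. Treating an empty report or an undefined correlation as "not falsified" is an assumption.

```javascript
// Sketch only: entry shape { axis, r } is an assumption; r is the point-biserial
// correlation for that axis, or null when it could not be computed.
// bearingAbove = 0.2 is the falsifier threshold stated in the proposal.
function falsifierTriggered(report, bearingAbove = 0.2) {
  return (
    report.length > 0 &&
    report.every((e) => e.r !== null && Math.abs(e.r) > bearingAbove)
  );
}

// Invented sample values, for illustration only.
const allStrong = [
  { axis: "v2_a1", r: 0.45 },
  { axis: "v2_a2", r: -0.31 },
];
const oneWeak = [
  { axis: "v2_a1", r: 0.45 },
  { axis: "v2_a2", r: 0.15 },
];

console.log(falsifierTriggered(allStrong)); // true
console.log(falsifierTriggered(oneWeak)); // false
```

Note that the two thresholds leave a gap. An axis with 0.1 ≤ |r| ≤ 0.2, such as the 0.15 above, is neither flagged as signal-free nor counted as signal-bearing, so it prevents the falsifier from triggering without appearing in the flag list.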

Evidence that triggered the proposal

E — filter_score.js v2_a1..v2_a10 axes have no current outcome-correlation audit
D — P82 grader-as-engine doctrine requires graders be measured

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	2
solo feasible	3
blast radius	0
composability	3
reversibility	3

Disposition

Rejected at the council verdict. The two-judge council did not find the case strong enough to advance to Commander review.

Evaluation history

When	Move
2026-06-15 04:13	meta_council_verdict
2026-06-15 04:09	meta_argument
2026-06-15 04:06	meta_filter_score
2026-06-15 04:03	meta_genesis