
Add argument-trace summarizer harness around judge call

filter rejected HARNESS reversible: simple 7h proposed 17 Jun 2026
What is the proposed change?
Wrap the output of the argument move before it flows into council_verdict. The new harness compresses the argument trace into a structured summary {claims: [], strongest_objection, residual_uncertainty} and passes only that summary to the judge; the judge prompt never receives the raw trace. The raw trace is persisted to disk for Commander review. A --raw_trace_judge flag disables the harness so the two modes can be compared A/B.
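A minimal sketch of how such a harness could be wired, assuming the summarizer and the persistence step are injected as functions. Every name here (validateSummary, buildJudgeInput, summarize, persist, rawTraceJudge) is hypothetical and not taken from hypothesis_engine. Only the three summary fields and the --raw_trace_judge flag come from the proposal. Persisting the raw trace in both modes is also an assumption.

```javascript
// Illustrative sketch only. Every name here is hypothetical, not the engine's real API.

const REQUIRED_FIELDS = ["claims", "strongest_objection", "residual_uncertainty"];

// Check that a summary has the shape named in the proposal.
function validateSummary(summary) {
  if (summary === null || typeof summary !== "object") {
    throw new Error("summary must be an object");
  }
  for (const field of REQUIRED_FIELDS) {
    if (!(field in summary)) {
      throw new Error(`summary is missing field: ${field}`);
    }
  }
  if (!Array.isArray(summary.claims)) {
    throw new Error("summary.claims must be an array");
  }
  return summary;
}

// Decide what the judge sees. `summarize` and `persist` are injected so the
// sketch stays independent of any model client or filesystem layout.
function buildJudgeInput(rawTrace, { summarize, persist, rawTraceJudge = false }) {
  // Assumption: the raw trace is persisted in both modes for Commander review.
  persist(rawTrace);

  if (rawTraceJudge) {
    // --raw_trace_judge: bypass the harness for the A/B comparison arm.
    return { mode: "raw", payload: rawTrace };
  }
  return { mode: "summary", payload: validateSummary(summarize(rawTrace)) };
}

// Usage with stand-in dependencies (a real summarizer would likely call a model).
const persisted = [];
const stubSummarize = (trace) => ({
  claims: trace.split("\n").filter((line) => line.startsWith("CLAIM:")),
  strongest_objection: "none recorded",
  residual_uncertainty: "unknown",
});

const judgeInput = buildJudgeInput("CLAIM: x holds\nnoise", {
  summarize: stubSummarize,
  persist: (trace) => persisted.push(trace),
});
console.log(judgeInput.mode, judgeInput.payload.claims);
```

Validating the summary shape before the judge call means a malformed summary fails loudly instead of silently degrading the verdict.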
Target files
hypothesis_engine/moves/argument.js
hypothesis_engine/moves/council_verdict.js
Expected effect
On a 30-run A/B (15 with the summarizer, 15 raw), judge verdicts in summarizer mode show lower variance: the standard deviation of the composite KEEP/KILL across two judge re-runs of the same candidate drops by more than 20%, in line with the reasoning-trace fluency-trap finding. Token cost of the judge call drops by roughly 40%.
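One way the variance target could be computed is sketched below. It assumes the composite KEEP/KILL is a numeric score, that each candidate has its re-run scores grouped together, and that the per-candidate sample standard deviation is averaged within each arm. The proposal does not pin down these details, so the function names and input layout are illustrative.

```javascript
// Illustrative sketch; function names and the input layout are assumptions.

// Sample standard deviation of a list of numbers (n - 1 denominator).
function sampleStdev(values) {
  if (values.length < 2) throw new Error("need at least two values");
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const squared = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(squared / (values.length - 1));
}

// Mean per-candidate stdev for one arm. `reruns` has one entry per candidate;
// each entry holds that candidate's composite scores across judge re-runs.
function meanRerunStdev(reruns) {
  if (reruns.length === 0) throw new Error("need at least one candidate");
  const total = reruns.reduce((sum, scores) => sum + sampleStdev(scores), 0);
  return total / reruns.length;
}

// Relative drop in re-run stdev from the raw arm to the summarizer arm.
function stdevDrop(rawReruns, summaryReruns) {
  const raw = meanRerunStdev(rawReruns);
  if (raw === 0) throw new Error("raw arm has zero variance; drop is undefined");
  return 1 - meanRerunStdev(summaryReruns) / raw;
}

// Made-up numbers for illustration only, not results from any run.
const rawArm = [[0.2, 0.8], [0.4, 0.6]];
const summaryArm = [[0.4, 0.6], [0.5, 0.5]];
const drop = stdevDrop(rawArm, summaryArm);
console.log(`stdev drop: ${(drop * 100).toFixed(1)}%, target met: ${drop > 0.2}`);
```

With only two re-runs per candidate, each per-candidate stdev rests on a single difference, so the averaged figure will be noisy at 15 runs per arm.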
Falsifier — what would prove this wrong?
If summarizer-mode judge verdicts diverge from raw-mode verdicts on more than 25% of candidates AND raw-mode verdicts correlate better with Commander overrides on those candidates, then the summary is dropping load-bearing detail and the harness should be disabled.
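The falsifier can be expressed as a small check, sketched here under stated assumptions: "correlates better" is read as a higher agreement rate with the Commander's final call on the divergent candidates, and the record fields (summaryVerdict, rawVerdict, commanderVerdict) are hypothetical names, not fields from the engine.

```javascript
// Illustrative sketch; the record fields and function name are assumptions.

// Each record: { summaryVerdict, rawVerdict, commanderVerdict }, where verdicts are
// "KEEP" or "KILL" and commanderVerdict is null when the Commander did not weigh in.
function checkFalsifier(records, divergenceThreshold = 0.25) {
  if (records.length === 0) throw new Error("need at least one candidate");

  const divergent = records.filter((r) => r.summaryVerdict !== r.rawVerdict);
  const divergenceRate = divergent.length / records.length;

  // Only divergent candidates with a Commander verdict can arbitrate between modes.
  const judged = divergent.filter((r) => r.commanderVerdict !== null);
  const agreement = (key) =>
    judged.length === 0
      ? 0
      : judged.filter((r) => r[key] === r.commanderVerdict).length / judged.length;

  const rawAgreement = agreement("rawVerdict");
  const summaryAgreement = agreement("summaryVerdict");

  return {
    divergenceRate,
    rawAgreement,
    summaryAgreement,
    // Both conditions must hold (the proposal joins them with AND).
    falsified: divergenceRate > divergenceThreshold && rawAgreement > summaryAgreement,
  };
}

// Made-up records for illustration only.
const exampleRecords = [
  { summaryVerdict: "KEEP", rawVerdict: "KILL", commanderVerdict: "KILL" },
  { summaryVerdict: "KILL", rawVerdict: "KEEP", commanderVerdict: "KEEP" },
  { summaryVerdict: "KEEP", rawVerdict: "KEEP", commanderVerdict: null },
  { summaryVerdict: "KILL", rawVerdict: "KILL", commanderVerdict: null },
];
console.log(checkFalsifier(exampleRecords));
```

The threshold is strict, so a divergence rate of exactly 25% does not trigger the falsifier.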
Evidence that triggered the proposal
  • T — Reasoning-trace fluency trap: summary outperforms full trace for judging
  • E — move_cost_rollup: argument+judge dominate per-run cost

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   2
blast radius    2
composability   3
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When               Move
2026-06-17 04:06   meta_filter_score
2026-06-17 04:03   meta_genesis