Meta engine — engine improving itself

A third lane that does not propose products. It proposes changes to the engine that proposes products. Each candidate is a concrete, falsifiable, solo-feasible modification — a new filter axis, a corpus tweak, a prompt revision, a tool, a harness. Same evaluation discipline as the other lanes, applied inward.

Accepted 5

Most recent 16 Jun 2026
approved AXIS reversible: simple 5h
On rerun of 43 graduated candidates, fewer than 8 score 3 (most candidates lack testable horizons); the 1 DEFER override scores ≤1. Composite spread of 2-3 points between testable and non-testable cohorts.
approved TOOL reversible: simple 1h
Proposals with no evidence array, empty evidence, or evidence items missing source_corpus/source will be rejected at validation time with a specific reason rather than silently persisting with evidence=[]. Retrocheck: `SELECT id, title FROM hypotheses WHERE lane='meta' AND json_e…
approved TOOL reversible: simple 1h
Proposals with solo_time_estimate in the 17-24h range are now caught by validateProposal() and routed to rejected[] instead of persisting to the DB. The enforcement gap between the system prompt contract (16h) and the validator implementation (24h) is closed. Historical retrochec…
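The two validator changes above (evidence checking and the 16h cap) can be sketched roughly as follows. This is an illustrative Python stand-in, not the engine's actual validateProposal(): the function name `validate_proposal`, the dict shape, and the reason strings are assumptions, and the sketch assumes each evidence item needs both `source_corpus` and `source`.

```python
from typing import Any

MAX_SOLO_HOURS = 16  # cap taken from the system prompt contract mentioned above


def validate_proposal(proposal: dict[str, Any]) -> tuple[bool, str]:
    """Return (ok, reason). Hypothetical sketch, not the real validator."""
    evidence = proposal.get("evidence")
    # Reject a missing, non-list, or empty evidence array with a specific reason.
    if not isinstance(evidence, list) or not evidence:
        return False, "evidence missing or empty"
    # Assumption: every evidence item must carry both source_corpus and source.
    for i, item in enumerate(evidence):
        if not isinstance(item, dict) or not item.get("source_corpus") or not item.get("source"):
            return False, f"evidence[{i}] missing source_corpus/source"
    hours = proposal.get("solo_time_estimate")
    # Anything above the 16h contract (including the old 17-24h gap) is rejected.
    if not isinstance(hours, (int, float)) or hours > MAX_SOLO_HOURS:
        return False, f"solo_time_estimate exceeds {MAX_SOLO_HOURS}h or is missing"
    return True, "ok"
```

A caller would route any proposal with `ok == False` to rejected[] together with the returned reason instead of persisting it.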
accepted with revision shadow mode GATE reversible: simple 6h
Applied to 43 S157 graduated candidates: hyp-2026-05-06-847f7e (0/5 on S157 manual review) is killed before argument; none of the 25 ROBUST candidates (4-5/5) are killed. Per the move cost rollup, argument + council_verdict + 7 deep moves average approximately $0.12-0.18 per hypo…
accepted with revision PROMPT reversible: simple 2h
Back-scoring hyp-2026-05-14-d3786b (Agronomy Advisory for UK soft-fruit and glasshouse growers — institutional trade-channel buyers) and hyp-2026-05-11-cc72cd (Bot-Promise Slip for B2B Support Ops — enterprise procurement buyers) with revised prompt produces solo_founder_feasible…

Awaiting Commander decision 4

Most recent 23 Jun 2026
awaiting decision GATE reversible: simple 4h
10-25% of incoming meta proposals short-circuit to DROP without LLM call, cutting meta_filter LLM cost proportionally and freeing reviewer attention for novel proposals. Distinct-change_type ratio per cycle rises.
awaiting decision TOOL reversible: simple 4h
First run reports the known drift (lane, meta_ship_status, pool_status, verdict_*) plus any other drift we have not noticed. Subsequent runs after schema.sql is reconciled report zero drift.
awaiting decision TOOL reversible: simple 3h
After 7 days, capture rate (proposals-rejected / proposals-generated) is measurable per axis and per stage. Reveals whether v2_a3 'solo_feasibility' carries >50% of rejections (suspected over-weight per S158 Round 2).
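As a rough illustration of the capture-rate measurement described above, the sketch below computes proposals-rejected / proposals-generated per axis, plus each axis's share of all rejections (the number the >50% check needs). The record shape and the `rejected_by` field are hypothetical. Grouping by stage would work the same way with a different key.

```python
from collections import Counter


def capture_rates(proposals):
    """Per-axis capture rate and share of rejections.

    `proposals` is a list of dicts. The hypothetical `rejected_by` key names
    the axis that rejected the proposal, or is None if it was not rejected.
    """
    generated = len(proposals)
    rejected_by = Counter(p["rejected_by"] for p in proposals if p.get("rejected_by"))
    total_rejected = sum(rejected_by.values())
    return {
        axis: {
            "capture_rate": n / generated,             # rejected / generated
            "share_of_rejections": n / total_rejected,  # this axis vs all rejections
        }
        for axis, n in rejected_by.items()
    }
```

An axis whose `share_of_rejections` exceeds 0.5 would match the suspected over-weight pattern.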
awaiting decision AXIS reversible: simple 4h
Hypotheses whose title leads with AE's mechanism (e.g. forecast-audit / claim-gate / drift-ledger pattern — the 847f7e shape) score 0-1 on v2_a11; hypotheses whose title leads with the buyer's pain ('Dental practice software complaints triage') score 2-3. Predicted 1.5-2.5 point …

Deferred / rejected 172

Most recent 24 Jun 2026
council rejected HARNESS reversible: simple 6h
After 1 week of operation, vertical distribution Gini coefficient across emitted candidates drops by >=0.15 vs prior 7d baseline. Cooldown fires on at least one vertical. Overall genesis throughput drops by no more than 20%.
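For reference, the Gini check in the falsifier above could be computed along these lines. This is a minimal sketch using the standard sorted-values formula for the Gini coefficient. The function names and the idea of passing per-vertical candidate counts as a plain list are assumptions.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_drop_met(baseline_counts, current_counts, min_drop=0.15):
    """True if the current window's Gini fell by at least min_drop vs baseline."""
    return gini(baseline_counts) - gini(current_counts) >= min_drop
```

With four verticals, counts of `[0, 0, 0, 4]` give a Gini of 0.75 and `[1, 1, 1, 1]` gives 0.0, so that change would clear the >=0.15 threshold.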
council rejected PROMPT reversible: simple 2h
On a replay of council_verdict against the 4 historical override candidates, council's verdict flips to match commander on >=3/4 cases. On a 30-candidate forward sample, council PASS rate on audit-shaped candidates drops by >=30% vs the 7-day baseline.
filter rejected AXIS reversible: simple 8h
Among the 43 graduated candidates, median v2_a11 score is >=1; among killed candidates, median is 0. Composite score spread between graduated and killed widens by >=0.5 points after adding the axis. At least 1 historically-killed candidate would have passed filter with the new ax…
filter rejected TOOL reversible: simple 6h
Over the next 100 genesis events, >=4 candidates are killed early as predicted orphans (matching the historical 7%-of-kills base rate). Leave-one-out accuracy on the 7 historical orphan kills is >=5/7. Cumulative move-cost saving over 100 events: ~one filter_score + one argument …
filter rejected GATE reversible: simple 4h
Re-running the gate on the 3 commander-KILL'd audit-shaped candidates (a38d31, c89a71, 6bf9c5) fires on all 3. Re-running on a control set of 10 graduated non-audit candidates fires on 0-1 of them. Net effect on next 100 candidates: 5-15% killed pre-argument, saving argument cost…
council rejected PROMPT reversible: simple 2h
Among the next 60 meta_filter_score DROP reasons, mentions of 'invented evidence' / 'evidence not citable' / 'cargo-cult citation' decrease by >=50%. Proposer self-kills (rejected[] array length) increase modestly. Net KEEP rate may not move, but downstream judge confidence impro…
council rejected HARNESS reversible: simple 3h
KEEP rate drops 15-40% vs single-judge baseline. Council-stage and Commander-stage rejection rate on meta proposals that survived filter drops measurably, because surviving proposals cleared two independent vendor families.
filter rejected HARNESS reversible: simple 3h
Across the next 50 filter_score invocations, parse_error rate drops by >=60% and the corresponding cost-without-phase-advance burn drops proportionally. Repair calls cost ~10-20% of an original call.
filter rejected GATE reversible: simple 3h
The S198-class monoculture-from-frozen-queries failure mode (12 frozen dev-infra queries for many cycles) becomes impossible to enter silently. Either rotation happens, or the Commander explicitly authorizes the stale run with a logged reason.
council rejected TOOL reversible: simple 4h
S204-class 'Gemini=REJECT-truncated' verdicts no longer count as REJECT in quorum math. Quorum either resolves cleanly (after re-run) or pauses for Commander, with audit trail showing which models gave AMBIGUOUS.
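One way to picture the quorum rule above is sketched below, under assumptions: AMBIGUOUS votes count for neither side, an outcome resolves only when one side holds a strict majority of the full panel, and anything else pauses for the Commander. The status strings and the dict-of-verdicts shape are hypothetical, and the re-run step is left out.

```python
def resolve_quorum(verdicts):
    """Resolve a panel of verdicts without treating AMBIGUOUS as REJECT.

    `verdicts` maps a judge name to "APPROVE", "REJECT", or "AMBIGUOUS".
    """
    # Audit trail: which judges returned AMBIGUOUS (e.g. a truncated response).
    ambiguous = sorted(m for m, v in verdicts.items() if v == "AMBIGUOUS")
    needed = len(verdicts) // 2 + 1  # strict majority of the full panel
    for outcome in ("APPROVE", "REJECT"):
        if sum(1 for v in verdicts.values() if v == outcome) >= needed:
            return {"status": outcome, "ambiguous": ambiguous}
    # No clean majority from clear votes: hand the decision to the Commander.
    return {"status": "PAUSE_FOR_COMMANDER", "ambiguous": ambiguous}
```

A 1-1 split with one AMBIGUOUS judge pauses rather than resolving to REJECT, which is the behaviour the proposal asks for.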
filter rejected HARNESS reversible: simple 4h
A B5-class regression (scheduler running engine_1 moves on meta-lane rows) is caught on first violation instead of accumulating 97 wasted moves. Lane-mismatch row count visible in daily cycle summary.
council rejected GATE reversible: simple 5h
S204-class calibrations that fold S201 proxy results into both training and evaluation sides hard-fail at gate-time instead of producing a 'validated' instrument that re-encodes the original bias.
filter rejected AXIS reversible: simple 6h
Proposals leaning solely on PUBLIC text-visible sources (the Codex #7 shared-bias class) score 0-1; cross-family proposals score 2-3, producing 1-2 point composite spread that correlates with Gate 1 PROCEED outcomes on the S204 instrument.
council rejected CORPUS reversible: medium 10h
Within 2 weeks, ≥30% of meta-lane filter_kept proposals cite at least one costly_behaviour row in their argument move (current rate: ~0% per S203 baseline). Solo_feasibility false-positives drop ≥20%.
filter rejected HARNESS reversible: simple 5h
On replay of last 30 council verdicts, panels lacking cross-vendor judge get flagged; commander override rate on cross-vendor-judged subset is ≥30% lower than same-vendor subset.
filter rejected PROMPT reversible: simple 3h
On last 30 council verdicts, ≥4 flip from KEEP→KILL (low-confidence dissenter outweighs high-confidence majority) or KILL→KEEP. Commander override rate drops by ≥25% on the flipped subset.
filter rejected GATE reversible: medium 8h
Solo-founder-feasible KD-threshold hallucination class (documented in S160) disappears from meta-lane output. Manual audit of 20 fresh outputs shows 0 unsourced empirical claims vs current ~30% rate.
filter rejected GATE reversible: simple 6h
On meta-lane backlog of last 7d, ≥15% of currently-kept proposals get re-classified as diversity_rejected; commander KILL/DEFER rate on remaining drops by ≥30%.
council rejected AXIS reversible: simple 4h
On 43 graduated candidates from S157, 847f7e-shape proposals (the 14 NBJ-failed ones) score ≤1 on a11; ROBUST candidates score 2-3. Composite spread widens 2-3 points and matches S157 sweep classification.
filter rejected HARNESS reversible: simple 6h
Borderline (within 0.5) decisions become decorrelated from the Sonnet judge that produced the composite. Expect the codex tiebreaker to disagree with sonnet on 30-50% of borderlines (validating the decorrelation hypothesis from S202 design hygiene).
council rejected TOOL reversible: medium 8h
Genesis produces fewer proposals in the over-represented kill buckets. Expect top-2 kill buckets to shrink by 25-40% in subsequent cycles; bottom buckets to grow (zero-sum redistribution, not net improvement on first deployment).
council rejected PROMPT reversible: simple 3h
Genesis output will carry a structured status_quo field on ≥95% of proposals after change; v2_a12 scoring will be deterministic against this field rather than NLP-inferred. Expect fewer 'invented problem' proposals (estimated 20-30% reduction in council-stage 'no buyer would pay'…
filter rejected GATE reversible: simple 6h
S200 cross-memo dedup runs post-genesis; G_DIV catches the upstream skew before any filter compute is spent. Expect ~10-15% of cycles to trigger re-draw initially, dropping to <5% as corpus rotation matures.
council rejected AXIS reversible: simple 4h
Proposals from genesis that skip displacement framing will score 0-1 and lose 1-2 composite points. Expect graduation rate of 'greenfield-only' proposals to drop while displacement-framed proposals rise.
filter rejected AXIS reversible: simple 4h
Breadth-first proposals (per S202 ratification) that omit market-stage will score 0-1 on a11, dropping their composite by ~1-2 points and reducing pass-through of stage-blind candidates by an estimated 15-25%.
council rejected TOOL reversible: simple 4h
Within 7 days, surfaces which of the 4 baseline + 8 rotating pain-template corpora (S198) are dollar-efficient and which are subsidizing low-yield shapes. Decision input for next corpus reweighting cycle.
filter rejected TOOL reversible: simple 3h
P3 outputs gain >=95% evidence-marker compliance within a week (currently anecdotal; no measurement). Telemetry surfaces which corpora/templates systematically produce evidence-less rubric outputs.
council rejected GATE reversible: simple 5h
100% of newly-graduated proposals carry a parseable forward-clock falsifier. Backfill: existing graduated proposals are not re-evaluated. Within 30 days, the engine has a deterministic grading queue and can self-report hit-rate.
filter rejected GATE reversible: medium 6h
v2_backfill_orphan_S148 (7x in current trace) should drop to <=2 in the next 30-day window. Total proposals reaching meta_council_verdict should fall by 10-20%, reducing move-cost.
filter rejected AXIS reversible: simple 4h
Proposals currently scoring high on v2_a7 via vague 'reach via social' phrasing will drop 1-2 composite points; proposals with named channel + audience will be unchanged. Top-10 ranking should reshuffle by ~30%, with B_PILOT-shaped (UK recruitment, named channel) proposals climbi…
council rejected TOOL reversible: simple 3h
Output CSV will name ≥3 session-tag blocks where `decage_status` flags 'decaged but still in prompt': specifically S121 evaluator-side cage (decaged S183), S151 archetype rotation (claimed temporary through 2026-06-14 — already expired by today 2026-06-18), and the S183 product-s…
council rejected HARNESS reversible: simple 4h
Currently ~10-20% of completed scoring sequences end with IQR ≥ 1.5 (visible in `filter_score_iqr` histogram). For these high-disagreement candidates the tiebroken_median will shift composite by ≥0.5 points in ≥50% of cases, changing the top-of-queue ordering. Promotion/kill deci…
filter rejected CORPUS reversible: simple 5h
On a 5-cycle replay (one per rotated corpus), the genesis `fresh_signal_source_id` field should report 5 distinct source-corpus origins; the prior 30 admitted hypotheses currently report ≤3 distinct origins. Archetype-family histogram across 30 admissions should hit ≥6 of F1-F10 …
council rejected GATE reversible: simple 6h
On the active pool, at least 1-3 of the next 20 ranked candidates will collide at cosine ≥ 0.82 with a sibling admitted within the prior 14 days. Saturated sub-cluster (S136 procurement/SOW family) should see the highest collision rate. Active-pool variety per the S151 archetype_…
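The collision test in this falsifier (cosine ≥ 0.82 against a sibling admitted within the prior 14 days) might look something like the sketch below. It assumes each hypothesis already has an embedding vector. The function names and the `(admit_date, vector)` tuple shape are hypothetical.

```python
import math
from datetime import date, timedelta


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def collides(candidate_vec, today, admitted, threshold=0.82, window_days=14):
    """True if the candidate is too close to any sibling admitted in the window.

    `admitted` is an iterable of (admit_date, vector) pairs.
    """
    cutoff = today - timedelta(days=window_days)
    return any(
        admit_date >= cutoff and cosine(candidate_vec, vec) >= threshold
        for admit_date, vec in admitted
    )
```

Siblings admitted before the 14-day cutoff are ignored even if they are near-identical.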
filter rejected HARNESS reversible: simple 7h
On a 30-run A/B (15 with summarizer, 15 raw), judge verdicts under summarizer mode show lower variance (stdev of composite KEEP/KILL across two judge re-runs of same candidate drops by >20%) per the reasoning-trace fluency-trap finding. Token cost on judge call drops ~40%.
filter rejected AXIS reversible: simple 3h
847f7e-shape proposals (per Corpus D pattern reference) score 0-1 on v2_a12; ROBUST candidates score 2-3. Expected composite spread of 2-3 points between the two classes. Graduation rate on metaphor-heavy candidates drops by ~25%.
filter rejected TOOL reversible: simple 5h
On current run history (recent 50 verdicts), sentinel will report a baseline agreement rate. If sycophancy is present (proposer = Sonnet 4.6, judge = gpt-5.5-codex), expect baseline <0.75; rate climbing toward 0.85 across future runs surfaces a calibration drift Commander would o…
filter rejected GATE reversible: simple 6h
Of the 7 v2_backfill_orphan_S148 kills in Corpus E, all 7 would be caught at the pre-filter stage. Argument/judge spend on these candidates drops to $0. Total saved per 100-run batch: ~$0.40 (7 candidates * argument+judge cost).
filter rejected AXIS reversible: simple 4h
On the 4 Commander-overridden KILL cases (all audit-shaped per Corpus E override log), v2_a11 scores 0-1 producing composite drop of 2-3 points vs prior runs. At least 3 of the 4 fall below 17/33 composite and would have been auto-killed without override.
filter rejected PROMPT reversible: simple 2h
F11 exempt admits will be bounded at 5 per 14-day window. If the engine currently runs hot on F11 (which is the risk of an unbounded exemption), this caps it. Expected hit rate of the new cap: 1-3 reject events per month if F11 was previously absorbing the saturation pressure fro…
filter rejected GATE reversible: simple 2h
Hypotheses with a load-bearing weakness on ONE axis but high scores elsewhere will be killed regardless of composite. Estimate: 5-12% of currently-ranked hypotheses have a median-0 axis (visible by reparsing moves.scores_json). Those hypotheses currently graduate to 'scored'/'ran…
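A minimal sketch of the min-axis floor described above: take the per-axis median across scoring runs and kill on any axis whose median is 0, whatever the composite. The list-of-dicts input is an assumed, already-parsed form of moves.scores_json, and the function names are hypothetical.

```python
from statistics import median


def zero_median_axes(runs):
    """Return the axes whose median score across runs is 0.

    `runs` is a list of {axis_name: score} dicts, one per scoring run.
    """
    if not runs:
        return []
    return sorted(axis for axis in runs[0] if median(r[axis] for r in runs) == 0)


def floor_verdict(runs):
    """KILL if any axis has a median of 0, regardless of composite score."""
    return "KILL" if zero_median_axes(runs) else "PASS"
```

Counting how many ranked hypotheses return "KILL" on historical runs would give the share the 5-12% estimate refers to.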
filter rejected GATE reversible: simple 5h
Judges that anchor their entire R3 verdict on (e.g.) only 'risk' will be excluded. Predicted effect: ~10-20% of R3 verdicts that survive the citation-count check fail source-diversity. Net: STRONG_BUILD/KILL escalations from R3 will be drawn from a more grounded set of judges, re…
filter rejected TOOL reversible: simple 3h
Closed-archetype proposals that slip past the LLM's prompt-side filter (typically when wording is subtle, e.g. 'forecast accuracy benchmarking for capital allocators') will hit the tripwire ~95%+ of the time when any banned term is literally present. Will catch an estimated 5-15%…
filter rejected GATE reversible: simple 4h
Hypotheses where HIGH and LOW judges disagree systematically on one filter (e.g. 'fast_feedback_loops' hi=3 lo=0, hi=3 lo=0, hi=2 lo=0 → mean_spread=2.67) will land in 'disputed' rather than 'ranked'. Back-of-envelope: ~15-25% of currently-ranked hypotheses have ≥1 axis with mean…
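The mean_spread arithmetic in the example above can be reproduced with a short sketch. The routing threshold is not visible in the truncated text, so it is left as a required parameter rather than guessed. The function names and input shape are hypothetical.

```python
def mean_spread(pairs):
    """Mean of (hi - lo) across runs for one axis."""
    return sum(hi - lo for hi, lo in pairs) / len(pairs)


def route(axis_pairs, dispute_threshold):
    """Return 'disputed' if any axis reaches the threshold, else 'ranked'.

    `axis_pairs` maps an axis name to a list of (hi, lo) judge scores.
    `dispute_threshold` must be supplied by the caller.
    """
    spreads = {axis: mean_spread(pairs) for axis, pairs in axis_pairs.items()}
    if any(s >= dispute_threshold for s in spreads.values()):
        return "disputed"
    return "ranked"
```

For the quoted case, `mean_spread([(3, 0), (3, 0), (2, 0)])` is 8/3, which rounds to the 2.67 given in the description.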
council rejected TOOL reversible: simple 3h
Surfaces which v2_a1..v2_a10 axes are pulling weight vs which are pure noise. Architect uses report to retire or reweight low-signal axes in a follow-up proposal.
council rejected PROMPT reversible: simple 1h
Average anchor count per proposal rises from current ~1.1 to >=2.0. Manually-judged 'hallucinated framing' rate (proposals not grounded in the supplied corpus) drops by half on a 50-hypothesis spot check.
filter rejected HARNESS reversible: simple 2h
Empty-batch rate on genesis drops from the current ~8% (per S176 validation traces) to <2%, without inflating overall proposal count by more than 5%.
council rejected GATE reversible: simple 3h
When Corpus T floods (as in S183-S186 forecaster arc), the gate fires on ~15-25% of batches and keeps Corpus E/D candidates from being starved out of argument move. Cross-corpus citation rate in graduated hypotheses rises measurably.
filter rejected AXIS reversible: simple 4h
Hypotheses written without a runnable falsifier (the 847f7e-shape pattern Commander has flagged) drop ~2 points in composite, pushing them below the kept-threshold ~60% of the time. ROBUST candidates retain composite within 0.5 of current.
council rejected AXIS reversible: simple 3h
Re-score the last 19 historical filter_score moves with the new prompt: expect 3-6 to land at 0-1 (e.g. 2026-06-13's calibration proposal admits 'target ~3-4 cycles' — would score 1; 2026-06-12's S151 RECENT_ADMITS proposal needed '30 admissions ~3-6 weeks' — would score 0). Dist…
council rejected GATE reversible: simple 3h
Replay against the last 30 meta proposals (~6 cycles): expect 0-2 to fail the gate (low rate, high signal — those are the ones that would not have implemented as written). Going forward, no council-advanced proposal can reach Architect with a typo'd or stale target path.
council rejected TOOL reversible: simple 5h
Within 10 cycles post-build, ≥3 future proposals reference this tool path in their falsifier text instead of describing ad-hoc replay. Architect implementation work on any gate/axis proposal that already passes council saves ~30 min of duplicated loader code.
council rejected CORPUS reversible: simple 3h
corpus_E_items rises from 3 to ~5-6 with meta-lane-active content. Within 3 cycles post-change, ≥2 proposals will cite a meta-lane E source (e.g. 'kill_reason filter_rejected = N'). Proposals targeting hypothesis_engine paths drop further because Corpus E now visibly shows where …
council rejected PROMPT reversible: simple 2h
In the 2026-06-12 cycle, 5/5 proposals targeted hypothesis_engine/moves/* paths (council rejected 19/19, possibly because they target a frozen lane). In the 2026-06-13 cycle only 1/5 did. Re-running with the new prompt+gate over the next 3 cycles (~15 proposals), expect ≤2/15 hypothesis_engine…
council rejected TOOL reversible: simple 6h
Builds the dataset needed to falsify the m_a_self_score field's value. With ≥20 verdicted rows accumulated, ρ is interpretable. If ρ > 0.6, self-score can be promoted into the filter_score axes (free pre-LLM signal). If |ρ| < 0.2 over 20 rows, the self-score is noise and the gene…
council rejected GATE reversible: simple 4h
3 of 4 historical Commander overrides were on audit-shaped meta proposals. The gate would have caught the same shape pre-LLM, saving ~$0.013 downstream per gated row + Commander review slot. Replay against last 19 proposals: 1-3 should match (the historical rate).
council rejected HARNESS reversible: simple 5h
Council R1 'consensus at round_1' rate (resolved_at='round_1') drops measurably on the same input set — current historical rate vs replay rate should differ by ≥10 percentage points. R2 escalation rate may rise but should not triple. Net: more proposals get to round 2 debate inst…
council rejected AXIS reversible: simple 3h
Of 19 historical meta proposals at filter stage, at least 1 had an axis the proposer self-scored 0 (proposer self-score data in moves table for meta_genesis). The min-axis floor would route any LLM-judge-issued 0 on the same axis to DROP, producing a measurable filter_rejected de…
council rejected GATE reversible: simple 4h
Of the last 19 meta runs, ≥2 candidates carry target_files identical to a prior council_rejected/commander_killed row (the 4 Commander-override pattern repeats audit/observability shapes across cycles). Re-running dry against historical proposals: ≥2 catches, ≤5 catches (otherwis…
council rejected SKILL reversible: medium 4h
Cross-vendor doctrine S160 says generator+judge must be different families to decorrelate errors; filter_score already decorrelates within a run, but NOT across runs. Expect median IQR across the 5 axes to rise by 0.4-0.8 (currently the 3 runs are too consistent because they use …
council rejected PROMPT reversible: simple 1h
On 20 hypotheses with description > 700 chars, rerun filter_score with the 2500-char prompt and compare composite_rank_score against the historical 700-char score. Expect median absolute delta of 0.5-1.5 on composite; the solo_founder_feasible axis specifically should shift on hy…
council rejected TOOL reversible: simple 4h
On replay against the last 100 admits, expect 3-8 admits to be retroactively rejected as closed-thesis descendants the proposer self-classified as 'clearly_outside' incorrectly. Specifically: any admit whose audience includes 'allocator', 'family office', or 'research publisher' …
council rejected GATE reversible: simple 3h
On replay against the last 100 admits, expect 4-12 admits to be retroactively rejected because their manual_entry_path contains extraction tokens (e.g. 'we auto-extract the claim from the SOW PDF'). Going forward, ~5% of genesis runs that would have admitted will instead reject w…
council rejected HARNESS reversible: medium 6h
On 20 historical hypotheses re-run with the probe, 25-40% of Round-1 judges will produce a different verdict_action on the paraphrased second draw, identifying which 'unanimous 3-0 KILL' or 'unanimous STRONG_BUILD' verdicts are actually fragile. Probe cost: ~+33% of council_verdi…
filter rejected SKILL reversible: simple 3h
Commander override rate (currently 4/N per E corpus) drops by ≥1 per 50 cycles because the buyer-voice claim makes weak market-fit obvious before sign-off. Commander can spot 'no buyer would say this' faster than reading the full argument.
filter rejected CORPUS reversible: medium 5h
Graduation rate across next 50 cycles increases by ≥15% vs. baseline uniform sampling. Hot seeds (weight >0.5) dominate top-of-funnel without starving exploration (10% cold floor preserves coverage).
filter rejected GATE reversible: simple 5h
Candidates that pass filter but die at argument due to thin evidence drop by ≥30% (estimated from last 14 days of engine.db). Argument move cost-per-graduated-candidate drops proportionally.
filter rejected HARNESS reversible: simple 4h
Cycles aborted due to judge transient failure drop from current ~5-8% (per E corpus) to <1%. Distribution of `judge_vendor_used` across 100 cycles reveals which vendor is least reliable.
council rejected TOOL reversible: simple 3h
On next 30 genesis runs, ≥80% of outputs contain zero AE-internal terms post-rewrite vs. ~30% baseline (sampled from last 7 days of brain/proposals/digest-2026-05-*).
filter rejected GATE reversible: simple 3h
Zero cycles in the next 50 runs where genesis and judge share a vendor family. Detects misconfiguration during model upgrades (e.g., swapping Sonnet→Opus on genesis without rotating judge).
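A minimal sketch of the vendor-family check described above. The keyword-to-family table and the function names are hypothetical. A real gate would read the family from whatever model configuration the engine uses.

```python
# Hypothetical lookup: substring of the model name -> vendor family.
FAMILY_BY_KEYWORD = {
    "sonnet": "anthropic",
    "opus": "anthropic",
    "codex": "openai",
    "gemini": "google",
}


def vendor_family(model_name):
    """Map a model name to its vendor family, failing loudly if unknown."""
    name = model_name.lower()
    for keyword, family in FAMILY_BY_KEYWORD.items():
        if keyword in name:
            return family
    raise ValueError(f"unknown vendor family for model: {model_name}")


def is_cross_vendor(genesis_model, judge_model):
    """True when genesis and judge come from different vendor families."""
    return vendor_family(genesis_model) != vendor_family(judge_model)
```

Raising on an unknown model name, rather than defaulting to a pass, keeps a model upgrade from slipping through unchecked.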
filter rejected AXIS reversible: simple 4h
On the 43 graduated candidates from S157 NBJ sweep, 847f7e-shape proposals (engine-mirror IP) score 0-1 on a11 while ROBUST candidates score 2-3, producing a 1.5-2.5 point composite spread that increases kill rate on jargon-laden proposals by ≥20%.
filter rejected TOOL reversible: simple 5h
Reduces v2_backfill_orphan_S148 kill rate from 7/12 (58%) to ≤2/12 over the next 12 proposals. Saves ~30 min compute per orphan avoided.
council rejected GATE reversible: simple 3h
Across last 295 proposals, ≥60 retroactively gate-kill on this rule; of the 4 Commander overrides, ≥3 fail this gate. Cuts filter_score compute by ~20% on future runs.
filter rejected HARNESS reversible: medium 6h
On 40-item meta hypothesis sample, 4-8 candidates flagged rubric_unstable; flagged set overlaps with ≥2 of the 4 Commander overrides, demonstrating rubric-gaming signal predicts override risk.
filter rejected AXIS reversible: simple 4h
On rerun of 43 graduated candidates, the 4 Commander-overridden candidates (3 KILL + 1 DEFER) score ≤1 on a11 while approved candidates score ≥2, producing 2-3 point composite spread that correlates with override decisions.
council rejected AXIS reversible: simple 4h
Across the historical 40 meta_genesis admissions referenced in the corpus, A6 shadow scores will cluster bimodally (acute vs. vitamin) and should correlate r > 0.4 with council STRONG_BUILD/WEAK_BUILD vs KILL/DEFER actions. If correlation is weak, the rubric is mispurposed or the…
filter rejected GATE reversible: simple 3h
On future S185 batches with the typical 12 theses sampled across [agent…]/[model…] anchors, the gate will reject batches where the broad-anchor theses (5 of 12 in fwd_2026-06-08_h45) collapse onto the same negative-control basket. Expected: 10-30% of attempted locks initially rej…
council rejected SKILL reversible: simple 6h
Some fraction of S185 theses currently 'cite' sources by URL/handle without the source meaningfully supporting the claim (LLM citation hallucination is a known failure mode even on tuned-up generators). Expected: 5-25% of theses rejected on first run; after generator prompt tight…
filter rejected TOOL reversible: simple 5h
After 50+ council runs, if council-vs-baseline agreement is >70% on KILL actions, the council is mostly recapitulating priors and is not adding much signal; <40% suggests genuine discrimination. Either result is informative; current state is unknowable.
filter rejected GATE reversible: simple 3h
On 2026-06-15 the scheduler will warn 'S151 archetype rotation rule expired 2026-06-14'. Currently nothing surfaces this; the doctrine stays in the prompt forever as cargo. After implementation, expired doctrines either get re-ratified by Commander or removed.
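The expiry warning above amounts to a date comparison. The sketch below assumes doctrines are available as (name, last valid date) pairs, which is a hypothetical shape, and treats a rule as expired from the day after its last valid date.

```python
from datetime import date


def expired_doctrine_warnings(doctrines, today):
    """Return one warning string per doctrine whose last valid date has passed.

    `doctrines` is an iterable of (name, valid_through) pairs.
    """
    return [
        f"{name} expired {valid_through.isoformat()}"
        for name, valid_through in doctrines
        if valid_through < today
    ]
```

Run with `today = date(2026, 6, 15)` and a rule valid through 2026-06-14, this yields the warning text quoted in the description. On 2026-06-14 itself it yields nothing.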
filter rejected TOOL reversible: simple 3h
After 50 council_verdict runs, the disagreement-type distribution is non-uniform (≥40% in one type), making it usable as a meta signal. The sonnet_confident_codex_rejected bucket — if nonzero — names the candidates most worth post-mortem review.
council rejected PROMPT reversible: simple 3h
Mean v2_a6 (acute_pain_not_candy) score on freshly-generated candidates rises by ≥0.5 over the next 100 candidates; candy_not_pain kill-reason class (after taxonomy tool above lands) drops below 8% of KILLs.
council rejected HARNESS reversible: simple 4h
Retry rate stabilizes between 10-25% (proves trigger is firing). Among retried candidates, council_verdict accept rate increases by ≥5 absolute points vs. the no-retry baseline measured on the prior 30 days.
filter rejected TOOL reversible: simple 6h
Weekly kill-reason distribution chart becomes computable without manual aggregation; the 'other' bucket stays under 20% of classifications after 14 days, indicating vocab covers observed reality.
filter rejected AXIS reversible: simple 4h
On a backtest over the last 43 graduated candidates, composite spread between top-quartile and bottom-quartile widens by ≥1.5 points; candidates that previously graduated but Commander DEFERed for 'unreachable buyer' score 0-1 on a11.
filter rejected GATE reversible: simple 5h
meta_genesis token cost in the move_cost_rollup drops 15-25% over the next 7d window; KILL-with-reason-off-thesis rate on graduated candidates drops below 10% (currently dominant in Corpus E kill-reason distribution).
council rejected PROMPT reversible: simple 2h
Closed-thesis kills (currently caught only by decage post-filter) drop because most are suppressed at genesis. Genesis volume drops by ≤15% (only the closed-thesis fraction); decage closed_thesis_prune fires <30% of its current rate within 14 days.
council rejected TOOL reversible: simple 5h
Time-to-diagnose any kill drops from 'reconstruct context manually from engine.db' (~20 min) to 'open one JSON file' (<2 min). The 7× v2_backfill_orphan_S148 cluster becomes diagnosable in one session.
filter rejected HARNESS reversible: simple 4h
Within 50 disagreement rows, a calibration pattern is visible: e.g. 'Codex scores a3_reachability systematically 1 point lower than Sonnet on B2B-shape candidates'. This pattern enables targeted judge-prompt tuning.
filter rejected AXIS reversible: simple 4h
Hypotheses authored from a single internal hunch with no external grounding lose ~2 points of composite (drop one axis from 3 to 0). MANIFESTO-v4-aligned evidence-grounded proposals widen their composite lead over speculative ones by 1.5–2.5 points.
filter rejected SKILL reversible: simple 6h
Of the 2× fatal_objection_both_confirm kills/week, ~60% are caught at self_attack instead, freeing council budget. Council budget per surviving hypothesis rises measurably (>15%). Self_attack defensibility_score correlates positively (r > 0.4) with council survival.
filter rejected GATE reversible: simple 3h
The 7× v2_backfill_orphan_S148 kills currently consuming filter_score budget shift to a zero-cost pregate KILL; filter_score spend on orphan candidates drops to ~0; downstream kill_reason distribution loses the v2_backfill_orphan_S148 bucket within filter_score post-mortems.
filter rejected TOOL reversible: simple 4h
Over 7 cycles, ~10-20% of proposals get reversibility downgraded by the validator (LLM self-reports SIMPLE more often than warranted). Graduated COMPLEX-reversibility proposal share drops from current visible rate to <=5%.
filter rejected PROMPT reversible: simple 3h
Evidence-field population rises from current ~30% (estimated from sample) to >=95% over next 3 cycles. Downstream composite_v2 scores correlate more tightly with evidence-row count (Pearson >=0.3) because grounded proposals describe more concrete falsifiers.
filter rejected GATE reversible: simple 3h
Over 7 cycles, ~5-15% of proposals get hard-rejected as oversized (currently they reach council and waste argument tokens). Graduated proposals' median solo_time_estimate drops from current ~6h to ~4h.
filter rejected CORPUS reversible: simple 5h
Over the next 7 cycles, proposals citing source_corpus='commander_override' rows in their evidence field appear in >=20% of generated candidates, AND commander_override rate on graduated proposals drops from current 4/21 (~19%) to <=10% — i.e. the engine learns from its disagreem…
filter rejected HARNESS reversible: simple 6h
Over a 7-cycle window, ~15-25% of proposals trigger retry; of retried proposals, >=40% have changed verdict (judge confirms or reverses with cited reason). Net effect: graduated-proposal APPROVE precision (post-deploy non-revert rate) rises by >=10 points vs pre-harness baseline.
council rejected GATE reversible: simple 5h
On the next 7 cycles, duplicate_with_prior kills should account for 10-30% of pre-filter rejects (currently invisible because dupes pass filter_score and burn argument tokens). Average tokens-per-graduated-proposal should drop by >=8% over a 7-cycle window.
filter rejected AXIS reversible: simple 4h
On the 21 currently-graduated meta proposals, the 847f7e-shape (vague-clarity) proposals score 0-1 while concrete-diff proposals (e.g. the S179 handoff-contract style) score 2-3, producing a 2-3 point spread in composite_v2 that correlates with downstream council_verdict APPROVE …
filter rejected GATE reversible: simple 6h
Of the next 50 genesis proposals, the gate fires on 2-5 (4-10%) that previously would have slipped past the vocabulary trigger. Manually inspecting the matched archived hypothesis for each fired proposal confirms a true archetype match in ≥70% of fires (precision check on a 5-fir…
council rejected PROMPT reversible: simple 3h
Of the next 30 genesis proposals, ≥40% will be rejected at parse-time for missing/insufficient buyer_check. Surviving proposals will have a structurally testable validation step the Commander can run within 14 days. Re-running council_verdict on 10 previously-graduated candidates…
filter rejected HARNESS reversible: simple 4h
On the next 20 hypotheses to run filter_score, the run_total standard deviation across the 3 runs of each hypothesis decreases by ≥30% vs. the previous 20-hypothesis baseline measured from `engine.db` move history. Composite IQR penalty (-0.5×IQR) is materially smaller for hypoth…
filter rejected AXIS reversible: simple 5h
Among graduated candidates whose data_moat scored 2-3, those whose moat language is hand-waved (no named trajectory source) score 0-1 on the new axis, producing a 2-3 point composite spread that re-ranks vague-moat hypotheses below specifically-sourced ones. Forecaster-v1-class h…
filter rejected GATE reversible: simple 4h
Re-running ingest over the last 30 days of digest items will reject digest-2026-06-04-007 (Air Canada Jan 2026 1,247 passengers — flagged by Gemini as hallucinated) and at least 1-2 other items where Gemini's review caught fabricated specifics. T-corpus items reaching genesis wit…
filter rejected PROMPT reversible: simple 1h
Resolver-clause presence rate in raw genesis output rises from current baseline (~30-50% per S184 record) to ≥85% within first 50 post-change generations. Pairs with the gate proposal — prompt change reduces gate rejection rate.
filter rejected HARNESS reversible: simple 2h
After 14 days, the log enables a one-liner query that would have surfaced the observed 'audit-shaped' cluster (4 overrides clustered on audit-shaped products per S183 traces) in real time. Enables downstream axis-weight tuning informed by override patterns.
council rejected TOOL reversible: simple 3h
Run against current engine.db (which contains the v2_backfill_orphan_S148 cluster of 7) produces exactly one warning for that reason. Future kill-clusters surface within 24h of crossing threshold rather than at retro-review.
filter rejected AXIS reversible: simple 3h
On the S183 graduated-candidate set (n=43), pre-pivot speculation-shaped hypotheses score 0-1 (mean ≈ 0.7); post-pivot forecaster-shaped hypotheses score 2-3 (mean ≈ 2.4). Composite-score spread of ≥1.5 points between cohorts.
filter rejected GATE reversible: simple 4h
Per S184 forecaster v1 calibration record, ~40-55% of candidate hypotheses graded 'unscoreable - no resolver' by the forecaster grader. Gating at genesis should drop unscoreable rate at grader from baseline to ≤10% while adding ≤5% latency.
filter rejected SKILL reversible: simple 7h
Commander decision latency on pending_commander queue drops (faster context recall). Commander-override consistency increases: shapes previously rejected get rejected again at higher rate (>70% match on similar-shape pairs).
council rejected TOOL reversible: simple 6h
Every kept candidate has a product_shape label with confidence ≥0.5. Missing-shape rate <5% over a 100-candidate sample. Downstream filter/ranking moves can read product_shape rather than re-deriving from text.
filter rejected PROMPT reversible: simple 2h
Post-filter resolution_gate rejection rate drops from current baseline (the resolution_gate currently kills candidates whose hypothesis is unresolvable). Council calls on resolvability-deficient candidates drop in lockstep.
filter rejected HARNESS reversible: simple 4h
Argument-move stuck-pending count drops to ~0. p95 wall-clock for argument completion bounded by 180s (90s × 2). No silent hangs requiring manual SQL intervention.
filter rejected GATE reversible: simple 5h
Candidates retreading closed theses are rejected ~30s earlier in the pipeline, avoiding ~3 LLM calls each (filter_score + argument + verdict). Estimated 8-15% of current filter_rejected rows will be reclassified as closed_thesis_gate rejections. Daily LLM spend on rejected candid…
council rejected AXIS reversible: simple 4h
Forecaster-shaped candidates (per S183 pivot) score 2-3 on A11; legacy product-shape candidates (e.g. SENTINEL-retread, 847f7e-shape) score 0-1. Composite-score spread between shapes widens by 2-3 points, making the forecaster lane separable at the filter stage rather than at cou…
council rejected PROMPT reversible: simple 2h
Genesis output volume drops 15-25% (self-rejection in the chain-of-thought, candidate not emitted). Among remaining candidates, the fraction tagged NO_PRE_MORTEM_FAILURE is <10%. Filter pass-through rate on candidates with substantive pre-mortems rises, because the lowest-quality…
filter rejected HARNESS reversible: simple 3h
Current silent-drop rate (visible as 'filter never ran' candidates in engine.db) drops to 0. The 'unscored_timeout' bucket surfaces a new Commander review queue. If pre-harness silent-drop rate is X%, post-harness retry-success rate captures roughly X * (1-truncation-loss) of tho…
filter rejected TOOL reversible: simple 5h
After classifying the last 200 overrides, shape-rejection + fluency-trap combined account for >40% of overrides. This validates whether the fluency-trap gate proposal is targeting the dominant failure mode, or whether 'missing-context' / 'wrong-vertical' would be a higher-yield i…
filter rejected AXIS reversible: simple 4h
Over the next 100 filter-scored candidates, ~15-25% score 2-3 on v2_a11. Candidates scoring 3 on v2_a11 enter argument round at >2x baseline rate. Commander-approval rate among high-disagreement (a11=3) candidates differs from baseline by at least 1.5x in either direction.
filter rejected GATE reversible: simple 6h
On retro pass against the last 30 graduated candidates, ~20-30% trip the gate (matching the manifesto-v4 deflation rate). On forward pass, graduation volume drops 20-30% with the dropped candidates concentrated in shape-rejection / fluency-trap kill_reason buckets.
filter rejected HARNESS reversible: simple 4h
Every active non-killed hypothesis after first council pass will have 1-3 kill tests surfaced in the brief Commander reads. Over a 14-day window, the brief becomes more actionable: Commander can scan kill tests and run the cheapest one rather than re-reading 9 deep reports.
council rejected HARNESS reversible: simple 4h
Hypotheses scored fresh (signals < 30d) see composite identical to today. Hypotheses that have been languishing in GATHER_MORE_SIGNAL for >90 days see their composite drop by 0.3-0.9 points and may fall out of the top-N graduation bucket. Stalled hypotheses naturally demote thems…
filter rejected TOOL reversible: simple 5h
Recorded signal_a_url fields across the existing hypothesis corpus will fail the validator at a non-zero rate (>5%). New runs will see fewer Signal A claims and a proportional increase in Signal D — graduation will become slightly harder for hypotheses that previously relied on n…
council rejected AXIS reversible: simple 4h
Hypotheses where Opus(high) and Gemini(low) disagree by >=2 per axis on average will see composite drop by 1.5-3.0 points. Hypotheses where the two judges align tightly gain up to 1.5. Expected rank inversions on ~20-30% of currently-ranked hypotheses.
council rejected GATE reversible: simple 3h
Re-running the proposer with the S128 trigger lexicon present in 50 candidates should produce ~10-20 admissions blocked at the new gate (today they slip through to dedup/council). Per-genesis-cycle cost drops by the dedup+council spend on blocked candidates (~$0.02-0.05 per save)…
filter rejected GATE reversible: simple 3h
Catches silent same-vendor judging cases that violate the S158 cross-vendor principle. Over 30 cycles, expect 0-3 enforcement-triggered rejudges (low because doctrine is mostly followed); a rate >10% means the routing config has drifted, and correcting that drift is the real fix.
filter rejected CORPUS reversible: simple 7h
Filter-kept rate of genesis output rises 10-25% within 30 cycles, because dead synthetic items stop being retrieved at uniform rate. Top-quartile synthetic items get retrieved 2-3x more often.
council rejected TOOL reversible: simple 6h
Cuts filter-kept rate of 847f7e-shape proposals by ≥50% relative to current baseline, without adding LLM cost. Over 30 cycles, ≥80% of detector-flagged items are also human-rated as empty.
filter rejected HARNESS reversible: simple 5h
Eliminates the silent 5-cycle genesis outage class observed in S180. Over 30 cycles, ≥98% of genesis attempts produce output. Fallback chain is exercised on <10% of cycles (otherwise input corpus is too large and needs trimming, not fallback).
filter rejected GATE reversible: simple 6h
Catches drift cases where argument constructs a sharp falsifier and council silently substitutes a softer one. Over 30 cycles, expect 5-15% of currently-approved proposals to be rejected for falsifier_drift, surfacing a class of error that pending_commander review currently absor…
filter rejected AXIS reversible: simple 4h
On the 43 historically-graduated candidates, 847f7e-shape proposals (empty noun phrases) score 0-1 on a11 while ROBUST candidates score 2-3, producing a 2-3 point spread on the composite. Filter-kept rate of 847f7e-shape drops ≥30%.
filter rejected AXIS reversible: simple 4h
Proposals grounded only in stale Corpus T (e.g. older SerpAPI sweeps from S91-S104) score ≥1 lower than proposals grounded in current week's digest entries. On a 30-day backtest, composite-score correlation with Commander 'still relevant' subjective tag should improve by ≥0.15 Pe…
council rejected HARNESS reversible: simple 6h
Genesis-induced cycle failures (silent outage class — see fix in commit 0f2d20d for Bedrock Opus 4.6) drop to zero over 60 days. Downstream filter_score.js never receives a malformed proposal in production. Retry rate sits between 5-15% (signal that the validator is firing) but f…
filter rejected GATE reversible: simple 4h
The 2 fatal_objection_both_confirm kills in the current trace would have skipped council, saving $0.20 across those cycles. Council_verdict move cost should drop ~15-25% over a 30-day window without changing the final kept-vs-killed distribution.
filter rejected AXIS reversible: simple 8h
Across the next 20 cycles, candidates that the engine would have produced and council-passed but Commander would have killed should score ≤1 on v2_a11 at least 60% of the time. Composite score spread between Commander-KILL-shape candidates and Commander-pass-shape candidates ≥ 1.…
filter rejected GATE reversible: simple 5h
On the 5-day Phase 1 trace (7× v2_backfill_orphan + 1× structural_duplicate_15ed71), at least 2-3 of those kills migrate from post-argument to pre-argument, saving ~$0.20-0.33 in argument cost per Phase 1 week. Argument move cost drops measurably in the next cycle telemetry.
council rejected TOOL reversible: simple 2h
Report reveals exact count of live orphaned rows (expected >0 given 7 kills in 7-day window with kill_reason='v2_backfill_orphan_S148'). Provides move_waste count per orphan to quantify GATE proposal value before implementation. If orphan_count=0, the GATE proposal (P1) is moot —…
filter rejected PROMPT reversible: simple 3h
When two proposals both target the same file and modify overlapping function blocks, the judge flags composability=major on at least one. Across 30 meta_filter_score runs (current observed volume), this surfaces implementation ordering conflicts before they reach Commander approv…
filter rejected PROMPT reversible: simple 2h
Genesis output rate for GW01-shaped proposals drops to <10% (estimated baseline ~23%, derived from 4 Commander kills against ~17 non-orphan proposals in the 7-day kill window). This is the earliest intervention point — preventing generation rather than catching proposals post-hoc…
filter rejected AXIS reversible: simple 4h
The 4 Commander-killed hypotheses (a38d31 = AI Control Failure Forecast Audit, e9cb5c = Reality-Graded Upgrade Gate, c89a71 = ClaimGate for B2B SaaS, 6bf9c5 = AI Tool Claim Verification) all score 0 on v2_a11. Graduated candidates score 2-3. The axis adds a 2-3 point composite sp…
rejected GATE reversible: simple 4h
The 7 v2_backfill_orphan_S148 kills (58% of all kills in 7-day window) are pre-empted before any moves are dispatched. Each orphaned row currently burns evidence_search, red_team_kill, and steelman moves before the kill fires — at the observed move:kill ratio (~360 moves / 12 kil…
filter rejected PROMPT reversible: simple 3h
With genesisRunCount ≈33 (33 days × ~1 run/day since 2026-04-19), S112 block is absent from PROPOSER_SYSTEM on next run. PROPOSER_SYSTEM character count drops by approximately the S112 block length (~900 chars, lines 88-106). Over the next 10 genesis runs without S112: if post-ho…
filter rejected CORPUS reversible: simple 4h
The next meta genesis run after deploy will include the 3 recent Commander KILLs (ClaimGate, AI Tool Claim Verification, AI Control Failure) and 1 DEFER (Upgrade Gate) as explicit Corpus E items. At least 1 proposal in that run should cite 'commander_override' in its evidence fie…
council rejected PROMPT reversible: simple 2h
Genesis (Anthropic/Sonnet 4.6) and the optimist reviewer (now OpenAI/Codex) are different vendors. On re-scoring 10 recent hypotheses with both versions, mean high-low spread per axis should widen by ≥0.15 across at least 3 of 5 axes, indicating the reviewers are less correlated.…
filter rejected TOOL reversible: simple 5h
Running node hypothesis_engine/scripts/backtest_filter_axes.js --axis=v2_a11 --labels=s157_labels.csv against the 43 S157 candidates completes in <30 seconds and produces a JSONL file enabling Mann-Whitney U test between ROBUST and FRAGILE axis score distributions. This replaces …
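As an illustration of the downstream step, the Mann-Whitney U statistic can be computed from the JSONL output in a few lines. This is a sketch only: the JSONL field names `label` and `score` are assumptions, since the source does not show the output schema of backtest_filter_axes.js, and the p-value lookup is left out.

```javascript
// Group axis scores by label from JSONL text.
// ASSUMPTION: each line looks like {"label": "ROBUST", "score": 3}.
function scoresByLabel(jsonlText) {
  const groups = {};
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    const row = JSON.parse(line);
    if (!groups[row.label]) groups[row.label] = [];
    groups[row.label].push(row.score);
  }
  return groups;
}

// Mann-Whitney U with average ranks for ties.
function mannWhitneyU(a, b) {
  const pooled = [
    ...a.map((value) => ({ value, group: 0 })),
    ...b.map((value) => ({ value, group: 1 })),
  ].sort((x, y) => x.value - y.value);

  // Assign 1-based ranks; tied values share the mean of their rank positions.
  const ranks = new Array(pooled.length);
  let i = 0;
  while (i < pooled.length) {
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].value === pooled[i].value) {
      j++;
    }
    const averageRank = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) ranks[k] = averageRank;
    i = j + 1;
  }

  let rankSumA = 0;
  pooled.forEach((item, index) => {
    if (item.group === 0) rankSumA += ranks[index];
  });

  const uA = rankSumA - (a.length * (a.length + 1)) / 2;
  const uB = a.length * b.length - uA;
  return { uA, uB, u: Math.min(uA, uB) };
}

// Hypothetical scores, not taken from the S157 sweep.
const sample = [
  '{"label":"ROBUST","score":2}',
  '{"label":"ROBUST","score":3}',
  '{"label":"ROBUST","score":3}',
  '{"label":"FRAGILE","score":0}',
  '{"label":"FRAGILE","score":1}',
  '{"label":"FRAGILE","score":1}',
].join("\n");
const groups = scoresByLabel(sample);
console.log(mannWhitneyU(groups.ROBUST, groups.FRAGILE)); // { uA: 9, uB: 0, u: 0 }
```

A U of 0 means the two score distributions do not overlap at all; with only 3 axis levels and heavy ties, the tie-corrected normal approximation or an exact test would be needed before reading significance from it.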
council rejected PROMPT reversible: simple 2h
Within 20 genesis runs post-deployment, ≥50% of generated hypotheses will contain the phrase 'exception class' or 'route to' or 'does not handle automatically' in their description field. Baseline rate is 2.3% (1/43 in S157 NBJ sweep). A grep over genesis output logs is sufficien…
filter rejected GATE reversible: simple 6h
Flag rate in first 30 days should fall between 5-15% of argument runs. Flagged hypotheses should show lower graduation rates than unflagged over a 60-day observation window. If the commander-killed AI control / ClaimGate / AI Tool Verification for Agencies hypotheses are run thro…
filter rejected GATE reversible: simple 3h
After 30 days, meta_engine/data/non_convergent/ will contain a corpus of ESCALATED transcripts. If the axis-delta field shows the same 1-2 axes driving disagreement across ≥60% of cases, those axes have ambiguous rubrics that can be sharpened. If zero files appear despite ESCALAT…
filter rejected AXIS reversible: simple 4h
The 3 commander-KILLed proposals in the last 7 days (AI control, ClaimGate, AI Tool Verification for Agencies) would score 0 on this axis due to entrenched audit/verification incumbents with regulatory switching costs. Retrospective on 43 S157 NBJ candidates should show statistic…
filter rejected GATE reversible: simple 6h
After 60 days of shadow logging, score ≥3 correlates with ≥70% eventual KILL verdicts. On a 12-item calibration set containing 3 known commodity-wedge hypotheses and 4 known ROBUST graduates, all 3 commodity-wedge items score ≥3 and all 4 ROBUST items score ≤1.
council rejected PROMPT reversible: simple 4h
Re-running escalated hypotheses 7199a9 and 2ca131 through the updated prompt each produces ≥2 distinct, non-overlapping testable questions naming specific observables (e.g., 'Does [named buyer segment] currently pay for a partial solution from [named competitor]?' rather than 'Is…
filter rejected AXIS reversible: simple 5h
d3786b-shaped hypotheses (cold outbound dependency, no owned channel) score v2_a7=0. ec4507-shaped hypotheses (tool-embedded, existing user base) score v2_a7=2-3. A 4-candidate test set spanning the distribution reachability spectrum produces a score range of ≥3 points on this ax…
filter rejected AXIS reversible: simple 5h
Of the 7 hypotheses killed in the last 7 days for 'wrong distribution shape or pain framing,' at least 5 score v2_a6 ≤1. ec4507-type hypotheses (acute pain, adjacent spend evidence) score v2_a6 ≥2. The pre-council kill rate for candy-shaped hypotheses increases measurably, reduci…
filter rejected TOOL reversible: simple 3h
Every future council escalation produces a persisted JSONL record. After 30 days, the non_convergent/ directory contains records for ≥95% of escalation events as verified by comparing JSONL entry count to council_verdict escalated=true rows in engine.db.
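The verification described here reduces to comparing two counts. A rough sketch, assuming the JSONL file contents are already loaded as a string and the escalated row count has already been queried from engine.db (both helper names are hypothetical):

```javascript
// Count non-empty lines in a JSONL file's text; each line is one record.
function countJsonlRecords(jsonlText) {
  return jsonlText.split("\n").filter((line) => line.trim() !== "").length;
}

// True when persisted records cover at least `minPercent` of escalation events.
// Integer arithmetic avoids floating-point edge cases at the threshold.
function meetsCoverage(recordCount, escalatedRowCount, minPercent = 95) {
  if (escalatedRowCount === 0) return true; // nothing to cover
  return recordCount * 100 >= minPercent * escalatedRowCount;
}

// Hypothetical example: 19 records persisted against 20 escalated verdicts.
const text = Array.from({ length: 19 }, (_, i) => `{"id":${i}}`).join("\n");
console.log(meetsCoverage(countJsonlRecords(text), 20)); // true
console.log(meetsCoverage(18, 20)); // false
```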
filter rejected HARNESS reversible: medium 8h
5 of 9 recent council verdicts (~56%) contain explicit temporal deferral conditions ('run Week 1 outbound before building', '7-day artifact-upload test', '7-day signal check'). These hypotheses currently sit in GATHER_MORE_SIGNAL indefinitely with no automatic re-evaluation. After…
filter rejected GATE reversible: simple 6h
The 'RevOps Objection Taxonomy Normalizer' shape (GPT-5.5-Pro Round 1: passes describability, observed-buyer, solo-inbound, yet still structurally weak on urgency and data advantage) flags commodity_wedge=true on axes 3+4+5. After 4 weeks of shadow: hypotheses where commodity_wed…
council rejected GATE reversible: simple 8h
Codex retrospective on 43 S157-scored candidates: gate kills hyp-2026-05-06-847f7e (S157 score 0/5, structurally fragile on all 5 Q dimensions) and does not kill any of the three 5/5 ROBUST candidates (ec4507, 24a849, 3656a0). Spearman rank correlation between gate composite_scor…
council rejected TOOL reversible: simple 4h
Current non-convergence rate: 22% (2 of 9 recent verdicts = 'council could not converge after 3 rounds'). At 9 verdicts/week, 30 days produces ~8–10 transcripts. The escalationReason field already distinguishes FACTUAL vs WEIGHTING vs FRAMING disagreements. The corpus enables the…
council rejected ARCHITECTURE reversible: medium 12h
The 4 Commander-killed hypotheses (a38d31, c89a71, 6bf9c5, e9cb5c) should score ≤18/30 on v2 composite due to failing scalable_revenue (audit-shaped), distribution_reachability (no warm-contact base noted in override reason), and commander_non_engine_work_fit. The v1 composite sc…
filter rejected PROMPT reversible: simple 4h
Every GATHER_MORE_SIGNAL verdict (currently ~55% of verdicts per S157 distribution: 25 ROBUST + 13 MIXED + 4 FRAGILE + 1 STRUCTURALLY FRAGILE out of 43) produces a machine-readable gate condition that Commander can execute without re-reading the full reasoning. The 2 non-converge…
council rejected PROMPT reversible: simple 3h
Genesis outputs for commodity or evergreen problems (e.g. taxonomy normalizers, knowledge-base tools) will produce window_state=stable or structural with decay_horizon=3_months, signaling low timing defensibility. Proposals tied to genuine substrate shifts (agent-era trust gaps, …
council rejected PROMPT reversible: simple 2h
Argument transcripts will contain named companies in attacker rounds, enabling council to distinguish theoretical objections from documented failure patterns. The RevOps taxonomy shape (S158 Round 1 survivor shape that passes all describability/reachability checks) should generat…
council rejected PROMPT reversible: simple 3h
The 5 recent council verdicts (5d7cca, 26fc18, cc72cd, 90778c, c27754) each independently invented ad-hoc week-1 tests. Post-change, genesis outputs carry those tests, so council can evaluate their credibility rather than invent them. 10 consecutive genesis outputs should contain…
filter rejected PROMPT reversible: medium 8h
Hypotheses like Commander-KILLED a38d31 (audit product, no warm-contact base) and c89a71 (ClaimGate, relational sales) should score A5=0 (scalable_revenue: pure audit service) and A7=0–1 (distribution_reachability: warm intros required), producing v2 composites below 40% even if …
council rejected GATE reversible: simple 6h
RevOps Objection Taxonomy Normalizer shape (CRM-integrated taxonomy, no urgency event, dashboard deliverable) flags commodity_wedge_recommendation=true. hyp-2026-05-06-ec4507 (Support Escalation: SLA deadline forcing function, Zendesk timestamp as external ground truth, not CRM-d…
council rejected PROMPT reversible: simple 2h
hyp-2026-05-06-847f7e (Support Promise Calibration Console — killed because 'CSAT/SLA outcomes are multi-causal') scores 0-1 on fast_feedback_loops under the revised rubric. hyp-2026-05-13-47730e (AI Portfolio Claim Auditor — killed because 'board verdicts multi-causal') scores 0…
council rejected PROMPT reversible: simple 2h
After 20 genesis runs post-patch, at least 30% of hypothesis descriptions contain explicit exception-class language ('The workflow excludes...', 'Human review is required when...', or equivalent). Current S157 baseline: 1/43 graduated candidates (2.3%) explicitly named exception …
filter rejected AXIS reversible: simple 5h
RevOps Objection Taxonomy Normalizer shape (taxonomy/analytics, CRM-integrated, no named urgency event) scores 0-1. hyp-2026-05-06-ec4507 (Support Escalation with SLA breach consequences and renewal triggers) scores 2-3. Retrospective application to 43 S157-scored candidates show…
filter rejected TOOL reversible: simple 3h
After 30 days of active council cycles (current rate ~9/week, empirical non-convergence rate 22%), directory accumulates 8-12 non-convergent transcripts. This corpus enables first structured analysis of split-reason taxonomy: whether splits cluster by hypothesis type, ICP, or mod…
deferred GATE reversible: simple 5h
Of 9 recent deep_council_verdict runs, at least 3 of the 5 killed hypotheses would be caught here: d3786b (Agronomy Advisory — no observable paying buyer for AI-powered agronomy ledgers), c27754 (Medical-Device SME buying AI components — buyer leverage unverified), cc72cd (Bot-Pr…
deferred PROMPT reversible: simple 3h
After 2 genesis cycles (~62 new hypotheses at current 31/week throughput), at least 55% of generated proposals include a non-empty exception_classes field with ≥2 distinct named situations (not paraphrases). The companion exception_class_named axis (Proposal 1) will show mean sco…
deferred AXIS reversible: simple 5h
Applied retroactively to 43 S157-graduated candidates: hyp-2026-04-19-3656a0 (ec4507, cited explicit scope exclusions) scores 2-3; hyp-2026-05-06-847f7e (zero exception classes anywhere) scores 0; at least 38 of 43 score 0-1, producing a minimum 2-point composite spread between R…