Meta engine — engine improving itself

A third lane that does not propose products. It proposes changes to the engine that proposes products. Each candidate is a concrete, falsifiable, solo-feasible modification — a new filter axis, a corpus tweak, a prompt revision, a tool, a harness. Same evaluation discipline as the other lanes, applied inward.

Accepted 5

Most recent 16 Jun 2026

Add v2_a12 forward_clock_testability axis

16 Jun 2026 proposed

approved AXIS reversible: simple 5h

On rerun of 43 graduated candidates, fewer than 8 score 3 (most candidates lack testable horizons); the 1 DEFER override scores ≤1. The composite spread between testable and non-testable cohorts is 2-3 points.

Add 'evidence' to meta_engine genesis validateProposal required fields and structure-check it

23 May 2026 proposed

approved TOOL reversible: simple 1h

Proposals with no evidence array, empty evidence, or evidence items missing source_corpus/source will be rejected at validation time with a specific reason rather than silently persisting with evidence=[]. Retrocheck: `SELECT id, title FROM hypotheses WHERE lane='meta' AND json_e…
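
The structure check described above might look roughly like the sketch below. It assumes each evidence item is an object and that both `source_corpus` and `source` must be non-empty strings; the entry could also be read as requiring only one of them. The helper name `checkEvidence` and the reason strings are hypothetical. Only `evidence`, `source_corpus`, and `source` come from the proposal text.

```javascript
// Hypothetical helper: returns { ok, reason } so the caller can route
// failures to rejected[] with a specific reason instead of persisting evidence=[].
function checkEvidence(proposal) {
  const evidence = proposal ? proposal.evidence : undefined;
  if (!Array.isArray(evidence)) {
    return { ok: false, reason: "evidence_missing" };
  }
  if (evidence.length === 0) {
    return { ok: false, reason: "evidence_empty" };
  }
  const isNonEmptyString = (v) => typeof v === "string" && v.trim().length > 0;
  for (let i = 0; i < evidence.length; i++) {
    const item = evidence[i];
    // Assumption: both fields are required on every item.
    if (!item || !isNonEmptyString(item.source_corpus) || !isNonEmptyString(item.source)) {
      return { ok: false, reason: `evidence_item_${i}_missing_source` };
    }
  }
  return { ok: true, reason: null };
}
```

Returning a reason string rather than a bare boolean is what lets the validator reject "with a specific reason", as the entry requires.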

Fix meta_engine genesis validator: cap solo_time_estimate at 16h not 24h

23 May 2026 proposed

approved TOOL reversible: simple 1h

Proposals with solo_time_estimate in the 17-24h range are now caught by validateProposal() and routed to rejected[] instead of persisting to the DB. The enforcement gap between the system prompt contract (16h) and the validator implementation (24h) is closed. Historical retrochec…

Add NBJ 5-question describability pre-check at start of argument.js

18 May 2026 proposed

accepted with revision shadow mode GATE reversible: simple 6h

Applied to 43 S157 graduated candidates: hyp-2026-05-06-847f7e (0/5 on S157 manual review) is killed before argument; none of the 25 ROBUST candidates (4-5/5) are killed. Per the move cost rollup, argument + council_verdict + 7 deep moves average approximately $0.12-0.18 per hypo…

Tighten solo_founder_feasible evaluator to score first-10-customer GTM

18 May 2026 proposed

accepted with revision PROMPT reversible: simple 2h

Back-scoring hyp-2026-05-14-d3786b (Agronomy Advisory for UK soft-fruit and glasshouse growers — institutional trade-channel buyers) and hyp-2026-05-11-cc72cd (Bot-Promise Slip for B2B Support Ops — enterprise procurement buyers) with revised prompt produces solo_founder_feasible…

Awaiting Commander decision 4

Most recent 23 Jun 2026

Add prior-art dedup gate before meta_filter_score LLM call

23 Jun 2026 proposed

awaiting decision GATE reversible: simple 4h

10-25% of incoming meta proposals short-circuit to DROP without LLM call, cutting meta_filter LLM cost proportionally and freeing reviewer attention for novel proposals. Distinct-change_type ratio per cycle rises.

Add schema-drift detector script comparing schema.sql to live db columns

23 Jun 2026 proposed

awaiting decision TOOL reversible: simple 4h

First run reports the known drift (lane, meta_ship_status, pool_status, verdict_*) plus any other drift we have not noticed. Subsequent runs after schema.sql is reconciled report zero drift.
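
A minimal sketch of the comparison step, assuming the column lists have already been extracted (parsing schema.sql and introspecting the live db are left out). The function name `diffSchema`, the `kind` labels, and the sample column lists are hypothetical and only illustrate the shape of a drift report.

```javascript
// Hypothetical inputs: each argument maps table name -> array of column names.
// `declared` would come from schema.sql, `live` from the running database.
function diffSchema(declared, live) {
  const drift = [];
  const tables = new Set([...Object.keys(declared), ...Object.keys(live)]);
  for (const table of [...tables].sort()) {
    const want = new Set(declared[table] || []);
    const have = new Set(live[table] || []);
    for (const col of [...have].sort()) {
      if (!want.has(col)) drift.push({ table, column: col, kind: "live_only" });
    }
    for (const col of [...want].sort()) {
      if (!have.has(col)) drift.push({ table, column: col, kind: "schema_only" });
    }
  }
  return drift;
}

// Illustrative data only: the real column lists are not given in the proposal.
const declared = { hypotheses: ["id", "title"] };
const live = { hypotheses: ["id", "title", "lane", "pool_status"] };
const report = diffSchema(declared, live);
```

An empty `report` corresponds to the "zero drift" outcome expected once schema.sql is reconciled.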

Add non_convergence_capture sink: log proposals filtered at every stage

21 Jun 2026 proposed

awaiting decision TOOL reversible: simple 3h

After 7 days, capture rate (proposals-rejected / proposals-generated) is measurable per axis and per stage. Reveals whether v2_a3 'solo_feasibility' carries >50% of rejections (suspected over-weight per S158 Round 2).

Add v2_a11 describability axis (one-sentence buyer-language test)

18 Jun 2026 proposed

awaiting decision AXIS reversible: simple 4h

Hypotheses whose title leads with AE's mechanism (e.g. forecast-audit / claim-gate / drift-ledger pattern — the 847f7e shape) score 0-1 on v2_a11; hypotheses whose title leads with the buyer's pain ('Dental practice software complaints triage') score 2-3. Predicted 1.5-2.5 point …

Deferred / rejected 172

Most recent 24 Jun 2026

Add per-vertical genesis cooldown harness

24 Jun 2026 proposed

council rejected HARNESS reversible: simple 6h

After 1 week of operation, vertical distribution Gini coefficient across emitted candidates drops by >=0.15 vs prior 7d baseline. Cooldown fires on at least one vertical. Overall genesis throughput drops by no more than 20%.
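
The Gini threshold in this falsifier can be checked with a few lines of arithmetic. The sketch below uses the standard sorted-rank formula for the Gini coefficient; the per-vertical counts are made-up illustration data, not measurements from the engine.

```javascript
// Gini coefficient of non-negative counts: 0 means perfectly even,
// values approaching 1 mean one vertical dominates.
function gini(counts) {
  const xs = [...counts].sort((a, b) => a - b);
  const n = xs.length;
  const total = xs.reduce((s, x) => s + x, 0);
  if (n === 0 || total === 0) return 0;
  let weighted = 0;
  for (let i = 0; i < n; i++) weighted += (i + 1) * xs[i];
  return (2 * weighted) / (n * total) - (n + 1) / n;
}

// Hypothetical candidates-per-vertical counts for two 7-day windows.
const baseline = gini([14, 3, 2, 1]);
const afterCooldown = gini([7, 5, 4, 4]);
const dropMeetsTarget = baseline - afterCooldown >= 0.15;
```

Both windows here hold 20 candidates, so the example isolates the distribution change from the separate throughput condition (a drop of no more than 20%).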

Inject 4 commander overrides as few-shot exemplars in council_verdict

24 Jun 2026 proposed

council rejected PROMPT reversible: simple 2h

On a replay of council_verdict against the 4 historical override candidates, council's verdict flips to match commander on >=3/4 cases. On a 30-candidate forward sample, council PASS rate on audit-shaped candidates drops by >=30% vs the 7-day baseline.

Add v2_a11 cross-engine convergence axis to filter_score

24 Jun 2026 proposed

filter rejected AXIS reversible: simple 8h

Among the 43 graduated candidates, median v2_a11 score is >=1; among killed candidates, median is 0. Composite score spread between graduated and killed widens by >=0.5 points after adding the axis. At least 1 historically-killed candidate would have passed filter with the new ax…

Add genesis-time classifier predicting v2_backfill_orphan_S148 kills

24 Jun 2026 proposed

filter rejected TOOL reversible: simple 6h

Over the next 100 genesis events, >=4 candidates are killed early as predicted orphans (matching the historical 7%-of-kills base rate). Leave-one-out accuracy on the 7 historical orphan kills is >=5/7. Cumulative move-cost saving over 100 events: ~one filter_score + one argument …

Add pre-argument gate to kill audit/scorecard-shaped candidates

24 Jun 2026 proposed

filter rejected GATE reversible: simple 4h

Re-running the gate on the 3 commander-KILL'd audit-shaped candidates (a38d31, c89a71, 6bf9c5) fires on all 3. Re-running on a control set of 10 graduated non-audit candidates fires on 0-1 of them. Net effect on next 100 candidates: 5-15% killed pre-argument, saving argument cost…

Tighten evidence schema in meta_engine genesis system prompt

23 Jun 2026 proposed

council rejected PROMPT reversible: simple 2h

Among the next 60 meta_filter_score DROP reasons, mentions of 'invented evidence' / 'evidence not citable' / 'cargo-cult citation' decrease by >=50%. Proposer self-kills (rejected[] array length) increase modestly. Net KEEP rate may not move, but downstream judge confidence impro…

Add Gemini as concurrent second judge to meta_filter_score (require both KEEP)

23 Jun 2026 proposed

council rejected HARNESS reversible: simple 3h

KEEP rate drops 15-40% vs single-judge baseline. Council-stage and Commander-stage rejection rate on meta proposals that survived filter drops measurably, because surviving proposals cleared two independent vendor families.

Add one-shot JSON-repair retry to filter_score on parse_error

23 Jun 2026 proposed

filter rejected HARNESS reversible: simple 3h

Across the next 50 filter_score invocations, parse_error rate drops by >=60% and the corresponding cost-without-phase-advance burn drops proportionally. Repair calls cost ~10-20% of an original call.

Corpus rotation freshness gate before genesis

22 Jun 2026 proposed

filter rejected GATE reversible: simple 3h

The S198-class monoculture-from-frozen-queries failure mode (12 frozen dev-infra queries for many cycles) becomes impossible to enter silently. Either rotation happens, or the Commander explicitly authorizes the stale run with a logged reason.

Council verdict normalizer + ambiguity rejector

22 Jun 2026 proposed

council rejected TOOL reversible: simple 4h

S204-class 'Gemini=REJECT-truncated' verdicts no longer count as REJECT in quorum math. Quorum either resolves cleanly (after re-run) or pauses for Commander, with audit trail showing which models gave AMBIGUOUS.

Lane-assertion harness around scheduler dispatch

22 Jun 2026 proposed

filter rejected HARNESS reversible: simple 4h

A B5-class regression (scheduler running engine_1 moves on meta-lane rows) is caught on first violation instead of accumulating 97 wasted moves. Lane-mismatch row count visible in daily cycle summary.

Add SHARED_PROXY pre-calibration gate

22 Jun 2026 proposed

council rejected GATE reversible: simple 5h

S204-class calibrations that fold S201 proxy results into both training and evaluation sides hard-fail at gate-time instead of producing a 'validated' instrument that re-encodes the original bias.

Add v2_a11 source_family_independence axis to filter_score

22 Jun 2026 proposed

filter rejected AXIS reversible: simple 6h

Proposals leaning solely on PUBLIC text-visible sources (the Codex #7 shared-bias class) score 0-1; cross-family proposals score 2-3, producing 1-2 point composite spread that correlates with Gate 1 PROCEED outcomes on the S204 instrument.

Add costly_behaviour_corpus source: weekly Upwork+job-posting scrape

21 Jun 2026 proposed

council rejected CORPUS reversible: medium 10h

Within 2 weeks, ≥30% of meta-lane filter_kept proposals cite at least one costly_behaviour row in their argument move (current rate: ~0% per S203 baseline). Solo_feasibility false-positives drop ≥20%.

Add cross-vendor judge harness to council_verdict (Anthropic gen, OpenAI judge)

21 Jun 2026 proposed

filter rejected HARNESS reversible: simple 5h

On replay of last 30 council verdicts, panels lacking cross-vendor judge get flagged; commander override rate on cross-vendor-judged subset is ≥30% lower than same-vendor subset.

Add confidence-weighted aggregation in council_verdict (RLCR-style)

21 Jun 2026 proposed

filter rejected PROMPT reversible: simple 3h

On last 30 council verdicts, ≥4 flip from KEEP→KILL (low-confidence dissenter outweighs high-confidence majority) or KILL→KEEP. Commander override rate drops by ≥25% on the flipped subset.

Add evidence_provenance gate: every claim cites corpus row_id or rejects

21 Jun 2026 proposed

filter rejected GATE reversible: medium 8h

Solo-founder-feasible KD-threshold hallucination class (documented in S160) disappears from meta-lane output. Manual audit of 20 fresh outputs shows 0 unsourced empirical claims vs current ~30% rate.

Add diversity_guard gate: reject proposals with ≥0.85 cosine overlap to last 30 kept

21 Jun 2026 proposed

filter rejected GATE reversible: simple 6h

On meta-lane backlog of last 7d, ≥15% of currently-kept proposals get re-classified as diversity_rejected; commander KILL/DEFER rate on remaining drops by ≥30%.
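
A sketch of the overlap test behind this gate, assuming each proposal has already been turned into a numeric embedding vector of a fixed length (the proposal does not say how). The names `cosine` and `diversityCheck` are hypothetical; the 0.85 threshold and the window of 30 kept proposals come from the entry title.

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Compares a candidate against the most recent `window` kept vectors
// (oldest first) and rejects when the closest one meets the threshold.
function diversityCheck(candidate, keptVectors, threshold = 0.85, window = 30) {
  const recent = keptVectors.slice(-window);
  let maxSim = 0;
  for (const v of recent) maxSim = Math.max(maxSim, cosine(candidate, v));
  return { rejected: maxSim >= threshold, maxSim };
}
```

Returning `maxSim` alongside the decision makes it possible to log how close the near-misses are, which helps when tuning the threshold.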

Add v2_a11 describability axis scored via NBJ 5-question test

21 Jun 2026 proposed

council rejected AXIS reversible: simple 4h

On 43 graduated candidates from S157, 847f7e-shape proposals (the 14 NBJ-failed ones) score ≤1 on a11; ROBUST candidates score 2-3. Composite spread widens 2-3 points and matches S157 sweep classification.

Add codex-judge harness for borderline council_verdict

20 Jun 2026 proposed

filter rejected HARNESS reversible: simple 6h

Borderline (within 0.5) decisions become decorrelated from the Sonnet judge that produced the composite. Expect the codex tiebreaker to disagree with sonnet on 30-50% of borderlines (validating the decorrelation hypothesis from S202 design hygiene).

Add kill_reason_classifier tool feeding back to genesis

20 Jun 2026 proposed

council rejected TOOL reversible: medium 8h

Genesis produces fewer proposals in the over-represented kill buckets. Expect top-2 kill buckets to shrink by 25-40% in subsequent cycles; bottom buckets to grow (zero-sum redistribution, not net improvement on first deployment).

Add 'status-quo enumeration' step to genesis prompt

20 Jun 2026 proposed

council rejected PROMPT reversible: simple 3h

Genesis output will carry a structured status_quo field on ≥95% of proposals after change; v2_a12 scoring will be deterministic against this field rather than NLP-inferred. Expect fewer 'invented problem' proposals (estimated 20-30% reduction in council-stage 'no buyer would pay'…

Add pre-filter source-diversity gate (G_DIV)

20 Jun 2026 proposed

filter rejected GATE reversible: simple 6h

S200 cross-memo dedup runs post-genesis; G_DIV catches the upstream skew before any filter compute is spent. Expect ~10-15% of cycles to trigger re-draw initially, dropping to <5% as corpus rotation matures.

Add v2_a12 status_quo_displacement axis to filter_score

20 Jun 2026 proposed

council rejected AXIS reversible: simple 4h

Proposals from genesis that skip displacement framing will score 0-1 and lose 1-2 composite points. Expect graduation rate of 'greenfield-only' proposals to drop while displacement-framed proposals rise.

Add v2_a11 market_stage_signal axis to filter_score

20 Jun 2026 proposed

filter rejected AXIS reversible: simple 4h

Breadth-first proposals (per S202 ratification) that omit market-stage will score 0-1 on a11, dropping their composite by ~1-2 points and reducing pass-through of stage-blind candidates by an estimated 15-25%.

Add per_corpus_yield telemetry tool

19 Jun 2026 proposed

council rejected TOOL reversible: simple 4h

Within 7 days, surfaces which of the 4 baseline + 8 rotating pain-template corpora (S198) are dollar-efficient and which are subsidizing low-yield shapes. Decision input for next corpus reweighting cycle.

Add p3_evidence_marker_validator tool

19 Jun 2026 proposed

filter rejected TOOL reversible: simple 3h

P3 outputs gain >=95% evidence-marker compliance within a week (currently anecdotal; no measurement). Telemetry surfaces which corpora/templates systematically produce evidence-less rubric outputs.

Add forward_clock_falsifier_required gate at graduation

19 Jun 2026 proposed

council rejected GATE reversible: simple 5h

100% of newly-graduated proposals carry a parseable forward-clock falsifier. Backfill: existing graduated proposals are not re-evaluated. Within 30 days, the engine has a deterministic grading queue and can self-report hit-rate.

Add kill_shape_dedupe pre-filter gate against 30-day kill log

19 Jun 2026 proposed

filter rejected GATE reversible: medium 6h

v2_backfill_orphan_S148 (7x in current trace) should drop to <=2 in the next 30-day window. Total proposals reaching meta_council_verdict should fall by 10-20%, reducing move-cost.

Add v2_a11 distribution_specificity axis to filter_score.js

19 Jun 2026 proposed

filter rejected AXIS reversible: simple 4h

Proposals currently scoring high on v2_a7 via vague 'reach via social' phrasing will drop 1-2 composite points; proposals with named channel + audience will be unchanged. Top-10 ranking should reshuffle by ~30%, with B_PILOT-shaped (UK recruitment, named channel) proposals climbi…

Genesis prompt session-tag bloat linter (TOOL, not edit)

18 Jun 2026 proposed

council rejected TOOL reversible: simple 3h

Output CSV will name ≥3 session-tag blocks where `decage_status` flags 'decaged but still in prompt': specifically S121 evaluator-side cage (decaged S183), S151 archetype rotation (claimed temporary through 2026-06-14 — already expired by today 2026-06-18), and the S183 product-s…

Third-judge tiebreaker harness on high-IQR filter_score runs

18 Jun 2026 proposed

council rejected HARNESS reversible: simple 4h

Currently ~10-20% of completed scoring sequences end with IQR ≥ 1.5 (visible in `filter_score_iqr` histogram). For these high-disagreement candidates the tiebroken_median will shift composite by ≥0.5 points in ≥50% of cases, changing the top-of-queue ordering. Promotion/kill deci…

Rotating-source corpus loader in hypothesis_engine genesis (mirror S198 fix)

18 Jun 2026 proposed

filter rejected CORPUS reversible: simple 5h

On a 5-cycle replay (one per rotated corpus), the genesis `fresh_signal_source_id` field should report 5 distinct source-corpus origins; the prior 30 admitted hypotheses currently report ≤3 distinct origins. Archetype-family histogram across 30 admissions should hit ≥6 of F1-F10 …

Cross-cycle dedup gate before promotion from ranked → argument

18 Jun 2026 proposed

council rejected GATE reversible: simple 6h

On the active pool, at least 1-3 of the next 20 ranked candidates will collide at cosine ≥ 0.82 with a sibling admitted within the prior 14 days. Saturated sub-cluster (S136 procurement/SOW family) should see the highest collision rate. Active-pool variety per the S151 archetype_…

Add argument-trace summarizer harness around judge call

17 Jun 2026 proposed

filter rejected HARNESS reversible: simple 7h

On a 30-run A/B (15 with summarizer, 15 raw), judge verdicts under summarizer mode show lower variance (stdev of composite KEEP/KILL across two judge re-runs of same candidate drops by >20%) per the reasoning-trace fluency-trap finding. Token cost on judge call drops ~40%.

Promote shadow describability gate to v2_a12 axis

17 Jun 2026 proposed

filter rejected AXIS reversible: simple 3h

847f7e-shape proposals (per Corpus D pattern reference) score 0-1 on v2_a12; ROBUST candidates score 2-3. Expected composite spread of 2-3 points between the two classes. Graduation rate on metaphor-heavy candidates drops by ~25%.

Add judge-proposer agreement-rate sentinel tool

17 Jun 2026 proposed

filter rejected TOOL reversible: simple 5h

On current run history (recent 50 verdicts), sentinel will report a baseline agreement rate. If sycophancy is present (proposer = Sonnet 4.6, judge = gpt-5.5-codex), expect baseline <0.75; rate climbing toward 0.85 across future runs surfaces a calibration drift Commander would o…

Add pre-filter gate detecting v2_backfill_orphan_S148 condition

17 Jun 2026 proposed

filter rejected GATE reversible: simple 6h

Of the 7 v2_backfill_orphan_S148 kills in Corpus E, all 7 would be caught at the pre-filter stage. Argument/judge spend on these candidates drops to $0. Total saved per 100-run batch: ~$0.40 (7 candidates * argument+judge cost).

Add v2_a11 audit_shape_penalty axis to filter_score

17 Jun 2026 proposed

filter rejected AXIS reversible: simple 4h

On the 4 Commander-overridden KILL cases (all audit-shaped per Corpus E override log), v2_a11 scores 0-1 producing composite drop of 2-3 points vs prior runs. At least 3 of the 4 fall below 17/33 composite and would have been auto-killed without override.

F11 off-thesis exemption budget cap with weekly telemetry

16 Jun 2026 proposed

filter rejected PROMPT reversible: simple 2h

F11 exempt admits will be bounded at 5 per 14-day window. If the engine currently runs hot on F11 (which is the risk of an unbounded exemption), this caps it. Expected hit rate of the new cap: 1-3 reject events per month if F11 was previously absorbing the saturation pressure fro…

Median-zero-on-any-axis hard kill in filter_score

16 Jun 2026 proposed

filter rejected GATE reversible: simple 2h

Hypotheses with a load-bearing weakness on ONE axis but high scores elsewhere will be killed regardless of composite. Estimate: 5-12% of currently-ranked hypotheses have a median-0 axis (visible by reparsing moves.scores_json). Those hypotheses currently graduate to 'scored'/'ran…
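
The kill rule in this entry reduces to a per-axis median over the scoring runs. The sketch below assumes the runs have been reparsed from moves.scores_json into plain objects mapping axis name to score; that shape and the function names are hypothetical.

```javascript
function median(values) {
  const xs = [...values].sort((a, b) => a - b);
  const mid = Math.floor(xs.length / 2);
  return xs.length % 2 === 1 ? xs[mid] : (xs[mid - 1] + xs[mid]) / 2;
}

// Hypothetical shape: `runs` is an array of { axisName: score } objects,
// one per filter_score run. Returns the axes whose median score is 0.
function medianZeroAxes(runs) {
  const byAxis = {};
  for (const run of runs) {
    for (const [axis, score] of Object.entries(run)) {
      (byAxis[axis] = byAxis[axis] || []).push(score);
    }
  }
  return Object.keys(byAxis).filter((axis) => median(byAxis[axis]) === 0).sort();
}

// Illustrative scores: strong on one axis, median 0 on another.
const killAxes = medianZeroAxes([
  { solo_founder_feasible: 3, fast_feedback_loops: 0 },
  { solo_founder_feasible: 2, fast_feedback_loops: 0 },
  { solo_founder_feasible: 3, fast_feedback_loops: 1 },
]);
const hardKill = killAxes.length > 0;
```

Using the median rather than the minimum means a single outlier run scoring 0 does not trigger the kill on its own.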

Council Round-3 citation source-diversity gate

16 Jun 2026 proposed

filter rejected GATE reversible: simple 5h

Judges that anchor their entire R3 verdict on (e.g.) only 'risk' will be excluded. Predicted effect: ~10-20% of R3 verdicts that survive the citation-count check fail source-diversity. Net: STRONG_BUILD/KILL escalations from R3 will be drawn from a more grounded set of judges, re…

Deterministic closed-archetype keyword tripwire pre-filter

16 Jun 2026 proposed

filter rejected TOOL reversible: simple 3h

Closed-archetype proposals that slip past the LLM's prompt-side filter (typically when wording is subtle, e.g. 'forecast accuracy benchmarking for capital allocators') will hit the tripwire ~95%+ of the time when any banned term is literally present. Will catch an estimated 5-15%…

Per-axis judge-spread veto gate in filter_score graduation

16 Jun 2026 proposed

filter rejected GATE reversible: simple 4h

Hypotheses where HIGH and LOW judges disagree systematically on one filter (e.g. 'fast_feedback_loops' hi=3 lo=0, hi=3 lo=0, hi=2 lo=0 → mean_spread=2.67) will land in 'disputed' rather than 'ranked'. Back-of-envelope: ~15-25% of currently-ranked hypotheses have ≥1 axis with mean…
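
The `mean_spread` figure in the example can be reproduced with the sketch below. The veto threshold is not visible in the truncated text, so it is passed in as a parameter and the value 2 used here is purely a placeholder. Function names and the input shape are hypothetical.

```javascript
// Mean absolute gap between the HIGH and LOW judge scores across runs.
function meanSpread(pairs) {
  if (pairs.length === 0) return 0;
  const total = pairs.reduce((s, p) => s + Math.abs(p.hi - p.lo), 0);
  return total / pairs.length;
}

// Hypothetical shape: pairsByAxis maps axis name -> array of { hi, lo }.
// Returns the axes whose mean spread reaches the caller-supplied threshold.
function disputedAxes(pairsByAxis, threshold) {
  return Object.keys(pairsByAxis)
    .filter((axis) => meanSpread(pairsByAxis[axis]) >= threshold)
    .sort();
}

const spread = meanSpread([
  { hi: 3, lo: 0 },
  { hi: 3, lo: 0 },
  { hi: 2, lo: 0 },
]); // (3 + 3 + 2) / 3, which rounds to the 2.67 quoted in the entry

const disputed = disputedAxes(
  {
    fast_feedback_loops: [{ hi: 3, lo: 0 }, { hi: 3, lo: 0 }, { hi: 2, lo: 0 }],
    solo_founder_feasible: [{ hi: 2, lo: 2 }, { hi: 3, lo: 2 }, { hi: 2, lo: 1 }],
  },
  2 // placeholder threshold, not taken from the proposal
);
```

A hypothesis with a non-empty `disputed` list would be routed to 'disputed' rather than 'ranked'.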

Filter-score calibration report tool

15 Jun 2026 proposed

council rejected TOOL reversible: simple 3h

Surfaces which v2_a1..v2_a10 axes are pulling weight vs which are pure noise. Architect uses report to retire or reweight low-signal axes in a follow-up proposal.

Genesis prompt: require >=2 cited corpus anchors per hypothesis

15 Jun 2026 proposed

council rejected PROMPT reversible: simple 1h

Average anchor count per proposal rises from current ~1.1 to >=2.0. Manually-judged 'hallucinated framing' rate (proposals not grounded in the supplied corpus) drops by half on a 50-hypothesis spot check.

Genesis retry-with-jitter harness on empty/invalid output

15 Jun 2026 proposed

filter rejected HARNESS reversible: simple 2h

Empty-batch rate on genesis drops from the current ~8% (per S176 validation traces) to <2%, without inflating overall proposal count by more than 5%.

Pre-argument source_corpus diversity gate

15 Jun 2026 proposed

council rejected GATE reversible: simple 3h

When Corpus T floods (as in S183-S186 forecaster arc), the gate fires on ~15-25% of batches and keeps Corpus E/D candidates from being starved out of argument move. Cross-corpus citation rate in graduated hypotheses rises measurably.

Add v2_a11 falsifier_concreteness axis to filter_score

15 Jun 2026 proposed

filter rejected AXIS reversible: simple 4h

Hypotheses written without a runnable falsifier (the 847f7e-shape pattern Commander has flagged) drop ~2 points in composite, pushing them below the kept-threshold ~60% of the time. ROBUST candidates retain composite within 0.5 of current.

Add falsifier_one_cycle_resolvable axis 0-3 to meta filter_score rubric

14 Jun 2026 proposed

council rejected AXIS reversible: simple 3h

Re-score the last 19 historical filter_score moves with the new prompt: expect 3-6 to land at 0-1 (e.g. 2026-06-13's calibration proposal admits 'target ~3-4 cycles' — would score 1; 2026-06-12's S151 RECENT_ADMITS proposal needed '30 admissions ~3-6 weeks' — would score 0). Dist…

Add validateProposal check: each non-new target_file must exist on disk

14 Jun 2026 proposed

council rejected GATE reversible: simple 3h

Replay against the last 30 meta proposals (~6 cycles): expect 0-2 to fail the gate (low rate, high signal — those are the ones that would not have implemented as written). Going forward, no council-advanced proposal can reach Architect with a typo'd or stale target path.

Add meta_engine/lib/replay_proposals.js dry-run tool for gate/axis proposals

14 Jun 2026 proposed

council rejected TOOL reversible: simple 5h

Within 10 cycles post-build, ≥3 future proposals reference this tool path in their falsifier text instead of describing ad-hoc replay. Architect implementation work on any gate/axis proposal that already passes council saves ~30 min of duplicated loader code.

Rescope loadCorpusE from product lane to active lanes (meta + opportunity)

14 Jun 2026 proposed

council rejected CORPUS reversible: simple 3h

corpus_E_items rises from 3 to ~5-6 with meta-lane-active content. Within 3 cycles post-change, ≥2 proposals will cite a meta-lane E source (e.g. 'kill_reason filter_rejected = N'). Proposals targeting hypothesis_engine paths drop further because Corpus E now visibly shows where …

Add forbidden-target-files clause for retired product lane to META_PROPOSER_SYSTEM

14 Jun 2026 proposed

council rejected PROMPT reversible: simple 2h

On 2026-06-12 cycle 5/5 proposals targeted hypothesis_engine/moves/* paths (council rejected 19/19 — possibly because they target a frozen lane). On 2026-06-13 only 1/5 did. Re-running with the new prompt+gate over the next 3 cycles (~15 proposals), expect ≤2/15 hypothesis_engine…

Add meta_calibration.js: Spearman ρ between proposer self-score and council verdict_score

13 Jun 2026 proposed

council rejected TOOL reversible: simple 6h

Builds the dataset needed to falsify the m_a_self_score field's value. With ≥20 verdicted rows accumulated, ρ is interpretable. If ρ > 0.6, self-score can be promoted into the filter_score axes (free pre-LLM signal). If |ρ| < 0.2 over 20 rows, the self-score is noise and the gene…
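
One way the correlation itself could be computed is sketched below: Spearman's ρ as the Pearson correlation of average ranks, which handles tied scores. This is a generic implementation, not the contents of meta_calibration.js, and the helper names are hypothetical.

```javascript
// Average ranks (1-based); tied values share the mean of the ranks they span.
function ranks(values) {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const out = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) out[order[k].i] = avg;
    start = end + 1;
  }
  return out;
}

function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  // Undefined when either side has no variance (e.g. all scores identical).
  return sxx === 0 || syy === 0 ? NaN : sxy / Math.sqrt(sxx * syy);
}

// x: proposer self-scores, y: council verdict scores, paired by row.
function spearman(x, y) {
  if (x.length !== y.length || x.length < 2) return NaN;
  return pearson(ranks(x), ranks(y));
}
```

The result would then be read against the thresholds stated in the entry: ρ > 0.6 supports promoting the self-score, |ρ| < 0.2 over 20 rows marks it as noise.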

Add pre-filter shape gate: drop audit/validator proposals lacking engine-output falsifier

13 Jun 2026 proposed

council rejected GATE reversible: simple 4h

3 of 4 historical Commander overrides were on audit-shaped meta proposals. The gate would have caught the same shape pre-LLM, saving ~$0.013 downstream per gated row + Commander review slot. Replay against last 19 proposals: 1-3 should match (the historical rate).

Swap council R1 Sonnet judge for gpt-5.5-codex (de-overlap with argument)

13 Jun 2026 proposed

council rejected HARNESS reversible: simple 5h

Council R1 'consensus at round_1' rate (resolved_at='round_1') drops measurably on the same input set — current historical rate vs replay rate should differ by ≥10 percentage points. R2 escalation rate may rise but should not triple. Net: more proposals get to round 2 debate inst…

Add 6-axis 0-3 scoring to meta filter with min-axis floor of 1

13 Jun 2026 proposed

council rejected AXIS reversible: simple 3h

Of 19 historical meta proposals at filter stage, at least 1 had an axis the proposer self-scored 0 (proposer self-score data in moves table for meta_genesis). The min-axis floor would route any LLM-judge-issued 0 on the same axis to DROP, producing a measurable filter_rejected de…

Add pre-genesis Jaccard dedup gate against last 30 meta hypotheses

13 Jun 2026 proposed

council rejected GATE reversible: simple 4h

Of the last 19 meta runs, ≥2 candidates carry target_files identical to a prior council_rejected/commander_killed row (the 4 Commander-override pattern repeats audit/observability shapes across cycles). Re-running dry against historical proposals: ≥2 catches, ≤5 catches (otherwis…
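
A sketch of the dedup comparison, assuming it operates on each proposal's target_files list and that a Jaccard similarity of 1.0 (identical sets) is the trigger, as the falsifier text suggests. The row shape `{ id, target_files }`, the function names, and the default threshold are assumptions.

```javascript
// Jaccard similarity of two lists treated as sets.
// Two empty lists return 0 so that proposals without targets never match.
function jaccard(a, b) {
  const A = new Set(a);
  const B = new Set(b);
  if (A.size === 0 && B.size === 0) return 0;
  let inter = 0;
  for (const x of A) if (B.has(x)) inter++;
  return inter / (A.size + B.size - inter);
}

// Hypothetical row shape: { id, target_files }. `priorRows` is ordered oldest first;
// only the last `window` rows are checked. Returns the first matching row or null.
function findDuplicate(candidateFiles, priorRows, threshold = 1.0, window = 30) {
  for (const row of priorRows.slice(-window)) {
    if (jaccard(candidateFiles, row.target_files) >= threshold) return row;
  }
  return null;
}
```

Lowering `threshold` below 1.0 would also catch near-duplicates that touch mostly the same files, at the cost of more false positives.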

Rotate judge model pair across the 3 filter_score runs

12 Jun 2026 proposed

council rejected SKILL reversible: medium 4h

Cross-vendor doctrine S160 says generator+judge must be different families to decorrelate errors; filter_score already decorrelates within a run, but NOT across runs. Expect median IQR across the 5 axes to rise by 0.4-0.8 (currently the 3 runs are too consistent because they use …

Stop truncating hypothesis description to 700 chars in filter_score

12 Jun 2026 proposed

council rejected PROMPT reversible: simple 1h

On 20 hypotheses with description > 700 chars, rerun filter_score with the 2500-char prompt and compare composite_rank_score against the historical 700-char score. Expect median absolute delta of 0.5-1.5 on composite; the solo_founder_feasible axis specifically should shift on hy…

Add closed_thesis_lexical_validator post-parse tool

12 Jun 2026 proposed

council rejected TOOL reversible: simple 4h

On replay against the last 100 admits, expect 3-8 admits to be retroactively rejected as closed-thesis descendants the proposer self-classified as 'clearly_outside' incorrectly. Specifically: any admit whose audience includes 'allocator', 'family office', or 'research publisher' …

Add post-parse lexical gate on genesis manual_entry_path field

12 Jun 2026 proposed

council rejected GATE reversible: simple 3h

On replay against the last 100 admits, expect 4-12 admits to be retroactively rejected because their manual_entry_path contains extraction tokens (e.g. 'we auto-extract the claim from the SOW PDF'). Going forward, ~5% of genesis runs that would have admitted will instead reject w…

Wrap council judges with per-judge consistency re-draw probe

12 Jun 2026 proposed

council rejected HARNESS reversible: medium 6h

On 20 historical hypotheses re-run with the probe, 25-40% of Round-1 judges will produce a different verdict_action on the paraphrased second draw, identifying which 'unanimous 3-0 KILL' or 'unanimous STRONG_BUILD' verdicts are actually fragile. Probe cost: ~+33% of council_verdi…

Add buyer-language paraphrase skill to argument move

12 Jun 2026 proposed

filter rejected SKILL reversible: simple 3h

Commander override rate (currently 4/N per E corpus) drops by ≥1 per 50 cycles because the buyer-voice claim makes weak market-fit obvious before sign-off. Commander can spot 'no buyer would say this' faster than by reading the full argument.

Reweight synthetic.jsonl corpus contribution by graduation-yield

12 Jun 2026 proposed

filter rejected CORPUS reversible: medium 5h

Graduation rate across next 50 cycles increases by ≥15% vs. baseline uniform sampling. Hot seeds (weight >0.5) dominate top-of-funnel without starving exploration (10% cold floor preserves coverage).

Add forecaster-grade pre-graduation gate at filter→argument boundary

12 Jun 2026 proposed

filter rejected GATE reversible: simple 5h

Candidates that pass filter but die at argument due to thin evidence drop by ≥30% (estimated from last 14 days of engine.db). Argument move cost-per-graduated-candidate drops proportionally.

Add retry-with-vendor-fallback harness around filter_score judge call

12 Jun 2026 proposed

filter rejected HARNESS reversible: simple 4h

Cycles aborted due to judge transient failure drop from current ~5-8% (per E corpus) to <1%. Distribution of `judge_vendor_used` across 100 cycles reveals which vendor is least reliable.

Add genesis-prompt jargon-leak validator tool

12 Jun 2026 proposed

council rejected TOOL reversible: simple 3h

On next 30 genesis runs, ≥80% of outputs contain zero AE-internal terms post-rewrite vs. ~30% baseline (sampled from last 7 days of brain/proposals/digest-2026-05-*).

Add pre-genesis vendor-rotation gate to prevent generator-judge collision

12 Jun 2026 proposed

filter rejected GATE reversible: simple 3h

Zero cycles in the next 50 runs where genesis and judge share a vendor family. Detects misconfiguration during model upgrades (e.g., swapping Sonnet→Opus on genesis without rotating judge).

Add v2_a11 describability axis to filter_score

12 Jun 2026 proposed

filter rejected AXIS reversible: simple 4h

On the 43 graduated candidates from S157 NBJ sweep, 847f7e-shape proposals (engine-mirror IP) score 0-1 on a11 while ROBUST candidates score 2-3, producing a 1.5-2.5 point composite spread that increases kill rate on jargon-laden proposals by ≥20%.

Add orphan-signature dedup tool to skip known-dead candidate shapes

12 Jun 2026 proposed

filter rejected TOOL reversible: simple 5h

Reduces v2_backfill_orphan_S148 kill rate from 7/12 (58%) to ≤2/12 over the next 12 proposals. Saves ~30 min compute per orphan avoided.

Add named_buyer_present pre-filter gate (Manifesto F1)

12 Jun 2026 proposed

council rejected GATE reversible: simple 3h

Across last 295 proposals, ≥60 retroactively gate-kill on this rule; of the 4 Commander overrides, ≥3 fail this gate. Cuts filter_score compute by ~20% on future runs.

Add rubric-stability harness wrapping filter_score with dual phrasing

12 Jun 2026 proposed

filter rejected HARNESS reversible: medium 6h

On 40-item meta hypothesis sample, 4-8 candidates flagged rubric_unstable; flagged set overlaps with ≥2 of the 4 Commander overrides, demonstrating rubric-gaming signal predicts override risk.

Add v2_a11 describability axis using NBJ 5-Q test

12 Jun 2026 proposed

filter rejected AXIS reversible: simple 4h

On rerun of 43 graduated candidates, the 4 Commander-overridden candidates (3 KILL + 1 DEFER) score ≤1 on a11 while approved candidates score ≥2, producing 2-3 point composite spread that correlates with override decisions.

Add v1_a6 acute_pain_not_candy axis to filter_score.js FILTERS array (shadow)

12 Jun 2026 proposed

council rejected AXIS reversible: simple 4h

Across the historical 40 meta_genesis admissions referenced in the corpus, A6 shadow scores will cluster bimodally (acute vs. vitamin) and should correlate r > 0.4 with council STRONG_BUILD/WEAK_BUILD vs KILL/DEFER actions. If correlation is weak, the rubric is mispurposed or the…

Negative-control basket overlap gate in forecaster precommit --lock

12 Jun 2026 proposed

filter rejected GATE reversible: simple 3h

On future S185 batches with the typical 12 theses sampled across [agent…]/[model…] anchors, the gate will reject batches where the broad-anchor theses (5 of 12 in fwd_2026-06-08_h45) collapse onto the same negative-control basket. Expected: 10-30% of attempted locks initially rej…

Source-citation existence verifier for S185 generator theses

12 Jun 2026proposed

council rejected SKILL reversible: simple 6h

Some fraction of S185 theses currently 'cite' sources by URL/handle without the source meaningfully supporting the claim (LLM citation hallucination is a known failure mode even on tuned-up generators). Expected: 5-25% of theses rejected on first run; after generator prompt tight…

Council coin-flip baseline tracker for verdict-vs-chance signal audit

12 Jun 2026proposed

filter rejected TOOL reversible: simple 5h

After 50+ council runs, if council-vs-baseline agreement is >70% on KILL actions, the council is mostly recapitulating priors and is not adding much signal; <40% suggests genuine discrimination. Either result is informative; current state is unknowable.
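
The two thresholds in this prediction can be checked mechanically. A minimal sketch, assuming each council run is logged as a pair of verdict strings; the record fields `council` and `baseline`, the verdict labels, and both function names are hypothetical and not taken from the engine.

```javascript
// Hypothetical record shape: { council: 'KILL' | 'KEEP', baseline: 'KILL' | 'KEEP' }.
// Agreement is measured only on runs where the council said KILL,
// matching the "on KILL actions" wording of the prediction.
function killAgreementRate(runs) {
  const kills = runs.filter((r) => r.council === 'KILL');
  if (kills.length === 0) return null; // nothing to measure yet
  const agree = kills.filter((r) => r.baseline === 'KILL').length;
  return agree / kills.length;
}

// The 0.70 and 0.40 cutoffs come from the proposal text; the labels are illustrative.
function interpretAgreement(rate) {
  if (rate === null) return 'unknown';
  if (rate > 0.7) return 'recapitulating_priors';
  if (rate < 0.4) return 'genuine_discrimination';
  return 'inconclusive';
}
```

Rates between the two cutoffs are reported as inconclusive here, since the proposal only interprets the two tails.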

Doctrine TTL watchdog: warn/halt on expired session-tagged rules

12 Jun 2026proposed

filter rejected GATE reversible: simple 3h

On 2026-06-15 the scheduler will warn 'S151 archetype rotation rule expired 2026-06-14'. Currently nothing surfaces this; the doctrine stays in the prompt forever as cargo. After implementation, expired doctrines either get re-ratified by Commander or removed.
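
The expiry check described above can be sketched in a few lines. This assumes each doctrine carries an ISO-formatted `expires` date; the record shape and the function name `expiredDoctrines` are hypothetical, and the second rule in the example is made up for contrast.

```javascript
// Hypothetical doctrine record: { id, expires } with expires as YYYY-MM-DD.
// ISO dates in this format sort correctly under plain string comparison.
function expiredDoctrines(doctrines, today) {
  return doctrines
    .filter((d) => d.expires < today)
    .map((d) => `${d.id} expired ${d.expires}`);
}

const warnings = expiredDoctrines(
  [
    { id: 'S151 archetype rotation rule', expires: '2026-06-14' },
    { id: 'example rule still in force', expires: '2026-12-31' }, // made-up entry
  ],
  '2026-06-15'
);
console.log(warnings);
```

Whether an expired rule should only warn or also halt the scheduler is left to the caller in this sketch.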

Add cross-vendor judge-disagreement logger

12 Jun 2026proposed

filter rejected TOOL reversible: simple 3h

After 50 council_verdict runs, the disagreement-type distribution is non-uniform (≥40% in one type), making it usable as a meta signal. The sonnet_confident_codex_rejected bucket — if nonzero — names the candidates most worth post-mortem review.

Genesis prompt: require explicit buyer-pain-alternative triple in proposal body

12 Jun 2026proposed

council rejected PROMPT reversible: simple 3h

Mean v2_a6 (acute_pain_not_candy) score on freshly-generated candidates rises by ≥0.5 over the next 100 candidates; candy_not_pain kill-reason class (after taxonomy tool above lands) drops below 8% of KILLs.

Add retry-with-concrete-frame harness around argument.js attack generation

12 Jun 2026proposed

council rejected HARNESS reversible: simple 4h

Retry rate stabilizes between 10-25% (proves trigger is firing). Among retried candidates, council_verdict accept rate increases by ≥5 absolute points vs. the no-retry baseline measured on the prior 30 days.

Add kill-reason taxonomy classifier tool

12 Jun 2026proposed

filter rejected TOOL reversible: simple 6h

Weekly kill-reason distribution chart becomes computable without manual aggregation; the 'other' bucket stays under 20% of classifications after 14 days, indicating vocab covers observed reality.

Add v2_a11 reachability-evidence axis to filter_score

12 Jun 2026proposed

filter rejected AXIS reversible: simple 4h

On a backtest over the last 43 graduated candidates, composite spread between top-quartile and bottom-quartile widens by ≥1.5 points; candidates that previously graduated but Commander DEFERed for 'unreachable buyer' score 0-1 on a11.

Promote closed-thesis prune from ranking preference to pre-genesis GATE

12 Jun 2026proposed

filter rejected GATE reversible: simple 5h

meta_genesis token cost in the move_cost_rollup drops 15-25% over the next 7d window; KILL-with-reason-off-thesis rate on graduated candidates drops below 10% (currently dominant in Corpus E kill-reason distribution).

Add closed-thesis explicit veto section to genesis prompt

12 Jun 2026proposed

council rejected PROMPT reversible: simple 2h

Closed-thesis kills (currently caught only by decage post-filter) drop because most are suppressed at genesis. Genesis volume drops by ≤15% (only the closed-thesis fraction); decage closed_thesis_prune fires <30% of its current rate within 14 days.

Build kill-reason replay bundle exporter for offline post-mortem

12 Jun 2026proposed

council rejected TOOL reversible: simple 5h

Time-to-diagnose any kill drops from 'reconstruct context manually from engine.db' (~20 min) to 'open one JSON file' (<2 min). The 7× v2_backfill_orphan_S148 cluster becomes diagnosable in one session.

Add cross-vendor judge disagreement harness around council_verdict

12 Jun 2026proposed

filter rejected HARNESS reversible: simple 4h

Within 50 disagreement rows, a calibration pattern is visible: e.g. 'Codex scores a3_reachability systematically 1 point lower than Sonnet on B2B-shape candidates'. This pattern enables targeted judge-prompt tuning.

Add v2_a11 evidence-density axis to filter_score

12 Jun 2026proposed

filter rejected AXIS reversible: simple 4h

Hypotheses authored from a single internal hunch with no external grounding lose ~2 points of composite (drop one axis from 3 to 0). MANIFESTO-v4-aligned evidence-grounded proposals widen their composite lead over speculative ones by 1.5–2.5 points.

Add pre-council self-attack skill to prune weak hypotheses before council budget spend

12 Jun 2026proposed

filter rejected SKILL reversible: simple 6h

Of the 2× fatal_objection_both_confirm kills/week, ~60% are caught at self_attack instead, freeing council budget. Council budget per surviving hypothesis rises measurably (>15%). Self_attack defensibility_score correlates positively (r > 0.4) with council survival.

Add orphan-detection pre-filter gate to short-circuit v2_backfill_orphan kills

12 Jun 2026proposed

filter rejected GATE reversible: simple 3h

The 7× v2_backfill_orphan_S148 kills currently consuming filter_score budget shift to a zero-cost pregate KILL; filter_score spend on orphan candidates drops to ~0; downstream kill_reason distribution loses the v2_backfill_orphan_S148 bucket within filter_score post-mortems.

Add reversibility-cost validator tool for proposed_diff

12 Jun 2026proposed

filter rejected TOOL reversible: simple 4h

Over 7 cycles, ~10-20% of proposals get reversibility downgraded by the validator (LLM self-reports SIMPLE more often than warranted). Graduated COMPLEX-reversibility proposal share drops from current visible rate to <=5%.

Add evidence-citation requirement skill to genesis prompt

12 Jun 2026proposed

filter rejected PROMPT reversible: simple 3h

Evidence-field population rises from current ~30% (estimated from sample) to >=95% over next 3 cycles. Downstream composite_v2 scores correlate more tightly with evidence-row count (Pearson >=0.3) because grounded proposals describe more concrete falsifiers.
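
The Pearson check named in this prediction could be computed as follows. A minimal sketch: the function is a standard Pearson coefficient, and pairing composite_v2 scores with evidence-row counts as the two input arrays is an assumption about how the data would be exported.

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need two arrays of equal length >= 2');
  }
  const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  if (sxx === 0 || syy === 0) return NaN; // undefined when either series is constant
  return sxy / Math.sqrt(sxx * syy);
}
```

With small cycle counts the coefficient is noisy, so a value near 0.3 on a handful of proposals would be weak support either way.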

Add solo_time_estimate sanity gate (reject >16h, flag 9-16h)

12 Jun 2026proposed

filter rejected GATE reversible: simple 3h

Over 7 cycles, ~5-15% of proposals get hard-rejected as oversized (currently they reach council and waste argument tokens). Graduated proposals' median solo_time_estimate drops from current ~6h to ~4h.
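
The bands in the proposal title translate directly into a small gate function. A sketch only: the name `soloTimeGate`, the return shape, and the handling of invalid values are assumptions, while the 16h and 9h boundaries come from the title.

```javascript
// Bands from the proposal title: reject above 16h, flag 9-16h, pass otherwise.
// Treating non-numeric or non-positive estimates as rejects is an assumption.
function soloTimeGate(hours) {
  if (typeof hours !== 'number' || !Number.isFinite(hours) || hours <= 0) {
    return { action: 'reject', reason: 'invalid_estimate' };
  }
  if (hours > 16) return { action: 'reject', reason: 'oversized' };
  if (hours >= 9) return { action: 'flag', reason: 'large' };
  return { action: 'pass', reason: null };
}
```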

Add commander_override_reason to evidence loop in genesis

12 Jun 2026proposed

filter rejected CORPUS reversible: simple 5h

Over the next 7 cycles, proposals citing source_corpus='commander_override' rows in their evidence field appear in >=20% of generated candidates, AND commander_override rate on graduated proposals drops from current 4/21 (~19%) to <=10% — i.e. the engine learns from its disagreem…

Add cross-vendor judge-disagreement retry harness

12 Jun 2026proposed

filter rejected HARNESS reversible: simple 6h

Over a 7-cycle window, ~15-25% of proposals trigger retry; of retried proposals, >=40% have changed verdict (judge confirms or reverses with cited reason). Net effect: graduated-proposal APPROVE precision (post-deploy non-revert rate) rises by >=10 points vs pre-harness baseline.

Add pre-genesis duplicate-title gate against proposals/ history

12 Jun 2026proposed

council rejected GATE reversible: simple 5h

On the next 7 cycles, duplicate_with_prior kills should account for 10-30% of pre-filter rejects (currently invisible because dupes pass filter_score and burn argument tokens). Average tokens-per-graduated-proposal should drop by >=8% over a 7-cycle window.
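
One way such a gate could compare titles is token-set overlap. The sketch below uses Jaccard similarity as a stand-in technique; the proposal does not say how duplicates are detected, and the function names and the 0.6 threshold are illustrative assumptions.

```javascript
// Lowercase, replace punctuation with spaces, and split into a set of word tokens.
function titleTokens(title) {
  const cleaned = title.toLowerCase().replace(/[^a-z0-9_\s]/g, ' ');
  return new Set(cleaned.split(/\s+/).filter(Boolean));
}

// Jaccard similarity: |intersection| / |union| of the two token sets.
function jaccard(a, b) {
  const inter = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : inter / union;
}

// Returns the first prior title at or above the threshold, or null.
// threshold = 0.6 is an illustrative default, not a value from the proposal.
function findDuplicate(title, history, threshold = 0.6) {
  const tokens = titleTokens(title);
  return history.find((h) => jaccard(tokens, titleTokens(h)) >= threshold) || null;
}
```

A threshold this loose will miss reworded duplicates that share few tokens, so it would likely need tuning against real proposals/ history.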

Add v2_a11 describability axis to filter_score

12 Jun 2026proposed

filter rejected AXIS reversible: simple 4h

On the 21 currently-graduated meta proposals, the 847f7e-shape (vague-clarity) proposals score 0-1 while concrete-diff proposals (e.g. the S179 handoff-contract style) score 2-3, producing a 2-3 point spread in composite_v2 that correlates with downstream council_verdict APPROVE …

Vocabulary-bypass detector for closed-thesis archetype at council_verdict

12 Jun 2026proposed

filter rejected GATE reversible: simple 6h

Of the next 50 genesis proposals, the gate fires on 2-5 (4-10%) that previously would have slipped past the vocabulary trigger. Manually inspecting the matched archived hypothesis for each fired proposal confirms a true archetype match in ≥70% of fires (precision check on a 5-fir…

Require `buyer_check_named` output field in genesis PROPOSER_SYSTEM

12 Jun 2026proposed

council rejected PROMPT reversible: simple 3h

Of the next 30 genesis proposals, ≥40% will be rejected at parse-time for missing/insufficient buyer_check. Surviving proposals will have a structurally testable validation step the Commander can run within 14 days. Re-running council_verdict on 10 previously-graduated candidates…

Spread-triggered third-reviewer tie-break in filter_score composite

12 Jun 2026proposed

filter rejected HARNESS reversible: simple 4h

On the next 20 hypotheses to run filter_score, the run_total standard deviation across the 3 runs of each hypothesis decreases by ≥30% vs. the previous 20-hypothesis baseline measured from `engine.db` move history. Composite IQR penalty (-0.5×IQR) is materially smaller for hypoth…

Add 6th filter `named_post_mortem_data_source` to filter_score.js FILTERS

12 Jun 2026proposed

filter rejected AXIS reversible: simple 5h

Among graduated candidates whose data_moat scored 2-3, those whose moat language is hand-waved (no named trajectory source) score 0-1 on the new axis, producing a 2-3 point composite spread that re-ranks vague-moat hypotheses below specifically-sourced ones. Forecaster-v1-class h…

Add temporal-specificity sanity gate to T-corpus ingest

12 Jun 2026proposed

filter rejected GATE reversible: simple 4h

Re-running ingest over the last 30 days of digest items will reject digest-2026-06-04-007 (Air Canada Jan 2026 1,247 passengers — flagged by Gemini as hallucinated) and at least 1-2 other items where Gemini's review caught fabricated specifics. T-corpus items reaching genesis wit…

Add resolver clause requirement to genesis prompt

12 Jun 2026proposed

filter rejected PROMPT reversible: simple 1h

Resolver-clause presence rate in raw genesis output rises from current baseline (~30-50% per S184 record) to ≥85% within first 50 post-change generations. Pairs with the gate proposal — prompt change reduces gate rejection rate.

Add commander-override capture harness

12 Jun 2026proposed

filter rejected HARNESS reversible: simple 2h

After 14 days, the log enables a one-liner query that would have surfaced the observed 'audit-shaped' cluster (4 overrides clustered on audit-shaped products per S183 traces) in real time. Enables downstream axis-weight tuning informed by override patterns.

Add kill-reason concentration detector tool

12 Jun 2026proposed

council rejected TOOL reversible: simple 3h

Run against current engine.db (which contains the v2_backfill_orphan_S148 cluster of 7) produces exactly one warning for that reason. Future kill-clusters surface within 24h of crossing threshold rather than at retro-review.
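
The detector amounts to counting kill reasons and warning above a threshold. A minimal sketch: the function name and the `minCount = 5` default are assumptions, since the proposal does not state the threshold.

```javascript
// Counts each kill reason and returns one warning per reason at or above minCount.
// minCount = 5 is an assumed threshold; the proposal does not state one.
function killReasonWarnings(killReasons, minCount = 5) {
  const counts = new Map();
  for (const reason of killReasons) {
    counts.set(reason, (counts.get(reason) || 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n >= minCount)
    .map(([reason, n]) => ({ reason, count: n, share: n / killReasons.length }));
}
```

Fed a list of kill_reason values read from engine.db, a cluster of 7 identical reasons would yield a single warning under this default.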

Add v2_a11 resolver_specificity axis to filter_score

12 Jun 2026proposed

filter rejected AXIS reversible: simple 3h

On the S183 graduated-candidate set (n=43), pre-pivot speculation-shaped hypotheses score 0-1 (mean ≈ 0.7); post-pivot forecaster-shaped hypotheses score 2-3 (mean ≈ 2.4). Composite-score spread of ≥1.5 points between cohorts.

Add resolver-clause pre-filter gate to genesis output

12 Jun 2026proposed

filter rejected GATE reversible: simple 4h

Per S184 forecaster v1 calibration record, ~40-55% of candidate hypotheses graded 'unscoreable - no resolver' by the forecaster grader. Gating at genesis should drop unscoreable rate at grader from baseline to ≤10% while adding ≤5% latency.

Add commander-override echo to meta_engine verdict log

12 Jun 2026proposed

filter rejected SKILL reversible: simple 7h

Commander decision latency on pending_commander queue drops (faster context recall). Commander-override consistency increases: shapes previously rejected get rejected again at higher rate (>70% match on similar-shape pairs).

Build product_shape classifier tool over fixed taxonomy

12 Jun 2026proposed

council rejected TOOL reversible: simple 6h

Every kept candidate has a product_shape label with confidence ≥0.5. Missing-shape rate <5% over a 100-candidate sample. Downstream filter/ranking moves can read product_shape rather than re-deriving from text.

Require resolution_event field in genesis prompt schema

12 Jun 2026proposed

filter rejected PROMPT reversible: simple 2h

Post-filter resolution_gate rejection rate drops from current baseline (the resolution_gate currently kills candidates whose hypothesis is unresolvable). Council calls on resolvability-deficient candidates drop in lockstep.

Wrap argument move with timeout + single-retry harness

12 Jun 2026proposed

filter rejected HARNESS reversible: simple 4h

Argument-move stuck-pending count drops to ~0. p95 wall-clock for argument completion bounded by 180s (90s × 2). No silent hangs requiring manual SQL intervention.
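
The 90s × 2 bound can be sketched with `Promise.race` and a single retry. This is an illustration under assumptions: `fn` stands in for the argument move, the function name is hypothetical, and the underlying work is not cancelled on timeout, only abandoned.

```javascript
// Run `fn` with a per-attempt timeout; retry on timeout or error.
// The 90000 ms default and single retry follow the "90s × 2" bound in the proposal.
async function withTimeoutRetry(fn, timeoutMs = 90000, retries = 1) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
    });
    try {
      // Whichever settles first wins: the move itself or the timeout.
      return await Promise.race([fn(attempt), timeout]);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer); // avoid leaving a pending timer behind
    }
  }
  throw lastError;
}
```

Because the abandoned attempt keeps running, a real harness would also need some way to stop or ignore its late result.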

Add closed-thesis pre-filter gate with reason logging

12 Jun 2026proposed

filter rejected GATE reversible: simple 5h

Candidates retreading closed theses are rejected ~30s earlier in the pipeline, avoiding ~3 LLM calls each (filter_score + argument + verdict). Estimated 8-15% of current filter_rejected rows will be reclassified as closed_thesis_gate rejections. Daily LLM spend on rejected candid…

Add v2_a11 evidence-emergence axis to filter_score

12 Jun 2026proposed

council rejected AXIS reversible: simple 4h

Forecaster-shaped candidates (per S183 pivot) score 2-3 on A11; legacy product-shape candidates (e.g. SENTINEL-retread, 847f7e-shape) score 0-1. Composite-score spread between shapes widens by 2-3 points, making the forecaster lane separable at the filter stage rather than at cou…

Genesis Q5-NBJ pre-mortem requirement injected into prompt

12 Jun 2026proposed

council rejected PROMPT reversible: simple 2h

Genesis output volume drops 15-25% (self-rejection in the chain-of-thought, candidate not emitted). Among remaining candidates, the fraction tagged NO_PRE_MORTEM_FAILURE is <10%. Filter pass-through rate on candidates with substantive pre-mortems rises, because the lowest-quality…

Filter timeout fallback harness with truncated-context retry

12 Jun 2026proposed

filter rejected HARNESS reversible: simple 3h

Current silent-drop rate (visible as 'filter never ran' candidates in engine.db) drops to 0. The 'unscored_timeout' bucket surfaces a new Commander review queue. If pre-harness silent-drop rate is X%, post-harness retry-success rate captures roughly X * (1-truncation-loss) of tho…

Commander override classifier tool with fixed taxonomy

12 Jun 2026proposed

filter rejected TOOL reversible: simple 5h

After classifying the last 200 overrides, shape-rejection + fluency-trap combined account for >40% of overrides. This validates whether the fluency-trap gate proposal is targeting the dominant failure mode, or whether 'missing-context' / 'wrong-vertical' would be a higher-yield i…

Add v2_a11 cross-vendor disagreement axis

12 Jun 2026proposed

filter rejected AXIS reversible: simple 4h

Over the next 100 filter-scored candidates, ~15-25% score 2-3 on v2_a11. Candidates scoring 3 on v2_a11 enter argument round at >2x baseline rate. Commander-approval rate among high-disagreement (a11=3) candidates differs from baseline by at least 1.5x in either direction.

Add fluency-trap pre-graduation gate

12 Jun 2026proposed

filter rejected GATE reversible: simple 6h

On retro pass against the last 30 graduated candidates, ~20-30% trip the gate (matching the manifesto-v4 deflation rate). On forward pass, graduation volume drops 20-30% with the dropped candidates concentrated in shape-rejection / fluency-trap kill_reason buckets.

cheapest_instant_kill_test extraction harness from council_verdict

1 Jun 2026proposed

filter rejected HARNESS reversible: simple 4h

Every active non-killed hypothesis after first council pass will have 1-3 kill tests surfaced in the brief Commander reads. Over a 14-day window, the brief becomes more actionable: Commander can scan kill tests and run the cheapest one rather than re-reading 9 deep reports.

signal_recency_decay term in composite_rank_score

1 Jun 2026proposed

council rejected HARNESS reversible: simple 4h

Hypotheses scored fresh (signals < 30d) see composite identical to today. Hypotheses that have been languishing in GATHER_MORE_SIGNAL for >90 days see their composite drop by 0.3-0.9 points and may fall out of the top-N graduation bucket. Stalled hypotheses naturally demote thems…

Signal A primary-source domain validator tool

1 Jun 2026proposed

filter rejected TOOL reversible: simple 5h

Recorded signal_a_url fields across the existing hypothesis corpus will fail the validator at a non-zero rate (>5%). New runs will see fewer Signal A claims and a proportional increase in Signal D — graduation will become slightly harder for hypotheses that previously relied on n…

Derive consensus_quality axis from existing per-run spreads, fold into composite

1 Jun 2026proposed

council rejected AXIS reversible: simple 4h

Hypotheses where Opus(high) and Gemini(low) disagree by >=2 per axis on average will see composite drop by 1.5-3.0 points. Hypotheses where the two judges align tightly gain up to 1.5. Expected rank inversions on ~20-30% of currently-ranked hypotheses.

closed_thesis_distance machine-readable rejection gate in genesis

1 Jun 2026proposed

council rejected GATE reversible: simple 3h

Re-running the proposer with the S128 trigger lexicon present in 50 candidates should produce ~10-20 admissions blocked at the new gate (today they slip through to dedup/council). Per-genesis-cycle cost drops by the dedup+council spend on blocked candidates (~$0.02-0.05 per save)…

Add cross-vendor judge enforcement gate at council_verdict

31 May 2026proposed

filter rejected GATE reversible: simple 3h

Catches silent same-vendor judging cases that violate the S158 cross-vendor principle. Over 30 cycles, expect 0-3 enforcement-triggered rejudges (low because doctrine is mostly followed); rate >10% means routing config has drifted and is the real fix.

Reweight synthetic.jsonl corpus by per-item graduation lift

31 May 2026proposed

filter rejected CORPUS reversible: simple 7h

Filter-kept rate of genesis output rises 10-25% within 30 cycles, because dead synthetic items stop being retrieved at uniform rate. Top-quartile synthetic items get retrieved 2-3x more often.

Add empty-noun-phrase detector tool gating pre-filter

31 May 2026proposed

council rejected TOOL reversible: simple 6h

Cuts filter-kept rate of 847f7e-shape proposals by ≥50% relative to current baseline, without adding LLM cost. Over 30 cycles, ≥80% of detector-flagged items are also human-rated as empty.

Add genesis backend-fallback harness for context overflow

31 May 2026proposed

filter rejected HARNESS reversible: simple 5h

Eliminates the silent 5-cycle genesis outage class observed in S180. Over 30 cycles, ≥98% of genesis attempts produce output. Fallback chain is exercised on <10% of cycles (otherwise input corpus is too large and needs trimming, not fallback).

Add falsifier-coherence gate between argument and council_verdict

31 May 2026proposed

filter rejected GATE reversible: simple 6h

Catches drift cases where argument constructs a sharp falsifier and council silently substitutes a softer one. Over 30 cycles, expect 5-15% of currently-approved proposals to be rejected for falsifier_drift, surfacing a class of error that pending_commander review currently absor…

Add v2_a11 describability axis scored via S157 NBJ 5-question rubric

31 May 2026proposed

filter rejected AXIS reversible: simple 4h

On the 43 historically-graduated candidates, 847f7e-shape proposals (empty noun phrases) score 0-1 on a11 while ROBUST candidates score 2-3, producing a 2-3 point spread on the composite. Filter-kept rate of 847f7e-shape drops ≥30%.

Add Corpus T signal-age decay axis v2_a12 for stale search items

30 May 2026proposed

filter rejected AXIS reversible: simple 4h

Proposals grounded only in stale Corpus T (e.g. older SerpAPI sweeps from S91-S104) score ≥1 lower than proposals grounded in current week's digest entries. On a 30-day backtest, composite-score correlation with Commander 'still relevant' subjective tag should improve by ≥0.15 Pe…

Add genesis JSON output validator harness with single retry

30 May 2026proposed

council rejected HARNESS reversible: simple 6h

Genesis-induced cycle failures (silent outage class — see fix in commit 0f2d20d for Bedrock Opus 4.6) drop to zero over 60 days. Downstream filter_score.js never receives a malformed proposal in production. Retry rate sits between 5-15% (signal that the validator is firing) but f…

Add cross-judge fatal-objection consensus short-circuit gate

30 May 2026proposed

filter rejected GATE reversible: simple 4h

The 2 fatal_objection_both_confirm kills in the current trace would have skipped council, saving $0.20 across those cycles. Council_verdict move cost should drop ~15-25% over a 30-day window without changing the final kept-vs-killed distribution.

Add v2_a11 commander-kill-likelihood axis from override history

30 May 2026proposed

filter rejected AXIS reversible: simple 8h

Across the next 20 cycles, candidates that the engine would have produced and council-passed but Commander would have killed should score ≤1 on v2_a11 at least 60% of the time. Composite score spread between Commander-KILL-shape candidates and Commander-pass-shape candidates ≥ 1.…

Add structural-duplicate gate before meta_argument move

30 May 2026proposed

filter rejected GATE reversible: simple 5h

On the 5-day Phase 1 trace (7× v2_backfill_orphan + 1× structural_duplicate_15ed71), at least 2-3 of those kills migrate from post-argument to pre-argument, saving ~$0.20-0.33 in argument cost per Phase 1 week. Argument move cost drops measurably in the next cycle telemetry.

Create hypothesis_engine/tools/orphan_scanner.js diagnostic script

23 May 2026proposed

council rejected TOOL reversible: simple 2h

Report reveals exact count of live orphaned rows (expected >0 given 7 kills in 7-day window with kill_reason='v2_backfill_orphan_S148'). Provides move_waste count per orphan to quantify GATE proposal value before implementation. If orphan_count=0, the GATE proposal (P1) is moot —…

Add composability axis to meta_engine filter_score judge prompt

23 May 2026proposed

filter rejected PROMPT reversible: simple 3h

When two proposals both target the same file and modify overlapping function blocks, the judge flags composability=major on at least one. Across 30 meta_filter_score runs (current observed volume), this surfaces implementation ordering conflicts before they reach Commander approv…

Add GW01 veto block to genesis.js for audit/verification cluster

23 May 2026proposed

filter rejected PROMPT reversible: simple 2h

Genesis output rate for GW01-shaped proposals drops to <10% (estimated baseline ~23%, derived from 4 Commander kills against ~17 non-orphan proposals in the 7-day kill window). This is the earliest intervention point — preventing generation rather than catching proposals post-hoc…

Add v2_a11 product_shape_gravity_resistance axis to filter_score.js

23 May 2026proposed

filter rejected AXIS reversible: simple 4h

The 4 Commander-killed hypotheses (a38d31 = AI Control Failure Forecast Audit, e9cb5c = Reality-Graded Upgrade Gate, c89a71 = ClaimGate for B2B SaaS, 6bf9c5 = AI Tool Claim Verification) all score 0 on v2_a11. Graduated candidates score 2-3. The axis adds a 2-3 point composite sp…

Pre-kill gate: eliminate v2_backfill_orphan_S148 rows before move dispatch

23 May 2026proposed

rejected GATE reversible: simple 4h

The 7 v2_backfill_orphan_S148 kills (58% of all kills in 7-day window) are pre-empted before any moves are dispatched. Each orphaned row currently burns evidence_search, red_team_kill, and steelman moves before the kill fires — at the observed move:kill ratio (~360 moves / 12 kil…

Auto-expire S112 PRE-SEND-GATE doctrine block in genesis.js by run-count check

22 May 2026proposed

filter rejected PROMPT reversible: simple 3h

With genesisRunCount ≈33 (33 days × ~1 run/day since 2026-04-19), S112 block is absent from PROPOSER_SYSTEM on next run. PROPOSER_SYSTEM character count drops by approximately the S112 block length (~900 chars, lines 88-106). Over the next 10 genesis runs without S112: if post-ho…

Add Commander override JSONL extractor as structured Corpus E feed for meta_engine genesis

22 May 2026proposed

filter rejected CORPUS reversible: simple 4h

The next meta genesis run after deploy will include the 3 recent Commander KILLs (ClaimGate, AI Tool Claim Verification, AI Control Failure) and 1 DEFER (Upgrade Gate) as explicit Corpus E items. At least 1 proposal in that run should cite 'commander_override' in its evidence fie…

Replace filter_score high-scorer callOpus47 → callCodexGpt55 for cross-vendor independence

23 May 2026proposed

council rejected PROMPT reversible: simple 2h

Genesis (Anthropic/Sonnet 4.6) and the optimist reviewer (now OpenAI/Codex) are different vendors. On re-scoring 10 recent hypotheses with both versions, mean high-low spread per axis should widen by ≥0.15 across at least 3 of 5 axes, indicating the reviewers are less correlated.…

Create hypothesis_engine/scripts/backtest_filter_axes.js for axis retrospective validation

21 May 2026proposed

filter rejected TOOL reversible: simple 5h

Running node hypothesis_engine/scripts/backtest_filter_axes.js --axis=v2_a11 --labels=s157_labels.csv against the 43 S157 candidates completes in <30 seconds and produces a JSONL file enabling Mann-Whitney U test between ROBUST and FRAGILE axis score distributions. This replaces …
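
For orientation, the U statistic that such a backtest would feed can be computed by direct pair counting. A minimal sketch, not the proposed script: the function names are hypothetical, and it returns only the statistic and an effect size, with no p-value.

```javascript
// Mann-Whitney U statistic for sample `a` against sample `b`:
// counts pairs where the `a` value is larger, with ties counted as half.
function mannWhitneyU(a, b) {
  let u = 0;
  for (const x of a) {
    for (const y of b) {
      if (x > y) u += 1;
      else if (x === y) u += 0.5;
    }
  }
  return u;
}

// Common-language effect size: estimated probability that a random `a` score
// exceeds a random `b` score (ties count as half).
function effectSize(a, b) {
  return mannWhitneyU(a, b) / (a.length * b.length);
}
```

Pair counting is quadratic in sample size, which is harmless at 43 candidates.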

Add exception-class prose instruction to genesis.js PROPOSER_SYSTEM (S157-EC rule block)

23 May 2026proposed

council rejected PROMPT reversible: simple 2h

Within 20 genesis runs post-deployment, ≥50% of generated hypotheses will contain the phrase 'exception class' or 'route to' or 'does not handle automatically' in their description field. Baseline rate is 2.3% (1/43 in S157 NBJ sweep). A grep over genesis output logs is sufficien…
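
The phrase check can be sketched as a hit-rate function over description strings. The three phrases are quoted from the prediction; the function name, the case-insensitive matching, and passing descriptions in as an array are assumptions.

```javascript
// The three phrases are quoted from the proposal; case-insensitive matching is an assumption.
const EXCEPTION_PHRASES = ['exception class', 'route to', 'does not handle automatically'];

// Fraction of descriptions containing at least one of the phrases.
function exceptionPhraseRate(descriptions) {
  if (descriptions.length === 0) return 0;
  const hits = descriptions.filter((d) => {
    const text = d.toLowerCase();
    return EXCEPTION_PHRASES.some((p) => text.includes(p));
  }).length;
  return hits / descriptions.length;
}
```

A substring match on 'route to' will also hit unrelated sentences, so the measured rate is an upper bound on genuine exception-class language.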

Add commodity-wedge shadow check at argument.js entry point (Survivor E)

21 May 2026proposed

filter rejected GATE reversible: simple 6h

Flag rate in first 30 days should fall between 5-15% of argument runs. Flagged hypotheses should show lower graduation rates than unflagged over a 60-day observation window. If the commander-killed AI control / ClaimGate / AI Tool Verification for Agencies hypotheses are run thro…

Add non-convergence telemetry sink to council_verdict.js ESCALATED path (D.v2)

21 May 2026proposed

filter rejected GATE reversible: simple 3h

After 30 days, meta_engine/data/non_convergent/ will contain a corpus of ESCALATED transcripts. If the axis-delta field shows the same 1-2 axes driving disagreement across ≥60% of cases, those axes have ambiguous rubrics that can be sharpened. If zero files appear despite ESCALAT…

Add v2_a11 status_quo_displacement_ease axis to filter_score.js v2 rubric

21 May 2026proposed

filter rejected AXIS reversible: simple 4h

The 3 commander-KILLed proposals in the last 7 days (AI control, ClaimGate, AI Tool Verification for Agencies) would score 0 on this axis due to entrenched audit/verification incumbents with regulatory switching costs. Retrospective on 43 S157 NBJ candidates should show statistic…

Add commodity-wedge shadow gate logging in runFatalObjection

20 May 2026proposed

filter rejected GATE reversible: simple 6h

After 60 days of shadow logging, score ≥3 correlates with ≥70% eventual KILL verdicts. On a 12-item calibration set containing 3 known commodity-wedge hypotheses and 4 known ROBUST graduates, all 3 commodity-wedge items score ≥3 and all 4 ROBUST items score ≤1.

Add decisive_questions field to council_verdict Round 3 output schema

23 May 2026proposed

council rejected PROMPT reversible: simple 4h

Re-running escalated hypotheses 7199a9 and 2ca131 through the updated prompt each produces ≥2 distinct, non-overlapping testable questions naming specific observables (e.g., 'Does [named buyer segment] currently pay for a partial solution from [named competitor]?' rather than 'Is…

Wire v2_a7 distribution_reachability into filter_score.js

20 May 2026proposed

filter rejected AXIS reversible: simple 5h

d3786b-shaped hypotheses (cold outbound dependency, no owned channel) score v2_a7=0. ec4507-shaped hypotheses (tool-embedded, existing user base) score v2_a7=2-3. A 4-candidate test set spanning the distribution reachability spectrum produces a score range of ≥3 points on this ax…

Wire v2_a6 acute_pain_not_candy into filter_score.js

20 May 2026proposed

filter rejected AXIS reversible: simple 5h

Of the 7 hypotheses killed in the last 7 days for 'wrong distribution shape or pain framing,' at least 5 score v2_a6 ≤1. ec4507-type hypotheses (acute pain, adjacent spend evidence) score v2_a6 ≥2. The pre-council kill rate for candy-shaped hypotheses increases measurably, reduci…

Append full escalation transcript to non-convergent telemetry sink

20 May 2026proposed

filter rejected TOOL reversible: simple 3h

Every future council escalation produces a persisted JSONL record. After 30 days, the non_convergent/ directory contains records for ≥95% of escalation events as verified by comparing JSONL entry count to council_verdict escalated=true rows in engine.db.
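
The ≥95% verification reduces to a ratio of two counts. A minimal sketch, assuming one record per non-empty JSONL line; both function names are hypothetical, and the escalated row count is passed in rather than queried from engine.db.

```javascript
// Counts non-empty lines in a JSONL string; each line is assumed to be one escalation record.
function countJsonlRecords(jsonlText) {
  return jsonlText.split('\n').filter((line) => line.trim() !== '').length;
}

// escalatedRowCount would come from a query against engine.db; it is passed in here.
function sinkCoverage(jsonlText, escalatedRowCount) {
  if (escalatedRowCount === 0) return 1; // nothing to cover
  return countJsonlRecords(jsonlText) / escalatedRowCount;
}
```

A count-only comparison cannot tell a duplicate record from a missing one, so matching on a hypothesis id would be a stricter check.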

Add reeval_in_days to GATHER_MORE_SIGNAL verdicts and deferred-reeval sweep in scheduler.js

19 May 2026proposed

filter rejected HARNESS reversible: medium 8h

5 of 9 recent council verdicts (56%) contain explicit temporal deferral conditions ('run Week 1 outbound before building', '7-day artifact-upload test', '7-day signal check'). These hypotheses currently sit in GATHER_MORE_SIGNAL indefinitely with no automatic re-evaluation. After…

Implement E: commodity-wedge shadow check before first argument move (five binary axes)

19 May 2026proposed

filter rejected GATE reversible: simple 6h

The 'RevOps Objection Taxonomy Normalizer' shape (GPT-5.5-Pro Round 1: passes describability, observed-buyer, solo-inbound, yet still structurally weak on urgency and data advantage) flags commodity_wedge=true on axes 3+4+5. After 4 weeks of shadow: hypotheses where commodity_wed…

Implement B.v2: NBJ 5-Q describability shadow gate before first argument move

23 May 2026proposed

council rejected GATE reversible: simple 8h

Codex retrospective on 43 S157-scored candidates: gate kills hyp-2026-05-06-847f7e (S157 score 0/5, structurally fragile on all 5 Q dimensions) and does not kill any of the three 5/5 ROBUST candidates (ec4507, 24a849, 3656a0). Spearman rank correlation between gate composite_scor…

Implement D.v2: non-convergence transcript sink in council_verdict.js

23 May 2026proposed

council rejected TOOL reversible: simple 4h

Current non-convergence rate: 22% (2 of 9 recent verdicts = 'council could not converge after 3 rounds'). At 9 verdicts/week, 30 days produces ~8–10 transcripts. The escalationReason field already distinguishes FACTUAL vs WEIGHTING vs FRAMING disagreements. The corpus enables the…

Wire v2 axes A1–A10 into hypothesis_engine/moves/filter_score.js (shadow-first)

23 May 2026 proposed

council rejected ARCHITECTURE reversible: medium 12h

The 4 Commander-killed hypotheses (a38d31, c89a71, 6bf9c5, e9cb5c) should score ≤18/30 on v2 composite due to failing scalable_revenue (audit-shaped), distribution_reachability (no warm-contact base noted in override reason), and commander_non_engine_work_fit. The v1 composite sc…

Restructure council_verdict.js cheapest_instant_kill_test from string to machine-readable object

19 May 2026 proposed

filter rejected PROMPT reversible: simple 4h

Every GATHER_MORE_SIGNAL verdict (currently ~55% of verdicts per S157 distribution: 25 ROBUST + 13 MIXED + 4 FRAGILE + 1 STRUCTURALLY FRAGILE out of 43) produces a machine-readable gate condition that Commander can execute without re-reading the full reasoning. The 2 non-converge…

Add timing_window structured field to genesis.js PROPOSER_SYSTEM output schema

23 May 2026 proposed

council rejected PROMPT reversible: simple 3h

Genesis outputs for commodity or evergreen problems (e.g. taxonomy normalizers, knowledge-base tools) will produce window_state=stable or structural with decay_horizon=3_months, signaling low timing defensibility. Proposals tied to genuine substrate shifts (agent-era trust gaps, …

Require empirical failure analog in argument.js red_team_kill attacker output

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

Argument transcripts will contain named companies in attacker rounds, enabling council to distinguish theoretical objections from documented failure patterns. The RevOps taxonomy shape (S158 Round 1 survivor shape that passes all describability/reachability checks) should generat…

Add validation_week_test structured field to genesis.js PROPOSER_SYSTEM output schema

23 May 2026 proposed

council rejected PROMPT reversible: simple 3h

The 5 recent council verdicts (5d7cca, 26fc18, cc72cd, 90778c, c27754) each independently invented ad-hoc week-1 tests. Post-change, genesis outputs carry those tests, so council can evaluate their credibility rather than invent them. 10 consecutive genesis outputs should contain…

Replace v1 five-axis scoring in filter_score.js with ratified v2.3 ten-axis rubrics

19 May 2026 proposed

filter rejected PROMPT reversible: medium 8h

Hypotheses like Commander-KILLED a38d31 (audit product, no warm-contact base) and c89a71 (ClaimGate, relational sales) should score A5=0 (scalable_revenue: pure audit service) and A7=0–1 (distribution_reachability: warm intros required), producing v2 composites below 40% even if …

Add shadow commodity-wedge gate to argument.js pre-debate (no hard kill)

23 May 2026 proposed

council rejected GATE reversible: simple 6h

RevOps Objection Taxonomy Normalizer shape (CRM-integrated taxonomy, no urgency event, dashboard deliverable) flags commodity_wedge_recommendation=true. hyp-2026-05-06-ec4507 (Support Escalation: SLA deadline forcing function, Zendesk timestamp as external ground truth, not CRM-d…

Tighten fast_feedback_loops rubric to penalize multi-causal outcome attribution

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

hyp-2026-05-06-847f7e (Support Promise Calibration Console — killed because 'CSAT/SLA outcomes are multi-causal') scores 0-1 on fast_feedback_loops under the revised rubric. hyp-2026-05-13-47730e (AI Portfolio Claim Auditor — killed because 'board verdicts multi-causal') scores 0…

Prompt genesis proposer to reason about exception classes inline

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

After 20 genesis runs post-patch, at least 30% of hypothesis descriptions contain explicit exception-class language ('The workflow excludes...', 'Human review is required when...', or equivalent). Current S157 baseline: 1/43 graduated candidates (2.3%) explicitly named exception …
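One rough way to measure the 30% target is a phrase match over hypothesis descriptions, sketched below. The two patterns are the example phrases quoted above; the function names are illustrative, and because the criterion also allows "or equivalent" wording, a literal match like this would undercount and is best read as a lower bound.

```javascript
// Patterns taken from the two example phrases in the success criterion.
// No global flag, so RegExp.test stays stateless across calls.
const EXCEPTION_PATTERNS = [
  /\bthe workflow excludes\b/i,
  /\bhuman review is required when\b/i,
];

function hasExceptionLanguage(description) {
  return EXCEPTION_PATTERNS.some((pattern) => pattern.test(description));
}

// Fraction of descriptions containing at least one of the phrases.
function exceptionLanguageRate(descriptions) {
  if (descriptions.length === 0) return 0;
  return descriptions.filter(hasExceptionLanguage).length / descriptions.length;
}
```

For example, `exceptionLanguageRate(descriptions) >= 0.3` over the 20 post-patch runs would correspond to the stated threshold.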

Add v2_a11 urgency_event_named shadow axis to filter_score.js

19 May 2026 proposed

filter rejected AXIS reversible: simple 5h

RevOps Objection Taxonomy Normalizer shape (taxonomy/analytics, CRM-integrated, no named urgency event) scores 0-1. hyp-2026-05-06-ec4507 (Support Escalation with SLA breach consequences and renewal triggers) scores 2-3. Retrospective application to 43 S157-scored candidates show…

Log non-convergent council transcripts to data/non_convergent/ JSONL

19 May 2026 proposed

filter rejected TOOL reversible: simple 3h

After 30 days of active council cycles (current rate ~9/week, empirical non-convergence rate 22%), directory accumulates 8-12 non-convergent transcripts. This corpus enables first structured analysis of split-reason taxonomy: whether splits cluster by hypothesis type, ICP, or mod…

Add observed-buyer pre-check at start of argument.js to skip no-market debates

18 May 2026 proposed

deferred GATE reversible: simple 5h

Of 9 recent deep_council_verdict runs, at least 3 of the 5 killed hypotheses would be caught here: d3786b (Agronomy Advisory — no observable paying buyer for AI-powered agronomy ledgers), c27754 (Medical-Device SME buying AI components — buyer leverage unverified), cc72cd (Bot-Pr…

Add exception_classes field to genesis.js proposal output schema

18 May 2026 proposed

deferred PROMPT reversible: simple 3h

After 2 genesis cycles (~62 new hypotheses at current 31/week throughput), at least 55% of generated proposals include a non-empty exception_classes field with ≥2 distinct named situations (not paraphrases). The companion exception_class_named axis (Proposal 1) will show mean sco…
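The "non-empty, ≥2 distinct" part of this criterion can be checked mechanically, as in the sketch below. `exception_classes` is the field named in the proposal title; the function names and the assumption that the field holds an array of strings are illustrative. Exact-match deduplication cannot detect paraphrases, so that part of the criterion would still need review by a human or a model.

```javascript
// Hypothetical check: true when a proposal lists at least `minDistinct`
// different exception classes after trimming and case-folding.
function hasDistinctExceptionClasses(proposal, minDistinct = 2) {
  const classes = Array.isArray(proposal.exception_classes)
    ? proposal.exception_classes
    : [];
  const normalized = new Set(
    classes
      .filter((c) => typeof c === "string")
      .map((c) => c.trim().toLowerCase())
      .filter((c) => c !== "")
  );
  return normalized.size >= minDistinct;
}

// Share of proposals meeting the check, for comparison against the 55% target.
function exceptionClassPassRate(proposals) {
  if (proposals.length === 0) return 0;
  const passing = proposals.filter((p) => hasDistinctExceptionClasses(p)).length;
  return passing / proposals.length;
}
```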

Add exception_class_named as 6th v1 axis in filter_score.js

18 May 2026 proposed

deferred AXIS reversible: simple 5h

Applied retroactively to 43 S157-graduated candidates: hyp-2026-04-19-3656a0 (ec4507, cited explicit scope exclusions) scores 2-3; hyp-2026-05-06-847f7e (zero exception classes anywhere) scores 0; at least 38 of 43 score 0-1, producing a minimum 2-point composite spread between R…