Methodology
How a hypothesis moves through the engine, from admission to verdict.
Pipeline
Every hypothesis passes through five phases. Each phase has stop conditions and explicit kill criteria.
1. Genesis
An LLM proposer, working from a corpus of seed concepts, drafts a candidate hypothesis. A second LLM critic reviews it for structural defects (banned audience patterns, seller-side incentive misalignment, undefined buyer, vague resolution criteria). Most candidates are rejected at this stage. Survivors are admitted to the board.
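The admission gate above can be sketched as a defect check over a candidate's fields. This is a minimal illustration, not the engine's actual schema: the `Candidate` fields, banned-audience set, and the word-count heuristic for vagueness are all assumptions.

```python
# Hypothetical sketch of the Genesis admission gate: a candidate is admitted
# only if the critic pass finds none of the structural defects named above.
from dataclasses import dataclass

# Example banned audience patterns -- illustrative, not the real list.
BANNED_AUDIENCES = {"everyone", "all developers"}

@dataclass
class Candidate:
    audience: str
    buyer: str
    resolution_criteria: str

def structural_defects(c: Candidate) -> list[str]:
    """Return the list of structural defects the critic would flag."""
    defects = []
    if c.audience.lower() in BANNED_AUDIENCES:
        defects.append("banned audience pattern")
    if not c.buyer.strip():
        defects.append("undefined buyer")
    if len(c.resolution_criteria.split()) < 5:  # crude vagueness proxy
        defects.append("vague resolution criteria")
    return defects

def admit(c: Candidate) -> bool:
    return not structural_defects(c)
```

A candidate with a named buyer and concrete resolution criteria passes; one aimed at "everyone" with no buyer is rejected before reaching the board.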
2. Argument
Three structured analyses run on the admitted hypothesis: steelman (strongest case for the product), red team kill (strongest case against), audience simulation (does a representative buyer want this).
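The three analyses can be modeled as named prompts whose results are collected into one record per hypothesis. The prompt text and the `ask` callable are illustrative assumptions; the section specifies only the three analysis types.

```python
# The three Argument-phase analyses as named prompts. Prompt wording is
# illustrative; only the three analysis names come from the methodology.
ANALYSES = {
    "steelman": "Make the strongest case FOR this product.",
    "red_team_kill": "Make the strongest case AGAINST this product.",
    "audience_simulation": "Would a representative buyer want this?",
}

def run_argument_phase(hypothesis: str, ask) -> dict[str, str]:
    """`ask` is any callable (hypothesis, prompt) -> analysis text."""
    return {name: ask(hypothesis, prompt) for name, prompt in ANALYSES.items()}
```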
3. Evidence hunt
An agentic web search hunts for three signal types: primary sources (regulator filings, peer-reviewed research, government data), competitor products with documented gaps, and demand proxies (forum discussions, GitHub issues, news). The hypothesis must collect at least 2 independent signals to advance.
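The advancement rule above can be sketched as a count of independent signals. "Independent" is modeled here as distinct source domains, which is an assumption; the section states only that at least 2 independent signals are required.

```python
# Minimal sketch of the evidence-hunt advancement check. A signal is a
# (signal_type, source_url) pair; independence is approximated as distinct
# source domains (an assumption, not the engine's definition).
from urllib.parse import urlparse

SIGNAL_TYPES = {"primary_source", "competitor_gap", "demand_proxy"}

def advances(signals: list[tuple[str, str]]) -> bool:
    domains = {urlparse(url).netloc for s_type, url in signals
               if s_type in SIGNAL_TYPES}
    return len(domains) >= 2
```

Two signals from the same source would not count twice under this model, which matches the spirit of requiring *independent* evidence.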
4. Filter scoring
Two LLM advocates argue for and against the hypothesis on each of the five filter axes. The debate runs three full times; the per-axis median across the three runs is taken, and the five medians are summed into a composite score. Graduation bar: composite ≥ 9.0 / 15.
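The scoring arithmetic reduces to a median-then-sum over three runs. A per-axis score range of 0 to 3 is implied by the 15-point maximum over five axes; the axis keys below are paraphrased from the filter table.

```python
# Filter scoring: three full runs each score all five axes; the per-axis
# median is summed into a composite and compared against the 9.0 / 15 bar.
from statistics import median

AXES = ["data_moat", "10x_model_test", "fast_feedback_loops",
        "solo_founder_feasible", "providers_cant_eat_it"]
GRADUATION_BAR = 9.0

def composite(runs: list[dict[str, float]]) -> float:
    """runs: one {axis: score} dict per full run (three expected)."""
    return sum(median(run[axis] for run in runs) for axis in AXES)

def graduates(runs: list[dict[str, float]]) -> bool:
    return composite(runs) >= GRADUATION_BAR
```

The median makes a single outlier run harmless: two runs scoring an axis 2 and one scoring it 0 still yield a 2 for that axis.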
5. Verdict
A council of three frontier models reviews the full dossier and reaches a verdict (escalate / graduate / kill / need more signal). When the council cannot converge after three rounds, it escalates to Commander review.
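The verdict loop can be sketched as repeated voting rounds with an escalation fallback. Treating convergence as unanimity is an assumption; the section says only that the council must converge within three rounds or escalate to Commander review.

```python
# Sketch of the verdict loop: three council members vote each round; a
# unanimous round is final, and after three non-unanimous rounds the
# dossier escalates to Commander review.
VERDICTS = {"escalate", "graduate", "kill", "need_more_signal"}

def council_verdict(run_round, max_rounds: int = 3) -> str:
    """`run_round` is any callable returning the three members' votes."""
    for _ in range(max_rounds):
        votes = run_round()
        assert all(v in VERDICTS for v in votes)
        if len(set(votes)) == 1:  # convergence modeled as unanimity
            return votes[0]
    return "commander_review"
```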
The five filter axes
| Axis | Question it answers | Why it matters |
|---|---|---|
| data moat | Does this product accumulate proprietary data? | Without a data moat, a better model resets the playing field every release. |
| 10x model test | Does a better model make this MORE valuable? | If yes, you are on a defensible layer. If no, you are middleware in a price war. |
| fast feedback loops | Outputs verifiable against reality in <30 days? | Without fast grading, you cannot tell if you are improving — or wrong. |
| solo founder feasible | Buildable by one person without a team? | Reduces capital, time-to-product, and key-person risk. |
| AI providers can't eat it | Do hyperscalers have reason NOT to build this? | If a hyperscaler can absorb your wedge in a roadmap update, you do not have one. |
What this engine does NOT do
- Predict markets, prices, or specific outcomes — this is a product-evaluation system, not a forecasting one.
- Build the products it evaluates — graduated hypotheses go to a Commander commitment decision, not an automated build.
- Replace human judgment — Commander overrides are a first-class part of the pipeline, not a fallback.