Methodology
How a hypothesis moves through the engine, from admission to verdict.
Pipeline
Every hypothesis passes through five phases. Each phase has stop conditions and explicit kill criteria.
1. Genesis
An LLM proposer, working from a corpus of seed concepts, drafts a candidate hypothesis. A second LLM critic reviews it for structural defects (banned audience patterns, seller-side incentive misalignment, undefined buyer, vague resolution criteria). Most candidates are rejected at this stage. Survivors are admitted to the board.
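The admission gate above can be sketched as a defect check over a candidate's fields. This is a minimal illustration, not the engine's actual schema: the `Candidate` fields, banned-audience set, and the word-count heuristic for vagueness are all assumptions.

```python
# Hypothetical sketch of the Genesis admission gate: a candidate is admitted
# only if the critic pass finds none of the structural defects named above.
from dataclasses import dataclass

# Example banned audience patterns -- illustrative, not the real list.
BANNED_AUDIENCES = {"everyone", "all developers"}

@dataclass
class Candidate:
    audience: str
    buyer: str
    resolution_criteria: str

def structural_defects(c: Candidate) -> list[str]:
    """Return the list of structural defects the critic would flag."""
    defects = []
    if c.audience.lower() in BANNED_AUDIENCES:
        defects.append("banned audience pattern")
    if not c.buyer.strip():
        defects.append("undefined buyer")
    if len(c.resolution_criteria.split()) < 5:  # crude vagueness proxy
        defects.append("vague resolution criteria")
    return defects

def admit(c: Candidate) -> bool:
    return not structural_defects(c)
```

A candidate with a named buyer and concrete resolution criteria passes; one aimed at "everyone" with no buyer is rejected before reaching the board.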
2. Argument
Three structured analyses run on the admitted hypothesis: steelman (strongest case for the product), red team kill (strongest case against), audience simulation (does a representative buyer want this).
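The three analyses can be modeled as named prompts whose results are collected into one record per hypothesis. The prompt text and the `ask` callable are illustrative assumptions; the section specifies only the three analysis types.

```python
# The three Argument-phase analyses as named prompts. Prompt wording is
# illustrative; only the three analysis names come from the methodology.
ANALYSES = {
    "steelman": "Make the strongest case FOR this product.",
    "red_team_kill": "Make the strongest case AGAINST this product.",
    "audience_simulation": "Would a representative buyer want this?",
}

def run_argument_phase(hypothesis: str, ask) -> dict[str, str]:
    """`ask` is any callable (hypothesis, prompt) -> analysis text."""
    return {name: ask(hypothesis, prompt) for name, prompt in ANALYSES.items()}
```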
3. Evidence hunt
An agentic web search hunts for three signal types: primary sources (regulator filings, peer-reviewed research, government data), competitor products with documented gaps, and demand proxies (forum discussions, GitHub issues, news). The hypothesis must collect at least 2 independent signals to advance.
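The advancement rule above can be sketched as a count of independent signals. "Independent" is modeled here as distinct source domains, which is an assumption; the section states only that at least 2 independent signals are required.

```python
# Minimal sketch of the evidence-hunt advancement check. A signal is a
# (signal_type, source_url) pair; independence is approximated as distinct
# source domains (an assumption, not the engine's definition).
from urllib.parse import urlparse

SIGNAL_TYPES = {"primary_source", "competitor_gap", "demand_proxy"}

def advances(signals: list[tuple[str, str]]) -> bool:
    domains = {urlparse(url).netloc for s_type, url in signals
               if s_type in SIGNAL_TYPES}
    return len(domains) >= 2
```

Two signals from the same source would not count twice under this model, which matches the spirit of requiring *independent* evidence.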
4. Filter scoring
Two LLM advocates argue for and against the hypothesis on each of the five filter axes. The debate runs three full times; the per-axis median across the three runs is taken, and the five medians are summed into a composite score. Graduation bar: composite ≥ 9.0 / 15.
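The scoring arithmetic reduces to a median-then-sum over three runs. A per-axis score range of 0 to 3 is implied by the 15-point maximum over five axes; the axis keys below are paraphrased from the filter table.

```python
# Filter scoring: three full runs each score all five axes; the per-axis
# median is summed into a composite and compared against the 9.0 / 15 bar.
from statistics import median

AXES = ["data_moat", "10x_model_test", "fast_feedback_loops",
        "solo_founder_feasible", "providers_cant_eat_it"]
GRADUATION_BAR = 9.0

def composite(runs: list[dict[str, float]]) -> float:
    """runs: one {axis: score} dict per full run (three expected)."""
    return sum(median(run[axis] for run in runs) for axis in AXES)

def graduates(runs: list[dict[str, float]]) -> bool:
    return composite(runs) >= GRADUATION_BAR
```

The median makes a single outlier run harmless: two runs scoring an axis 2 and one scoring it 0 still yield a 2 for that axis.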
5. Verdict
A council of three frontier models reviews the full dossier and reaches a verdict (escalate / graduate / kill / need more signal). When the council cannot converge after three rounds, it escalates to Commander review.
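The verdict loop can be sketched as repeated voting rounds with an escalation fallback. Treating convergence as unanimity is an assumption; the section says only that the council must converge within three rounds or escalate to Commander review.

```python
# Sketch of the verdict loop: three council members vote each round; a
# unanimous round is final, and after three non-unanimous rounds the
# dossier escalates to Commander review.
VERDICTS = {"escalate", "graduate", "kill", "need_more_signal"}

def council_verdict(run_round, max_rounds: int = 3) -> str:
    """`run_round` is any callable returning the three members' votes."""
    for _ in range(max_rounds):
        votes = run_round()
        assert all(v in VERDICTS for v in votes)
        if len(set(votes)) == 1:  # convergence modeled as unanimity
            return votes[0]
    return "commander_review"
```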
The five filter axes
| Axis | Question it answers | Why it matters |
|---|---|---|
| data moat | Does this product accumulate proprietary data? | Without a data moat, a better model resets the playing field every release. |
| 10x model test | Does a better model make this MORE valuable? | If yes, you are on a defensible layer. If no, you are middleware in a price war. |
| fast feedback loops | Outputs verifiable against reality in <30 days? | Without fast grading, you cannot tell if you are improving — or wrong. |
| solo founder feasible | Buildable by one person without a team? | Reduces capital, time-to-product, and key-person risk. |
| AI providers can't eat it | Do hyperscalers have reason NOT to build this? | If a hyperscaler can absorb your wedge in a roadmap update, you do not have one. |
What this engine does NOT do
- Predict markets, prices, or specific outcomes — this is a product-evaluation system, not a forecasting one.
- Build the products it evaluates — graduated hypotheses go to a Commander commitment decision, not an automated build.
- Replace human judgment — Commander overrides are a first-class part of the pipeline, not a fallback.