A self-grading hypothesis engine

Abstract

Essence evaluates product opportunities the way it would evaluate any other claim: adversarial multi-model debate, structured filters, and a public verdict for every candidate. This dashboard is the live record.

What this is

What is it?
An autonomous engine that proposes, argues, evidences, and scores product hypotheses. Multiple frontier LLMs debate each candidate. A five-axis filter scores each one across three independent runs. Survivors graduate; failures are killed with documented reasoning.
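
A minimal sketch of what that scoring-and-graduation step could look like. The axis names, the 0-10 scale, the graduation bar, and the median-of-three-runs aggregation below are illustrative assumptions, not the engine's actual rubric.

```python
from dataclasses import dataclass
from statistics import median

# Illustrative axis names and graduation bar -- assumptions, not the real rubric.
AXES = ("pain", "reachability", "willingness_to_pay", "moat", "build_fit")
GRADUATION_BAR = 7.0  # hypothetical threshold on an assumed 0-10 scale

@dataclass
class FilterRun:
    """One independent scoring pass over a candidate hypothesis."""
    scores: dict[str, float]  # axis -> score

def verdict(runs: list[FilterRun]) -> tuple[str, dict[str, float]]:
    """Aggregate independent runs per axis, then grade the candidate."""
    per_axis = {axis: median(run.scores[axis] for run in runs) for axis in AXES}
    # Assumed rule: a candidate graduates only if every axis clears the bar.
    status = "graduated" if all(s >= GRADUATION_BAR for s in per_axis.values()) else "killed"
    return status, per_axis

runs = [
    FilterRun({"pain": 8, "reachability": 7, "willingness_to_pay": 9, "moat": 6, "build_fit": 8}),
    FilterRun({"pain": 9, "reachability": 8, "willingness_to_pay": 8, "moat": 7, "build_fit": 7}),
    FilterRun({"pain": 8, "reachability": 7, "willingness_to_pay": 9, "moat": 7, "build_fit": 8}),
]
print(verdict(runs))  # ('graduated', {...}) under these sample scores
```
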
Why this approach?
Anyone can generate convincing-looking product ideas. The harder question is which ones survive structured scrutiny. The engine answers that question on the record, with the reasoning visible.
What breaks?
The engine grades its own filter coverage. New failure modes (rubric blind spots, structural mismatches, distribution-shape mistakes) are surfaced via Commander overrides and patched into the next prompt revision. Every override is logged.
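
A sketch of what recording one of those overrides might look like. The field names, the failure-mode labels, and the JSONL move log are assumptions about the record format, not the engine's actual schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MOVE_LOG = Path("moves.jsonl")  # hypothetical append-only log of engine moves

def log_override(hypothesis_id: str, filter_verdict: str,
                 commander_verdict: str, failure_mode: str, note: str) -> dict:
    """Record a Commander override so the miss can feed the next prompt revision."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "move": "commander_override",
        "hypothesis_id": hypothesis_id,
        "filter_verdict": filter_verdict,        # what the automated filter decided
        "commander_verdict": commander_verdict,  # what human review decided instead
        "failure_mode": failure_mode,            # e.g. "rubric_blind_spot", "structural_mismatch"
        "note": note,                            # documented reasoning behind the override
    }
    with MOVE_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

log_override("H-042", "graduated", "killed", "structural_mismatch",
             "Seller-side workflow; build complexity understated by the rubric.")
```
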
What we have learned so far
Buyer-side products consistently outperform seller-side ones when the seller monetises conviction. Structural fit (workflow shape, build complexity) matters more than scoring fit. Filter scores above the graduation bar are necessary but not sufficient — Commander review still kills a meaningful share of graduated candidates.

Featured candidate

Engine state

74 total hypotheses
35 graduated
20 in flight
37 killed / exhausted
1,539 moves logged
$267 lifetime engine spend

See all hypotheses →