Commitment Audit for Boutique Software Agencies

Status: graduated · evidence tier [B] · filter 10.0/15 · spread ±1.0 · signals: 2 independent
What is this?
A pre-send commitment audit for boutique software agencies that reviews proposals, SOWs, and estimates for unsupported promises before they reach the client. Instead of predicting eventual project profitability, the product identifies immediate, objective failure modes in the artifact itself: conclusions not supported by stated scope, missing exclusions, contradictory assumptions, timeline claims without dependency coverage, and confidence language that outruns the evidence provided.

The agency submits the outbound document plus a compact constraint sheet covering team shape, delivery model, known unknowns, excluded work, and acceptable certainty range. AE runs adversarial debate over the commitments, then produces a structured red-team report with required revisions, explicit assumptions, and downgrade/kill recommendations for claims that cannot be defended.

The grading loop is based on fast artifact-level outcomes, not months-later delivery results: whether flags were accepted, whether the proposal was revised, whether unsupported claims were removed, and whether client clarification requests matched the flagged gaps. This makes the product a defensible proposal-risk gate, not a project-margin oracle.
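The audit flow above (document plus constraint sheet in, flagged claims out, graded by whether flags were accepted) can be sketched as a small data model. Every class, field, and value here is an assumption for illustration; the report does not publish an actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Flag:
    claim: str                      # the commitment as written in the proposal/SOW
    failure_mode: str               # e.g. "missing exclusion", "timeline w/o dependency"
    recommendation: str             # "revise", "downgrade", or "kill"
    accepted: Optional[bool] = None  # filled in later by the grading loop

@dataclass
class AuditReport:
    document: str                    # outbound proposal / SOW / estimate
    constraint_sheet: dict[str, str]  # team shape, delivery model, exclusions, ...
    flags: list[Flag] = field(default_factory=list)

    def grade(self) -> float:
        """Fast artifact-level outcome: share of flags the agency accepted."""
        graded = [f for f in self.flags if f.accepted is not None]
        return sum(f.accepted for f in graded) / len(graded) if graded else 0.0

# Hypothetical example: two flags, one accepted, one refused.
demo = AuditReport("SOW v3 for Acme", {"team shape": "4 devs, 1 PM"})
demo.flags = [
    Flag("launch in 6 weeks", "timeline w/o dependency coverage", "downgrade", accepted=True),
    Flag("unlimited revisions", "missing exclusion", "kill", accepted=False),
]
print(demo.grade())  # 0.5
```

The acceptance rate is exactly the "self-defeating feedback loop" risk named below: if the agency refuses every downgrade, the grade records that refusal rather than improving the proposal.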
Why did we consider it?
A Commitment Audit is compelling because it solves an immediate, high-cost, pre-sale failure mode for boutique agencies with fast, objective artifact-level feedback rather than vague long-horizon project predictions.
What breaks?
  • Incentive misalignment: Agencies overpromise to win competitive pitches; a tool demanding realistic constraints acts as a 'sales prevention' bottleneck.
  • The 'Shelfware Audit' trap: Real-world evidence shows analytical audits are routinely ignored if they require painful implementation or risk immediate revenue.
  • Self-defeating feedback loop: The fast grading mechanism (tracking if flags were accepted) will merely document the agency's refusal to downgrade their sales claims.
What did we learn?
Engine verdict: GATHER_MORE_SIGNAL (WORTH_SKIMMING). Real pain and whitespace, but no proof agencies will pay to remove the ambiguity that helps them win deals.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis and what it measures:
  • data moat: Does this product accumulate proprietary data that compounds?
  • 10x model test: Does a better model make this more valuable, or redundant?
  • fast feedback loops: Can outputs be graded against reality in <30 days?
  • solo founder feasible: Can a solo operator build and run this without a team?
  • AI providers can't eat it: Do hyperscalers have structural reasons NOT to build this?
Composite median: 10.0 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
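The aggregation above can be reproduced with the standard library. The three runs' per-axis scores are not published, so the values below are invented to be consistent with the reported composite median (10.0) and IQR (1.0); only the arithmetic is the point.

```python
from statistics import median, quantiles

# Hypothetical per-run axis scores (0-3 each), one list per independent run.
runs = [
    [2, 2, 3, 2, 1],   # run 1: composite 10
    [2, 2, 3, 3, 1],   # run 2: composite 11
    [1, 2, 3, 2, 1],   # run 3: composite 9
]

composites = [sum(r) for r in runs]             # one 0-15 total per run
composite_median = median(composites)           # -> 10
q1, _, q3 = quantiles(composites, n=4, method="inclusive")
iqr = q3 - q1                                   # -> 1.0
graduated = composite_median >= 9.0             # graduation threshold from the report

print(composite_median, iqr, graduated)         # 10 1.0 True
```

The `method="inclusive"` quartile rule is one common choice for small samples; with only three runs, the exact IQR depends on which quantile convention the engine uses.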

Evidence

Signal B — Competitor with documented gap

Vern is positioned as AI contract review and clause risk analysis for signed/negotiated contracts, not as a pre-send proposal/SOW/estimate commitment audit for software agencies that checks unsupported delivery promises, scope-evidence mismatches, missing exclusions, contradictory assumptions, or timeline claims without dependency coverage.

Signal D — Demand proxy

{"summary":"Indirect evidence exists that freelancers/agencies experience scope creep, scattered scope changes, and client-project friction tied to unclear commitments and documentation; there are also public SOW examples/templates indicating ongoing interest in SOW structure.","sources":["https://www.reddit.com/r/FreelanceProgramming/comments/1r7cmzl/how_do_you_handle_scope_creep_and_late_payments.json","https://www.reddit.com/r/webdev/comments/1htqvcs/i_just_had_the_worst_experience_with_a_client_it/","https://github.com/joelparkerhenderson/statement-of-work/blob/main/README.md","https://git…

Evaluation history

When              Stage                 Phase
2026-04-19 18:32  deep_council_verdict  graduated
2026-04-19 18:24  deep_claude_take      graduated
2026-04-19 18:21  deep_90day_plan       graduated
2026-04-19 18:11  deep_risk             graduated
2026-04-19 18:02  deep_distribution     graduated
2026-04-19 17:54  deep_pricing          graduated
2026-04-19 17:43  deep_moat             graduated
2026-04-19 17:37  deep_buyer_sim        graduated
2026-04-19 17:30  deep_icp              graduated
2026-04-19 17:17  deep_competitor       graduated
2026-04-19 17:07  deep_market_reality   graduated
2026-04-19 16:50  filter_score          scored
2026-04-19 16:40  filter_score          scored
2026-04-19 16:30  filter_score          scored
2026-04-19 16:20  evidence_search       argument
2026-04-19 16:10  audience_simulation   argument
2026-04-19 16:00  red_team_kill         argument
2026-04-19 15:50  steelman              argument
2026-04-19 15:40  genesis               argument