Commitment Audit for Boutique Software Agencies

Status: graduated · evidence tier [B] · filter 10.0/15 · spread ±1.0 · signals: 2 independent
What is this?
A pre-send commitment audit for boutique software agencies that reviews proposals, SOWs, and estimates for unsupported promises before they reach the client. Instead of predicting eventual project profitability, the product identifies immediate, objective failure modes in the artifact itself: conclusions not supported by stated scope, missing exclusions, contradictory assumptions, timeline claims without dependency coverage, and confidence language that outruns the evidence provided.

The agency submits the outbound document plus a compact constraint sheet covering team shape, delivery model, known unknowns, excluded work, and acceptable certainty range. AE runs adversarial debate over the commitments, then produces a structured red-team report with required revisions, explicit assumptions, and downgrade/kill recommendations for claims that cannot be defended.

The grading loop is based on fast artifact-level outcomes, not months-later delivery results: whether flags were accepted, whether the proposal was revised, whether unsupported claims were removed, and whether client clarification requests matched the flagged gaps. This makes the product a defensible proposal-risk gate, not a project-margin oracle.
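The audit flow above (document plus constraint sheet in, flagged claims out, graded by whether flags were accepted) can be sketched as a small data model. Every class, field, and value here is an assumption for illustration; the report does not publish an actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Flag:
    claim: str                      # the commitment as written in the proposal/SOW
    failure_mode: str               # e.g. "missing exclusion", "timeline w/o dependency"
    recommendation: str             # "revise", "downgrade", or "kill"
    accepted: Optional[bool] = None  # filled in later by the grading loop

@dataclass
class AuditReport:
    document: str                    # outbound proposal / SOW / estimate
    constraint_sheet: dict[str, str]  # team shape, delivery model, exclusions, ...
    flags: list[Flag] = field(default_factory=list)

    def grade(self) -> float:
        """Fast artifact-level outcome: share of flags the agency accepted."""
        graded = [f for f in self.flags if f.accepted is not None]
        return sum(f.accepted for f in graded) / len(graded) if graded else 0.0

# Hypothetical example: two flags, one accepted, one refused.
demo = AuditReport("SOW v3 for Acme", {"team shape": "4 devs, 1 PM"})
demo.flags = [
    Flag("launch in 6 weeks", "timeline w/o dependency coverage", "downgrade", accepted=True),
    Flag("unlimited revisions", "missing exclusion", "kill", accepted=False),
]
print(demo.grade())  # 0.5
```

The acceptance rate is exactly the "self-defeating feedback loop" risk named below: if the agency refuses every downgrade, the grade records that refusal rather than improving the proposal.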
Why did we consider it?
A Commitment Audit is compelling because it solves an immediate, high-cost, pre-sale failure mode for boutique agencies with fast, objective artifact-level feedback rather than vague long-horizon project predictions.
What breaks?
  • Incentive misalignment: Agencies overpromise to win competitive pitches; a tool demanding realistic constraints acts as a 'sales prevention' bottleneck.
  • The 'Shelfware Audit' trap: Real-world evidence shows analytical audits are routinely ignored if they require painful implementation or risk immediate revenue.
  • Self-defeating feedback loop: The fast grading mechanism (tracking if flags were accepted) will merely document the agency's refusal to downgrade their sales claims.
What did we learn?
Engine verdict: GATHER_MORE_SIGNAL (WORTH_SKIMMING). Real pain and whitespace, but no proof agencies will pay to remove the ambiguity that helps them win deals.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis and what it measures:
  • data moat: Does this product accumulate proprietary data that compounds?
  • 10x model test: Does a better model make this more valuable, or redundant?
  • fast feedback loops: Can outputs be graded against reality in <30 days?
  • solo founder feasible: Can a solo operator build and run this without a team?
  • AI providers can't eat it: Do hyperscalers have structural reasons NOT to build this?
Composite median: 10.0 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
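The aggregation above can be reproduced with the standard library. The three runs' per-axis scores are not published, so the values below are invented to be consistent with the reported composite median (10.0) and IQR (1.0); only the arithmetic is the point.

```python
from statistics import median, quantiles

# Hypothetical per-run axis scores (0-3 each), one list per independent run.
runs = [
    [2, 2, 3, 2, 1],   # run 1: composite 10
    [2, 2, 3, 3, 1],   # run 2: composite 11
    [1, 2, 3, 2, 1],   # run 3: composite 9
]

composites = [sum(r) for r in runs]             # one 0-15 total per run
composite_median = median(composites)           # -> 10
q1, _, q3 = quantiles(composites, n=4, method="inclusive")
iqr = q3 - q1                                   # -> 1.0
graduated = composite_median >= 9.0             # graduation threshold from the report

print(composite_median, iqr, graduated)         # 10 1.0 True
```

The `method="inclusive"` quartile rule is one common choice for small samples; with only three runs, the exact IQR depends on which quantile convention the engine uses.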

Evidence

Signal B — Competitor with documented gap

Vern is positioned as AI contract review and clause risk analysis for signed/negotiated contracts, not as a pre-send proposal/SOW/estimate commitment audit for software agencies that checks unsupported delivery promises, scope-evidence mismatches, missing exclusions, contradictory assumptions, or timeline claims without dependency coverage.

Signal D — Demand proxy

{"summary":"Indirect evidence exists that freelancers/agencies experience scope creep, scattered scope changes, and client-project friction tied to unclear commitments and documentation; there are also public SOW examples/templates indicating ongoing interest in SOW structure.","sources":["https://www.reddit.com/r/FreelanceProgramming/comments/1r7cmzl/how_do_you_handle_scope_creep_and_late_payments.json","https://www.reddit.com/r/webdev/comments/1htqvcs/i_just_had_the_worst_experience_with_a_client_it/","https://github.com/joelparkerhenderson/statement-of-work/blob/main/README.md","https://git…

Evaluation history

When              Stage                 Phase
2026-04-19 18:32  deep_council_verdict  graduated
2026-04-19 18:24  deep_claude_take      graduated
2026-04-19 18:21  deep_90day_plan       graduated
2026-04-19 18:11  deep_risk             graduated
2026-04-19 18:02  deep_distribution     graduated
2026-04-19 17:54  deep_pricing          graduated
2026-04-19 17:43  deep_moat             graduated
2026-04-19 17:37  deep_buyer_sim        graduated
2026-04-19 17:30  deep_icp              graduated
2026-04-19 17:17  deep_competitor       graduated
2026-04-19 17:07  deep_market_reality   graduated
2026-04-19 16:50  filter_score          scored
2026-04-19 16:40  filter_score          scored
2026-04-19 16:30  filter_score          scored
2026-04-19 16:20  evidence_search       argument
2026-04-19 16:10  audience_simulation   argument
2026-04-19 16:00  red_team_kill         argument
2026-04-19 15:50  steelman              argument
2026-04-19 15:40  genesis               argument