
Pre-Interview Probe Pack for In-House Recruiting Leads

Stage: ranked [TRIANGULATED] · Filter score: 8.5/15 (spread ±0.5) · Signals: 2 independent
What is this?
A pre-interview tool used by heads of talent at 30-150 person founder-led SaaS companies who hire through external search firms or, increasingly, AI-polished sourcing services. When a recruiter sends through a 'this candidate is a 9/10 fit because X, Y, Z' rationale, the head pastes it in. AE's adversarial multi-model debate generates 5-8 behavioural probes engineered to falsify the rationale's strongest claims, not generic interview prompts. Probes drop straight into the interview-loop scorecard template.

After interviews, the head selects scorecard verdicts; at 90 days, retention status. Over 3-6 hires per recruiter, the tool builds a per-recruiter rationale-vs-reality ledger: whose rationales survive probing, and whose collapse.

AE is uniquely suited because adversarial debate generates probes that try to break a claim rather than confirm it, and because the code-enforced grading loop ties probe outcomes to ATS scorecard labels objectively (not LLM-as-judge). The ledger uses AE's lifecycle states to promote, demote, or kill a recruiter's rationale credibility over months.
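The promote/demote/kill lifecycle reads naturally as a small state machine over per-hire outcomes. Here is a minimal Python sketch of one plausible shape for the ledger; the class names, thresholds, and transition rules are illustrative assumptions, not AE's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Credibility(Enum):
    PROBATION = "probation"  # default for a recruiter with too little history
    PROMOTED = "promoted"    # rationales consistently survive probing
    DEMOTED = "demoted"      # rationales repeatedly collapse under probes
    KILLED = "killed"        # credibility exhausted; stop trusting rationales


@dataclass
class HireOutcome:
    probes_survived: int       # probes the candidate answered convincingly
    probes_total: int          # probes generated for this hire (5-8 per the report)
    retained_at_90_days: bool  # retention status pulled from the ATS at day 90


@dataclass
class RecruiterLedger:
    recruiter: str
    outcomes: list[HireOutcome] = field(default_factory=list)
    state: Credibility = Credibility.PROBATION

    def record(self, outcome: HireOutcome) -> None:
        self.outcomes.append(outcome)
        self._transition()

    def _transition(self) -> None:
        # Rules only fire once 3 hires have accumulated, matching the
        # 3-6 hire window the report describes. Thresholds are made up.
        if len(self.outcomes) < 3:
            return
        survived = sum(o.probes_survived for o in self.outcomes)
        total = sum(o.probes_total for o in self.outcomes)
        survival_rate = survived / total if total else 0.0
        retention = sum(o.retained_at_90_days for o in self.outcomes) / len(self.outcomes)
        if survival_rate >= 0.7 and retention >= 0.8:
            self.state = Credibility.PROMOTED
        elif survival_rate < 0.4 or retention < 0.5:
            # Demote first; a second bad window kills credibility outright.
            self.state = (Credibility.KILLED
                          if self.state is Credibility.DEMOTED
                          else Credibility.DEMOTED)
```

In this sketch a promoted recruiter's future rationales would be weighted more heavily, while a killed state flags the relationship for review; the two-step demote-then-kill rule is one way to express the 'over months' pacing the report describes.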
Why did we consider it?
AE's adversarial debate plus objective lifecycle grading uniquely produces falsifying interview probes and a per-recruiter credibility ledger that prep-packet and ATS incumbents cannot replicate.
What breaks?
  • Feedback Loop Mismatch: AE's core strength is sub-24h grading, but the hypothesis relies on 90-day retention metrics and multi-week interview loops, neutralizing the engine's speed.
  • Breaks Structured Interviewing: Generating bespoke adversarial probes per candidate destroys the standardized scorecard rubrics required for objective comparison and compliance.
  • Statistically Insignificant Volume: A 30-150 person SaaS does not route enough hires through any single external recruiter to build a meaningful 'credibility ledger' based on 90-day retention.
What did we learn?
Still in evaluation (phase: ranked). No verdict yet.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis                         What it measures
data moat                    Does this product accumulate proprietary data that compounds?
10x model test               Does a better model make this more valuable, or redundant?
fast feedback loops          Can outputs be graded against reality in <30 days?
solo founder feasible        Can a solo operator build and run this without a team?
AI providers can't eat it    Do hyperscalers have structural reasons NOT to build this?
Composite median: 8.5 / 15. Graduation threshold: 9.0. IQR across runs: 0.5.
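The report doesn't spell out exactly how the composite and spread are aggregated, but one plausible reading, assuming per-axis medians summed to a composite and an interquartile range taken over the three per-run totals, looks like the following sketch (all scores invented for illustration).

```python
import statistics

# Three independent runs, each scoring five axes 0-3 (invented numbers).
runs = [
    {"data moat": 2, "10x model test": 2, "fast feedback loops": 1,
     "solo founder feasible": 2, "AI providers can't eat it": 2},
    {"data moat": 2, "10x model test": 1, "fast feedback loops": 1,
     "solo founder feasible": 3, "AI providers can't eat it": 2},
    {"data moat": 3, "10x model test": 2, "fast feedback loops": 0,
     "solo founder feasible": 2, "AI providers can't eat it": 1},
]

axes = runs[0].keys()

# Per-axis median across runs, summed to a composite out of 15.
per_axis_median = {a: statistics.median(r[a] for r in runs) for a in axes}
composite = sum(per_axis_median.values())

# Spread: interquartile range over the per-run composite totals.
totals = sorted(sum(r.values()) for r in runs)
q1, _, q3 = statistics.quantiles(totals, n=4)
iqr = q3 - q1

print(f"composite: {composite} / 15, IQR across runs: {iqr}")

# Graduation gate from the report: the composite must reach 9.0.
print("graduates:", composite >= 9.0)
```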

Evidence

Signal B — Competitor with documented gap

Lever provides generic pre-screening interview questions to 'qualify candidates faster and reduce time-to-hire' but does not generate adversarial probes engineered to falsify a specific recruiter's candidate rationale, nor does it build a per-recruiter credibility ledger tracking rationale-vs-reality over multiple hires.

Signal D — Demand proxy

{"found":true,"summary":"HN and Reddit discussions confirm pain around recruiter misrepresentation and the inadequacy of generic behavioral interview questions, directly validating the core problem this hypothesis targets.","sources":["https://news.ycombinator.com/item?id=32996457","https://www.reddit.com/r/interviews/comments/1p5bnby/after_interviewing_tons_of_candidates_i_realized/"],"reason":"HN thread on 'hiring fraud' surfaces frustration with recruiter dishonesty about candidate fit — the exact pain point the probe pack addresses. Reddit thread confirms interviewers find generic behavior…

Evaluation history

When              Stage                Phase
2026-05-10 03:48  filter_score         scored
2026-05-10 03:42  filter_score         scored
2026-05-10 03:36  filter_score         scored
2026-05-10 03:30  evidence_search      argument
2026-05-10 03:24  audience_simulation  argument
2026-05-10 03:18  red_team_kill        argument
2026-05-10 03:12  steelman             argument
2026-05-10 03:08  genesis              argument