
Pre-Hire Confidence Calibration for Heads of Talent on Senior IC Misfires

Phase: ranked [TRIANGULATED] · Filter score: 8.5/15 · Spread: ±1.0 · Signals: 3 independent
What is this?
A pre-hire confidence-grading service for heads of talent (HoTs) at UK/US product, engineering, and boutique consulting firms of 50-200 people that hire senior ICs into £80-150k judgment-heavy roles. Before the offer is sent, the HoT enters a confidence score and reasoning bullets predicting whether the finalist will survive 90 days without a PIP, a formal decision reversal, or an exit. AE's adversarial multi-model debate then generates a structured pre-offer challenge pack, which the firm runs through its existing interview loop; the HoT records what came back. At 90 days, AE grades the pre-hire confidence against three objective, HRIS-recorded events the HoT already pulls (PIP status, exit status, and formally documented decision reversal), never against subjective manager opinions. AE compounds these grades into a per-firm calibration profile: which pre-hire signals (challenge-pack performance, reasoning-bullet patterns) predict which objective 90-day outcomes for THIS firm. Hiring managers enter nothing; HoTs report on themselves using records that exist independently of AE.
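The 90-day grading step can be sketched as follows. This is a minimal sketch under stated assumptions: the page does not specify AE's data model or scoring rule, so the names `HrisOutcome`, `PreHirePrediction`, and `brier_score` are hypothetical, the HoT's confidence is assumed to be a 0-1 probability of surviving 90 days, and the Brier score is a standard scoring rule swapped in for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HrisOutcome:
    """The three HRIS-recorded events checked at day 90 (field names are hypothetical)."""

    on_pip: bool
    exited: bool
    decision_reversed: bool

    @property
    def survived(self) -> bool:
        # A hire "survives" only if none of the three objective events occurred.
        return not (self.on_pip or self.exited or self.decision_reversed)


@dataclass
class PreHirePrediction:
    """What the HoT enters before the offer (hypothetical schema)."""

    candidate_id: str
    confidence: float  # assumed: probability of 90-day survival on a 0-1 scale
    reasoning: list[str] = field(default_factory=list)


def brier_score(graded: list[tuple[PreHirePrediction, HrisOutcome]]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome; lower is better."""
    if not graded:
        raise ValueError("need at least one graded hire")
    return sum(
        (pred.confidence - float(outcome.survived)) ** 2 for pred, outcome in graded
    ) / len(graded)


# Two hypothetical hires: one confident call that held, one that ended in an exit.
graded = [
    (PreHirePrediction("cand-a", 0.9, ["strong design-review answers"]),
     HrisOutcome(on_pip=False, exited=False, decision_reversed=False)),
    (PreHirePrediction("cand-b", 0.8, ["glowing references"]),
     HrisOutcome(on_pip=False, exited=True, decision_reversed=False)),
]
print(round(brier_score(graded), 3))  # -> 0.325
```

Because grading reads only the three HRIS fields, no hiring-manager input is needed. A per-firm calibration profile could then be approximated by computing this score separately for groups of hires that share a pre-hire signal; how AE actually segments signals is not described here.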
Why did we consider it?
AE's reality-graded prediction engine maps 1:1 onto a publicly acknowledged confidence gap in senior IC hiring, with HRIS-objective grading, a single-buyer workflow, and pricing math that fits the solo founder's £100-300K ARR target.
What breaks?
  • 11th-hour candidate friction: Adding a challenge pack at the finalist stage will cause £80-150k candidates to abandon the pipeline.
  • Misaligned HoT incentives: Talent leaders are KPI'd on time-to-fill and acceptance rates, not 90-day retention (poor retention is blamed on hiring managers).
  • Sparse data volume: 50-200 person firms do not hire enough senior ICs annually to generate a statistically significant calibration profile.
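The sparse-data objection is easiest to see as arithmetic. A minimal sketch, assuming each hire is scored as a simple survived/did-not-survive outcome and using the Wilson score interval (our choice; the page names no statistical method): with only a handful of hires, the plausible range for a firm's 90-day survival rate stays very wide. The 80% survival rate and the hire counts below are hypothetical illustrations, not figures from the evaluation.

```python
from __future__ import annotations

from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives ~95% coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Hypothetical firm: 80% of senior IC hires survive 90 days, at three hiring volumes.
for n in (5, 10, 40):
    low, high = wilson_interval(round(0.8 * n), n)
    print(f"n={n:>2}: survival rate plausibly between {low:.2f} and {high:.2f}")
```

The interval narrows roughly with the square root of the hire count, so a firm making a few senior IC hires a year would need several years of graded outcomes before its calibration profile separates signal from noise.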
What did we learn?
Still in evaluation (phase: ranked). No verdict yet.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis | What it measures
data moat | Does this product accumulate proprietary data that compounds?
10x model test | Does a better model make this more valuable, or redundant?
fast feedback loops | Can outputs be graded against reality in <30 days?
solo founder feasible | Can a solo operator build and run this without a team?
AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this?
Composite median: 8.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
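The scoring scheme can be sketched in a few lines. This is a sketch under assumptions: it treats the composite as the sum of the five axis scores for each run, takes the median across runs, computes the IQR with Python's "inclusive" quartile method, and treats "meets the graduation threshold" as `>=`; the page confirms none of these details. The axis keys and the per-run scores in the example are hypothetical, not the evaluation's actual scores.

```python
from __future__ import annotations

from statistics import median, quantiles

# Axis keys follow the five filter axes; the snake_case spelling is hypothetical.
AXES = (
    "data_moat",
    "10x_model_test",
    "fast_feedback_loops",
    "solo_founder_feasible",
    "ai_providers_cant_eat_it",
)
GRADUATION_THRESHOLD = 9.0  # from the page; the >= comparison is an assumption


def composite(run: dict[str, float]) -> float:
    """Sum one run's five axis scores after checking each lies in 0-3."""
    for axis in AXES:
        score = run[axis]
        if not 0 <= score <= 3:
            raise ValueError(f"{axis} score {score} is outside 0-3")
    return sum(run[axis] for axis in AXES)


def summarize(runs: list[dict[str, float]]) -> dict[str, float | bool]:
    """Median composite, IQR across runs, and whether the median clears the threshold."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compute an IQR")
    totals = [composite(run) for run in runs]
    # "inclusive" quartiles are an assumption; the page does not name a method.
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")
    med = median(totals)
    return {"median": med, "iqr": q3 - q1, "graduates": med >= GRADUATION_THRESHOLD}


# Three hypothetical runs (NOT the page's actual per-axis scores).
runs = [
    dict(zip(AXES, [2, 1, 1, 2, 1])),  # composite 7
    dict(zip(AXES, [2, 2, 1, 2, 1])),  # composite 8
    dict(zip(AXES, [3, 2, 2, 2, 2])),  # composite 11
]
print(summarize(runs))  # -> {'median': 8, 'iqr': 2.0, 'graduates': False}
```

Taking the median rather than the mean keeps one outlier run from pushing a hypothesis over the threshold, while the IQR reports how much the runs disagreed.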

Evidence

Signal A — Primary source

We find that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities.
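The quoted finding is a claim about calibration, and expected calibration error (ECE) is one common way such comparisons are scored. The quote does not say which metric its authors used, so treat this as an illustrative choice; the ten equal-width bins and the function name are likewise assumptions. Assuming the HoT's confidence is a 0-1 probability, the same function could score pre-hire confidence against 90-day survival outcomes.

```python
from __future__ import annotations


def expected_calibration_error(
    confidences: list[float], outcomes: list[int], n_bins: int = 10
) -> float:
    """Weighted mean |observed rate - mean confidence| over equal-width confidence bins."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} is outside 0-1")
        # Clamp so that c == 1.0 lands in the top bin rather than overflowing.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        observed = sum(outcomes[i] for i in members) / len(members)
        ece += len(members) / total * abs(observed - mean_conf)
    return ece


# Hypothetical: a predictor that says 0.9 but is right half the time is overconfident.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))
```

A lower ECE means stated confidence tracks observed frequency more closely; with very few graded hires per bin, the estimate itself is noisy.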

Signal B — Competitor with documented gap

HackerEarth offers data-driven recruiting analytics (collecting and applying quantitative insights from talent data) but focuses on pre-hire assessment and screening metrics — no pre-offer confidence grading by the HoT, no adversarial challenge-pack generation, and no closed-loop calibration against 90-day HRIS-recorded outcomes (PIP, exit, decision-reversal).

Signal D — Demand proxy

Found: yes. LinkedIn and HN discussions surface the exact pain points: recruiters confuse confidence with competence in senior hires, calibration failures in recruitment are recognized as systemic leadership problems, and managers hire defensively rather than for capability. All of these indicate latent demand for structured pre-hire confidence accountability.
Sources:
  • https://www.linkedin.com/posts/tadthornton_the-infamous-false-calibration-activity-7452051459182944256-PkXI
  • https://www.linkedin.com/posts/patrick-wicker_the-biggest-hiring-mistake-leaders-dont-activity-742635188576…

Evaluation history

When | Stage | Phase
2026-05-10 00:12 | filter_score | scored
2026-05-10 00:06 | filter_score | scored
2026-05-09 23:54 | filter_score | scored
2026-05-09 23:49 | evidence_search | evidence_hunt
2026-05-09 23:42 | evidence_search | evidence_hunt
2026-05-09 23:36 | evidence_search | evidence_hunt
2026-05-09 23:25 | evidence_search | argument
2026-05-09 23:18 | audience_simulation | argument
2026-05-09 23:12 | red_team_kill | argument
2026-05-09 23:06 | steelman | argument
2026-05-09 23:02 | genesis | argument