
Add p3_evidence_marker_validator tool

filter rejected · TOOL · reversible: simple · 3h · proposed 19 Jun 2026
What is the proposed change?
A new standalone validator takes a P3 rubric output and checks that each scored sub-criterion contains one of the literal tokens 'EVIDENCE:(a)', 'EVIDENCE:(b)', or 'EVIDENCE:none'. If any sub-criterion is missing a marker, the validator returns {ok: false, missing: [...]}. The validator is wired into the post-LLM step of p3_rubric.js: if !ok, retry once with an explicit reminder; if the output is still !ok, downgrade the P3 axis score by 1 and emit the telemetry event 'p3_evidence_marker_missing'. Do NOT block graduation on this check; it is a quality signal, not a gate.
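A minimal sketch of the validator follows, assuming the rubric output has already been parsed into { subCriteria: [{ id, rationale }] } and that this list contains only the scored sub-criteria. That shape and the names validateEvidenceMarkers and EVIDENCE_MARKER are hypothetical, not taken from p3_rubric.js. The regex requires a non-word character (or string edge) on both sides of each token, so near-misses do not count as markers; this is aimed at the "regex too permissive" failure mode named in the falsifier.

```javascript
// Hypothetical sketch of hypothesis_engine/tools/p3_evidence_validator.js.
// Assumed input shape: { subCriteria: [{ id, rationale }] }, scored sub-criteria only.
// The real output shape of p3_rubric.js may differ.

// Match only the three literal tokens. The lookbehind/lookahead reject
// near-misses such as "XEVIDENCE:none" or "EVIDENCE:nonexistent".
const EVIDENCE_MARKER =
  /(?<![A-Za-z0-9_])EVIDENCE:(?:\(a\)|\(b\)|none)(?![A-Za-z0-9_])/;

function validateEvidenceMarkers(rubricOutput) {
  const missing = [];
  for (const sub of rubricOutput.subCriteria ?? []) {
    // A sub-criterion passes if its text contains at least one valid token.
    if (!EVIDENCE_MARKER.test(String(sub.rationale ?? ""))) {
      missing.push(sub.id);
    }
  }
  return { ok: missing.length === 0, missing };
}
```

Because the regex has no g flag, repeated test() calls carry no lastIndex state between sub-criteria.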
Target files
hypothesis_engine/tools/p3_evidence_validator.js
hypothesis_engine/moves/p3_rubric.js
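The post-LLM step in p3_rubric.js could be wired roughly as follows. This is a sketch under stated assumptions: the LLM call, the validator, and the telemetry emitter are injected as runRubric, validate, and emitTelemetry (hypothetical names), the rubric output carries a numeric score field, and clamping the downgraded score at 0 is an assumption the proposal does not specify.

```javascript
// Hypothetical sketch of the post-LLM step in hypothesis_engine/moves/p3_rubric.js.
// All names are assumptions; dependencies are injected so the flow is testable:
//   runRubric(prompt)           -> Promise<{ score: number, ... }>
//   validate(output)            -> { ok: boolean, missing: string[] }
//   emitTelemetry(event, data)  -> void

const REMINDER =
  "\n\nReminder: every scored sub-criterion must include one of the literal tokens " +
  "EVIDENCE:(a), EVIDENCE:(b), or EVIDENCE:none.";

async function scoreP3WithEvidenceCheck(prompt, { runRubric, validate, emitTelemetry }) {
  let output = await runRubric(prompt);
  let check = validate(output);

  // Single retry with an explicit reminder appended to the original prompt.
  if (!check.ok) {
    output = await runRubric(prompt + REMINDER);
    check = validate(output);
  }

  // Still missing markers: emit telemetry and downgrade the score by 1
  // (clamped at 0, an assumption). Never throws, so graduation is not blocked.
  if (!check.ok) {
    emitTelemetry("p3_evidence_marker_missing", { missing: check.missing });
    return { ...output, score: Math.max(0, output.score - 1), evidenceOk: false };
  }
  return { ...output, evidenceOk: true };
}
```

Injecting the dependencies keeps the retry-then-downgrade policy separate from the LLM client and telemetry backend, which the proposal does not describe.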
Expected effect
P3 outputs reach ≥95% evidence-marker compliance within one week (current compliance is anecdotal and unmeasured). Telemetry surfaces which corpora/templates systematically produce rubric outputs without evidence markers.
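One way to produce both numbers in the expected effect is to aggregate per-evaluation records. The sketch assumes a hypothetical record shape { corpus, markerMissing }, where markerMissing is true when 'p3_evidence_marker_missing' fired for that evaluation, and defines compliance as the share of evaluations without that event. The proposal does not fix a definition, so treat this one as illustrative.

```javascript
// Hypothetical record shape, one per P3 evaluation:
//   { corpus: string, markerMissing: boolean }
// Compliance here = share of evaluations that did NOT emit the telemetry event.
function complianceReport(records, target = 0.95) {
  const byCorpus = new Map();
  let missingTotal = 0;
  for (const { corpus, markerMissing } of records) {
    const stats = byCorpus.get(corpus) ?? { total: 0, missing: 0 };
    stats.total += 1;
    if (markerMissing) {
      stats.missing += 1;
      missingTotal += 1;
    }
    byCorpus.set(corpus, stats);
  }

  // null when nothing has been measured yet.
  const compliance = records.length > 0 ? 1 - missingTotal / records.length : null;

  // Corpora sorted by missing-marker rate, worst first.
  const worstCorpora = [...byCorpus.entries()]
    .map(([corpus, s]) => ({ corpus, total: s.total, missingRate: s.missing / s.total }))
    .sort((a, b) => b.missingRate - a.missingRate);

  return { compliance, meetsTarget: compliance !== null && compliance >= target, worstCorpora };
}
```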
Falsifier — what would prove this wrong?
If the post-deploy 'p3_evidence_marker_missing' rate stays above 20% after 100 evaluations despite the single-retry prompt, the bug is in the rubric move prompt itself, not the validator; escalate to a PROMPT change. If the rate is 0% from day one, the validator is matching too loosely (the regex is too permissive) and needs tightening.
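The falsifier can be written as a small decision rule. The 100-evaluation window and the 20% threshold come from the proposal; gating the 0% check on the same window, and the verdict strings themselves, are assumptions added for illustration.

```javascript
// Thresholds taken from the falsifier text.
const MIN_EVALUATIONS = 100;
const MAX_MISSING_RATE = 0.2;

// evaluations: P3 evaluations since deploy.
// missingEvents: 'p3_evidence_marker_missing' events since deploy.
// Verdict names are hypothetical.
function falsifierVerdict(evaluations, missingEvents) {
  if (evaluations < MIN_EVALUATIONS) return "insufficient_data";
  const missingRate = missingEvents / evaluations;
  // Still failing after the single retry: the prompt, not the validator, is the bug.
  if (missingRate > MAX_MISSING_RATE) return "escalate_prompt_change";
  // Never firing at all suggests the validator matches too loosely.
  // Assumption: this check also waits for the 100-evaluation window.
  if (missingEvents === 0) return "tighten_validator";
  return "no_falsification";
}
```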
Evidence that triggered the proposal
  • D — brain/P3_solo_founder_feasible.md — EVIDENCE:(a)/(b)/none markers defined but not enforced
  • E — p3_rubric move outputs (sampled) — marker compliance inconsistent across corpora

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       3
solo feasible   3
blast radius    1
composability   3
reversibility   3
Disposition
Rejected by filter_score. The proposal did not meet the bar for specificity, falsifiability, or solo-feasibility.

Evaluation history

When              Move
2026-06-19 04:05  meta_filter_score
2026-06-19 04:04  meta_genesis