
Verification loops

A verification loop does not just generate more output. It checks the work against evidence, repairs what fails, and stops only when the evidence passes or the loop reaches an approval boundary.

[Figure: abstract maker-checker verification loop with magnifier and evidence signals]
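A minimal Python sketch of the maker-checker loop described above follows. It assumes the maker, the checker, and the approval test are plain callables you supply; `verification_loop`, `Evidence`, and the `max_rounds` cap are hypothetical names chosen for illustration, not part of any library or tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    """Hypothetical result of one check: did it pass, and why not."""
    passed: bool
    notes: str = ""


def verification_loop(
    make: Callable[[str], str],
    check: Callable[[str], Evidence],
    needs_approval: Callable[[str], bool],
    max_rounds: int = 5,
) -> tuple[str, str]:
    """Hypothetical sketch: make, check, repair, and stop on passing
    evidence, an approval boundary, or an exhausted round budget."""
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = make(feedback)         # maker produces or repairs the work
        if needs_approval(draft):      # public, legal, financial, or account boundary
            return "needs_approval", draft
        evidence = check(draft)        # checker gathers external evidence
        if evidence.passed:
            return "passed", draft
        feedback = evidence.notes      # failing evidence drives the next repair
    return "budget_exhausted", draft
```

The round cap is an assumption added so a loop that never converges still stops; in practice the checker should be as external and measurable as possible, such as a test run rather than the maker grading itself.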

Verification types

Verifier | What it proves
Tests | Behavior still works after the change.
Source checks | Claims match source material.
Diff review | The changed files are scoped and intentional.
Browser checks | The user-visible flow works in a real page.
Screenshots | Visual output matches the intended state.
Human approval | The loop has hit a public, reputational, legal, financial, or account boundary.
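The "scoped" half of a diff review can be made deterministic. A minimal sketch, assuming you already have the list of changed paths (for example from `git diff --name-only`) and an agreed list of glob patterns for the task; `diff_is_scoped` is a hypothetical helper. Whether a change is intentional still needs a reviewer.

```python
from fnmatch import fnmatch


def diff_is_scoped(changed_files: list[str], allowed_globs: list[str]) -> tuple[bool, list[str]]:
    """Hypothetical helper: report changed paths that fall outside the agreed scope.

    Note: fnmatch's `*` also matches across `/`, so "src/auth/*" covers
    nested paths such as "src/auth/sub/x.py".
    """
    out_of_scope = [
        path for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_globs)
    ]
    return (not out_of_scope, out_of_scope)
```

Returning the offending paths, not just a boolean, gives the repair step something concrete to act on: revert the stray file or widen the scope explicitly.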

Deterministic vs judgment loops

Loop type | Good for | Risk
Deterministic loop | Tests, page speed, lint, broken links, source checks | Safest, because the verifier is external and measurable.
LLM-as-judge loop | Architecture cleanup, docs quality, product review | Useful but squishier; needs a narrow rubric and a budget.
Human-gated loop | Publishing, sending, deleting, production changes | Slower, but safest where reputation, accounts, money, or legal exposure is at stake.
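The "narrow rubric and budget" for an LLM-as-judge loop can be sketched as follows. This assumes `judge` stands in for a model call that answers pass/fail for one rubric item at a time, and `revise` stands in for the maker; `judged_loop`, `Rubric`, and `budget` are hypothetical names, and no specific model API is implied.

```python
from typing import Callable

Rubric = dict[str, str]  # hypothetical: criterion name -> what "pass" means


def judged_loop(
    draft: str,
    revise: Callable[[str, list[str]], str],
    judge: Callable[[str, str, str], bool],
    rubric: Rubric,
    budget: int = 3,
) -> tuple[bool, str, list[str]]:
    """Hypothetical sketch: judge each rubric item separately and revise
    until all pass or `budget` revisions have been spent."""
    failing: list[str] = []
    for attempt in range(budget + 1):
        # One narrow yes/no question per criterion, not a single overall score.
        failing = [name for name, test in rubric.items() if not judge(draft, name, test)]
        if not failing:
            return True, draft, []
        if attempt == budget:
            break  # budget spent: surface the remaining failures instead of looping
        draft = revise(draft, failing)
    return False, draft, failing
```

Judging each criterion separately keeps the squishy part small, and returning the still-failing items when the budget runs out makes the stop explicit rather than silent.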

Relevant loops

Loop | Category | Difficulty | Cadence | Verification
Pre-Publish Source Check | Content | Intermediate | Before publishing factual work | Every checkable claim is supported by a current source or visibly flagged for an editor.
Social Source to Insight | Content | Intermediate | After saving a high-signal source | Source captured, takeaways extracted, draft angles written, and no public post is published without approval.
Accessibility Repair | Design | Advanced | Before launch or when audits fail | No confirmed accessibility blocker remains in the agreed pages, components, or tasks.
Error Message Rewrite | Design | Intermediate | When users hit confusing errors | Every in-scope user-visible error is accounted for, rewritten or blocked, and verified in a reachable state.
Architecture Rubric Refactor | Engineering | Advanced | When architecture work has a defined scope | Scoped module meets the written rubric, tests pass, and unresolved objections are explicit.
CI Optimization | Engineering | Advanced | Monthly or when CI is painful | CI p50/p95 improves against the same workflow without weakening tests or hiding failures.
Claude Code Repo Readiness | Engineering | Beginner | Before major agent work | Repo has agent instructions, documented commands, architecture notes, risk areas, and a docs/loops scaffold.
Cold Load Trim | Engineering | Advanced | When first visit feels heavy | Initial screen downloads fewer bytes while screenshots and behavior remain unchanged.
Fresh Clone Onboarding | Engineering | Intermediate | Before onboarding | A clean machine reaches the documented ready state using only the README.
Project Docs Freshness | Engineering | Beginner | Nightly or after meaningful code changes | Changed behavior, APIs, CLI commands, config, and workflows are reflected in docs. Docs checks pass.
Test Flake Stabilizer | Engineering | Intermediate | When tests are inconsistent | The repaired test and full suite pass for the required consecutive-run streak.
Test and Logging Coverage | Engineering | Intermediate | Weekly or before release | Critical flows have useful tests and structured logs for representative success and failure paths.
Adversarial PR Review | Evaluation | Advanced | For meaningful PRs | An independent critic approves the unchanged version or only accepted findings remain.
Browser Quality Streak | Evaluation | Intermediate | Before release | N realistic scenarios pass consecutively, and earlier failures have regression coverage.
Open Loop and Stale Memory Cleanup | Knowledge | Beginner | Weekly | No current open loop is contradicted by recent daily or project notes.
Research to Artifact | Knowledge | Intermediate | Whenever research must support a decision | The artifact meets acceptance criteria, traces important claims to sources, and states uncertainty plainly.
Source Library Ingestion QA | Knowledge | Intermediate | After each source capture | Metadata complete, transcript/article state honest, useful takeaways present, and qmd retrieval verified or refreshed.
Living Story | Operations | Intermediate | Weekly or per project window | Every prior thread is carried forward, closed with evidence, or flagged stale/needs-review.
Production Error Sweep | Operations | Advanced | Daily or after incident | Actionable errors are fixed with reproduction or tests, or explicitly classified as noise.
Dependency CVE Burndown | Security | Advanced | After security scan | No exploitable high or critical CVE remains without an explicit risk decision.
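Two of these loops (Test Flake Stabilizer and Browser Quality Streak) stop on a consecutive-pass streak rather than a single green run. A minimal sketch of that stopping rule, assuming `run_once` is any callable that runs one test or scenario and reports pass/fail; `passes_streak`, `required`, and `max_runs` are hypothetical names.

```python
from typing import Callable


def passes_streak(run_once: Callable[[], bool], required: int, max_runs: int) -> tuple[bool, int]:
    """Hypothetical sketch: succeed only after `required` consecutive passes,
    giving up after `max_runs` total runs. Returns (succeeded, runs_used)."""
    streak = 0
    for run in range(1, max_runs + 1):
        if run_once():
            streak += 1
            if streak >= required:
                return True, run
        else:
            streak = 0  # any failure resets the streak; a flake is not "mostly passing"
    return False, max_runs
```

For a real suite, `run_once` could wrap something like `subprocess.run(["pytest", "tests/test_checkout.py"]).returncode == 0`, where the test path is a hypothetical placeholder. Resetting on failure is the point: a test that passes nine times out of ten never reaches a streak of ten.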