Evaluation

Adversarial PR Review

Separate builder and critic so “looks good to me” means something.

Use when

One model built the change and should not grade its own homework.

Cadence

For meaningful PRs

Verification

An independent critic approves the unchanged version or only accepted findings remain.
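
The verification rule above can be written as a small predicate. This is a minimal sketch, not a prescribed implementation. It assumes "the unchanged version" means the critic's approval only counts for the exact diff it reviewed, and that a finding can be explicitly marked as accepted. The names `fingerprint`, `Finding`, `CriticReview`, and `verification_passes` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def fingerprint(diff_text: str) -> str:
    """Hash the diff so an approval can be tied to one exact version."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()


@dataclass
class Finding:
    summary: str
    accepted: bool = False  # assumption: set to True once explicitly accepted


@dataclass
class CriticReview:
    reviewed_fingerprint: str  # the version the critic actually looked at
    approved: bool
    findings: List[Finding] = field(default_factory=list)


def verification_passes(current_diff: str, review: CriticReview) -> bool:
    # A review of an older version says nothing about the current one.
    if review.reviewed_fingerprint != fingerprint(current_diff):
        return False
    if review.approved:
        return True
    # Not approved: pass only if every remaining finding has been accepted.
    # Assumption: "no approval and no findings" is treated as not verified.
    return bool(review.findings) and all(f.accepted for f in review.findings)
```

Tying the approval to a fingerprint is one way to keep the builder from editing the change after the critic has signed off; any later edit makes the approval stale and forces a fresh review.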

Advanced spec

Structured loop spec

Field: Value
Name: Adversarial PR Review
Category: Evaluation
Trigger: For meaningful PRs
Objective: Separate builder and critic so “looks good to me” means something.
Allowed inputs: Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop
Allowed actions: Define the exact scope, source of truth, and approval boundary; inspect current state and rank the highest-risk gap; make one small, reversible improvement; run the stated verification and record evidence; stop on success, budget, no progress, or approval required.
Verification: An independent critic approves the unchanged version or only accepted findings remain.
Stop condition: Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required.
Budget: Set a time, turn, token, retry, file, or dollar cap before running the loop.
Approval boundary: Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments.
Safe output: Draft, report, checklist, table, or approval-gated recommendation
Works with: Claude, ChatGPT, Gemini, any tool-using AI assistant
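
The stop condition and budget fields in the spec can be checked mechanically at the end of each iteration. The sketch below is one possible shape, under stated assumptions: the budget is limited to turn and token caps (the spec also allows time, retry, file, or dollar caps), and "no progress" is approximated as the count of open findings not going down. `StopReason`, `Budget`, `LoopState`, and `stop_reason` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StopReason(Enum):
    VERIFIED = "verifier passed"
    NEEDS_APPROVAL = "approval required"
    BLOCKED = "blocker appeared"
    BUDGET = "budget exhausted"
    NO_PROGRESS = "no progress"


@dataclass
class Budget:
    max_turns: int
    max_tokens: int


@dataclass
class LoopState:
    turns_used: int = 0
    tokens_used: int = 0
    verifier_passed: bool = False
    open_findings: int = 0
    previous_open_findings: Optional[int] = None  # None before the first comparison
    blocked: bool = False
    approval_required: bool = False


def stop_reason(state: LoopState, budget: Budget) -> Optional[StopReason]:
    """Return why the loop should stop, or None if it may continue."""
    if state.verifier_passed:
        return StopReason.VERIFIED
    if state.approval_required:
        return StopReason.NEEDS_APPROVAL
    if state.blocked:
        return StopReason.BLOCKED
    if state.turns_used >= budget.max_turns or state.tokens_used >= budget.max_tokens:
        return StopReason.BUDGET
    # Assumed proxy for "no progress": findings did not decrease since last turn.
    if (state.previous_open_findings is not None
            and state.open_findings >= state.previous_open_findings):
        return StopReason.NO_PROGRESS
    return None
```

Returning a named reason instead of a bare boolean makes the recorded evidence say why the loop ended, which matters when the outcome is a handoff for human approval and not a pass.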

Runbook

Steps

  1. Define the exact scope, source of truth, and approval boundary.
  2. Inspect current state and rank the highest-risk gap.
  3. Make one small, reversible improvement.
  4. Run the stated verification and record evidence.
  5. Stop on success, budget, no progress, or approval required.
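
The steps above can be sketched as a bounded loop in which the builder and the critic are separate callables. This is an illustrative outline, not a reference implementation. It assumes the critic returns findings ordered by risk (highest first), that an empty list means approval, and that only a turn cap is enforced. `Builder`, `Critic`, and `review_loop` are hypothetical names; in practice each callable would wrap a different model or session, and the approval gate from the spec would sit outside this function.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces for the two roles.
Builder = Callable[[str, List[str]], str]  # (current diff, findings to fix) -> revised diff
Critic = Callable[[str], List[str]]        # diff -> findings; empty list means approve


def review_loop(
    diff: str,
    builder: Builder,
    critic: Critic,
    max_turns: int = 3,
) -> Tuple[str, List[str], List[str]]:
    """Return (final diff, remaining findings, evidence log)."""
    evidence: List[str] = []
    findings = critic(diff)
    evidence.append(f"turn 0: {len(findings)} finding(s)")

    for turn in range(1, max_turns + 1):
        if not findings:
            break  # critic approved the current version
        # One small change per turn: address only the top-ranked finding.
        revised = builder(diff, findings[:1])
        if revised == diff:
            evidence.append(f"turn {turn}: no change, stopping")
            break
        diff = revised
        # The critic re-reviews the new version; the builder never grades it.
        findings = critic(diff)
        evidence.append(f"turn {turn}: {len(findings)} finding(s)")

    return diff, findings, evidence
```

Calling `review_loop(diff, builder, critic)` with two stub functions is enough to exercise the control flow before wiring in real models. The key design choice is that only `critic` ever produces findings, so the builder cannot approve its own work.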

Copy prompt

Prompt

Run the Adversarial PR Review loop. Use it when one model built the change and should not grade its own homework. Work in bounded iterations: inspect current state, choose the highest-risk gap, make one reversible improvement, verify it, and record evidence. Stop when an independent critic approves the unchanged version or only accepted findings remain, or when you are blocked, the budget is exhausted, or approval is required.

Metadata

Tags

code review, multi-agent, PR