Evaluation
Adversarial PR Review
Separate builder and critic so “looks good to me” means something.
Use when
One model built the change and should not grade its own homework.
Cadence
For meaningful PRs
Verification
An independent critic approves the unchanged version or only accepted findings remain.
Structured loop spec
| Field | Value |
|---|---|
| Name | Adversarial PR Review |
| Category | Evaluation |
| Trigger | For meaningful PRs |
| Objective | Separate builder and critic so “looks good to me” means something. |
| Allowed inputs | Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop |
| Allowed actions | Define the exact scope, source of truth, and approval boundary; inspect current state and rank the highest-risk gap; make one small, reversible improvement; run the stated verification and record evidence; stop on success, budget, no progress, or approval required. |
| Verification | An independent critic approves the unchanged version or only accepted findings remain. |
| Stop condition | Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required. |
| Budget | Set a time, turn, token, retry, file, or dollar cap before running the loop. |
| Approval boundary | Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments. |
| Safe output | Draft, report, checklist, table, or approval-gated recommendation |
| Works with | Claude, ChatGPT, Gemini, any tool-using AI assistant |
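
The Verification row can be read as a small predicate. The following is a minimal sketch under stated assumptions, not part of the spec: the names `Review`, `Finding`, `Status`, and `verification_passes` are hypothetical, "unchanged version" is taken to mean that the version the critic reviewed is still the current one, and "independent" is reduced to the critic and builder having different identifiers.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"          # raised by the critic, still unresolved
    ACCEPTED = "accepted"  # explicitly acknowledged and left as-is


@dataclass
class Finding:
    summary: str
    status: Status = Status.OPEN


@dataclass
class Review:
    builder: str            # who produced the change (hypothetical identifier)
    critic: str             # who reviewed it
    reviewed_version: str   # e.g. the commit hash the critic actually saw
    approved: bool = False
    findings: list[Finding] = field(default_factory=list)


def verification_passes(review: Review, current_version: str) -> bool:
    # The critic must not be the builder grading its own work.
    if review.critic == review.builder:
        return False
    # Assumption: a verdict on an older version is stale and does not count.
    if review.reviewed_version != current_version:
        return False
    if review.approved:
        return True
    # No approval: pass only if findings exist and every one was accepted.
    return bool(review.findings) and all(
        f.status is Status.ACCEPTED for f in review.findings
    )
```

One design choice worth noting: a review with no approval and no findings fails here, on the assumption that silence from the critic is not a verdict.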
Runbook steps
- Define the exact scope, source of truth, and approval boundary.
- Inspect current state and rank the highest-risk gap.
- Make one small, reversible improvement.
- Run the stated verification and record evidence.
- Stop on success, budget, no progress, or approval required.
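
The steps above can be sketched as a bounded loop. This is an illustrative outline only: `run_loop`, `LoopResult`, and the four callables are hypothetical names, the budget is modeled as a turn cap, and "no progress" is assumed to mean that the same gap is still ranked highest after an attempted fix. Scope, source of truth, and approval boundary are assumed to be fixed before the loop starts and encoded in the callables passed in.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    reason: str                       # why the loop stopped
    turns: int                        # improvements actually attempted
    evidence: list[str] = field(default_factory=list)


def run_loop(
    top_gap: Callable[[], Optional[str]],       # highest-risk gap, or None
    improve: Callable[[str], None],             # one small, reversible change
    verify: Callable[[], tuple[bool, str]],     # (passed, evidence note)
    needs_approval: Callable[[str], bool],      # approval boundary check
    max_turns: int = 5,                         # budget, set before running
) -> LoopResult:
    evidence: list[str] = []
    last_gap: Optional[str] = None
    turns = 0
    while turns < max_turns:
        gap = top_gap()
        # Nothing left to try, or the last fix did not move the top gap.
        if gap is None or gap == last_gap:
            return LoopResult("no_progress", turns, evidence)
        # Stop before acting on anything past the approval boundary.
        if needs_approval(gap):
            return LoopResult("approval_required", turns, evidence)
        improve(gap)
        turns += 1
        passed, note = verify()
        evidence.append(f"turn {turns}: {gap} -> {note}")
        if passed:
            return LoopResult("success", turns, evidence)
        last_gap = gap
    return LoopResult("budget_exhausted", turns, evidence)
```

In an adversarial review, `verify` would call the independent critic rather than the builder, so the loop cannot end on the builder's own judgment.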
Prompt
Run the Adversarial PR Review loop. Use it when one model built the change and should not grade its own homework. Work in bounded iterations: inspect the current state, choose the highest-risk gap, make one reversible improvement, verify it, and record evidence. Stop when an independent critic approves the unchanged version or only accepted findings remain, or when you are blocked, the budget is exhausted, or approval is required.