Coding agents
AI coding loops
AI coding loops are repo-safe agent cycles for repeated coding work: inspect, patch, test, review, and stop with evidence.

Answer-first definition
An AI coding loop gives a coding agent durable task state, an isolated workspace when needed, allowed actions, a verifier such as tests or CI, and an approval boundary before merge or production change.
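
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `LoopState`, `run_loop`, `propose_action`, and `verify` are hypothetical names, the actions are plain strings, and the verifier is any callable that returns a pass flag plus a log line. The loop never merges; the best outcome it can reach is `awaiting_approval`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class LoopState:
    """Durable task state: what was tried and what the verifier reported."""
    task: str
    iteration: int = 0
    evidence: List[str] = field(default_factory=list)
    status: str = "running"  # running | awaiting_approval | blocked


def run_loop(
    state: LoopState,
    propose_action: Callable[[LoopState], str],   # hypothetical agent step
    verify: Callable[[str], Tuple[bool, str]],    # hypothetical verifier (tests, CI)
    allowed_actions: Set[str],
    max_iterations: int = 3,
) -> LoopState:
    """Run bounded iterations and stop with evidence; never merge."""
    while state.iteration < max_iterations:
        state.iteration += 1
        action = propose_action(state)

        # Allowed-actions boundary: refuse anything outside the list and stop.
        if action not in allowed_actions:
            state.evidence.append(
                f"iteration {state.iteration}: refused action {action!r}"
            )
            state.status = "blocked"
            return state

        passed, log = verify(action)
        state.evidence.append(f"iteration {state.iteration}: {action} -> {log}")

        # Approval boundary: a passing verifier hands off to a human.
        if passed:
            state.status = "awaiting_approval"
            return state

    # Iteration budget spent without a passing verifier.
    state.status = "blocked"
    return state
```

In this sketch, `state.evidence` is the record the loop stops with, and a human or gated process decides what happens after `awaiting_approval`.
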
Common coding loop types
| Loop type | Verifier | Safe output |
|---|---|---|
| CI Failure Sweeper | CI job passes, or the failing log is summarized | Patch or report |
| PR Babysitter | PR checks and reviewer comments are resolved | Updated PR, not auto-merge |
| Flaky Test Stabilizer | Repeated test runs prove stability | Patch plus evidence log |
| Maker-Checker Review | Separate agent or human reviews diff against rubric | Review notes or gated approval |
| Dependency Update Loop | Tests pass, lockfile diff is reviewed, and vulnerability status is checked | Approval-gated PR |
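
As one illustration of a verifier from the table, the Flaky Test Stabilizer's "repeated test runs prove stability" could be checked with a consecutive-pass streak. The sketch below is an assumption about how such a check might look: `prove_stability` is a hypothetical helper, and `run_test` stands in for whatever actually runs the test (for example, a wrapper around the project's test command).

```python
from typing import Callable, List, Tuple


def prove_stability(
    run_test: Callable[[], bool],  # hypothetical: returns True when the test passes
    required_streak: int,
    max_runs: int,
) -> Tuple[bool, List[str]]:
    """Re-run a test until it passes `required_streak` times in a row.

    Returns (stable, evidence_log). Any failure resets the streak to zero.
    """
    log: List[str] = []
    streak = 0
    for run in range(1, max_runs + 1):
        passed = run_test()
        streak = streak + 1 if passed else 0
        log.append(f"run {run}: {'pass' if passed else 'fail'} (streak {streak})")
        if streak >= required_streak:
            return True, log
    # Run budget exhausted before the streak was reached.
    return False, log
```

The returned log doubles as the "evidence log" in the safe-output column: it shows every run, including the failures that reset the streak.
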
Relevant loops
| Loop | Category | Difficulty | Cadence | Verification |
|---|---|---|---|---|
| API Contract Drift | Engineering | Intermediate | After API changes or SDK releases | Server behavior, client types, examples, and docs agree on request/response contracts. |
| Acceptance Scenario Lockstep | Engineering | Intermediate | Before and during ambiguous feature work | The same scenarios written before implementation pass after the change, and any scope expansion is explicitly approved. |
| Agent Instructions After-Action | Engineering | Beginner | After a successful or painful agent coding session | Repo instructions contain only reusable, source-grounded lessons and the next similar task can start without rediscovering the same trap. |
| Architecture Rubric Refactor | Engineering | Advanced | When architecture work has a defined scope | Scoped module meets the written rubric, tests pass, and unresolved objections are explicit. |
| Behavior Ladder TDD | Engineering | Intermediate | When implementing logic-heavy features | Each behavior test fails before implementation, passes after the smallest change, and remains green through final refactor. |
| CI Optimization | Engineering | Advanced | Monthly or when CI is painful | CI p50/p95 improves against the same workflow without weakening tests or hiding failures. |
| Claude Code Repo Readiness | Engineering | Beginner | Before major agent work | Repo has agent instructions, documented commands, architecture notes, risk areas, and a docs/loops scaffold. |
| Cold Load Trim | Engineering | Advanced | When first visit feels heavy | Initial screen downloads fewer bytes while screenshots and behavior remain unchanged. |
| Completion Promise Loop | Engineering | Intermediate | For scoped implementation tasks where half-finished output is the main risk | Every acceptance criterion is checked with tests, browser evidence, logs, screenshots, or a clear blocker report before the agent stops. |
| Fresh Clone Onboarding | Engineering | Intermediate | Before onboarding | A clean machine reaches the documented ready state using only the README. |
| Parallel Agent Worktree Sweep | Engineering | Advanced | When several independent repo improvements can run at once | Each agent branch has isolated scope, passing checks, a summary, and no conflicting files before integration review. |
| Project Docs Freshness | Engineering | Beginner | Nightly or after meaningful code changes | Changed behavior, APIs, CLI commands, config, and workflows are reflected in docs. Docs checks pass. |
| Reference Oracle Implementation | Engineering | Advanced | Before implementing tricky behavior with an external source of truth | Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences. |
| Spec to Task Shards | Engineering | Intermediate | Before multi-file feature work or migrations | A written spec, non-goals, acceptance checks, and ordered task shards exist before implementation begins. |
| Test Flake Stabilizer | Engineering | Intermediate | When tests are inconsistent | The repaired test and full suite pass for the required consecutive-run streak. |
| Test and Logging Coverage | Engineering | Intermediate | Weekly or before release | Critical flows have useful tests and structured logs for representative success and failure paths. |
| Trace-First Debugging | Engineering | Intermediate | When an agent is tempted to patch a bug from a hunch | The bug is reproduced, the root cause is evidenced by traces or tests, and the fix includes a regression check. |
| Adversarial PR Review | Evaluation | Advanced | For meaningful PRs | An independent critic approves the unchanged version or only accepted findings remain. |
| Agent Merge Queue Review | Evaluation | Advanced | After multiple agent-generated PRs or branches accumulate | Only branches with passing checks, clear intent, non-conflicting scope, and human-readable evidence are merged or promoted. |
| Browser Quality Streak | Evaluation | Intermediate | Before release | N realistic scenarios pass consecutively, and earlier failures have regression coverage. |
| Dependency CVE Burndown | Security | Advanced | After security scan | No exploitable high or critical CVE remains without an explicit risk decision. |
| Sandboxed YOLO Probe | Security | Advanced | Before allowing autonomous shell-heavy agent runs | The agent can run needed commands inside the sandbox, cannot reach forbidden files/secrets, and produces a replayable diff or report before host-side changes. |
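
Several of the verification columns above reduce to small, mechanical checks. As an example, the "no conflicting files" condition in Parallel Agent Worktree Sweep could be approximated by looking for paths changed on more than one agent branch. The sketch assumes the changed-file sets have already been collected (for instance from `git diff --name-only` per branch); `find_conflicting_files` and the branch names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_conflicting_files(changed: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each path touched by more than one branch to the branches touching it.

    `changed` maps a branch name to the set of file paths it modifies.
    An empty result means the branches have non-overlapping file scope.
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    for branch, files in changed.items():
        for path in files:
            owners[path].append(branch)
    return {
        path: sorted(branches)
        for path, branches in owners.items()
        if len(branches) > 1
    }


# Hypothetical branches and paths, for illustration only.
changed = {
    "agent/docs": {"README.md"},
    "agent/ci": {".github/workflows/ci.yml", "README.md"},
    "agent/tests": {"tests/test_api.py"},
}
conflicts = find_conflicting_files(changed)
```

A file-level overlap is a conservative signal: two branches can edit the same file without a textual conflict, so a non-empty result here is a prompt for integration review rather than proof of a merge conflict.
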