Engineering

Test Flake Stabilizer

Find the real cause of flakes instead of wallpapering them with sleeps.

Use when

CI fails differently across comparable runs.

Cadence

When tests are inconsistent

Verification

The repaired test and full suite pass for the required consecutive-run streak.

Advanced spec

Structured loop spec

FieldValue
NameTest Flake Stabilizer
CategoryEngineering
TriggerWhen tests are inconsistent
ObjectiveFind the real cause of flakes instead of wallpapering them with sleeps.
Allowed inputsRelevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop
Allowed actionsDefine the exact scope, source of truth, and approval boundary.; Inspect current state and rank the highest-risk gap.; Make one small, reversible improvement.; Run the stated verification and record evidence.; Stop on success, budget, no progress, or approval required.
VerificationThe repaired test and full suite pass for the required consecutive-run streak.
Stop conditionStop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required.
BudgetSet a time, turn, token, retry, file, or dollar cap before running the loop.
Approval boundaryHuman approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments.
Safe outputPull request, patch, report, or evidence log
Works withClaude Code, OpenAI Codex, Cursor, Gemini CLI, any tool-using coding agent
Runbook

Steps

  1. Define the exact scope, source of truth, and approval boundary.
  2. Inspect current state and rank the highest-risk gap.
  3. Make one small, reversible improvement.
  4. Run the stated verification and record evidence.
  5. Stop on success, budget, no progress, or approval required.
Copy prompt

Prompt

Run the Test Flake Stabilizer loop. Use it when CI fails differently across comparable runs. Work in bounded iterations: inspect current state, choose the highest-risk gap, make one reversible improvement, verify it, and record evidence. Stop when The repaired test and full suite pass for the required consecutive-run streak. or when blocked, budget exhausted, or approval is required.
Metadata

Tags

testingflakesCI
Next loops

Related