Engineering
Test Flake Stabilizer
Find the real cause of flakes instead of wallpapering them with sleeps.
Use when
CI fails differently across comparable runs.
Cadence
When tests are inconsistent
Verification
The repaired test and full suite pass for the required consecutive-run streak.
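A consecutive-run streak can be checked mechanically. The following is a minimal sketch, not a prescribed implementation: the names `passes_streak` and `command_runner` are hypothetical, and the streak length is whatever the team has agreed on, since the spec does not fix a number.

```python
import subprocess
import sys
from typing import Callable, Sequence, Tuple


def passes_streak(run_once: Callable[[], bool], required_streak: int) -> Tuple[bool, int]:
    """Call run_once until it fails or has passed required_streak times in a row.

    Returns (streak_met, passes_completed_before_stopping).
    """
    if required_streak < 1:
        raise ValueError("required_streak must be at least 1")
    for completed in range(required_streak):
        if not run_once():
            # A single failure breaks the streak; report how far we got.
            return False, completed
    return True, required_streak


def command_runner(cmd: Sequence[str]) -> Callable[[], bool]:
    """Wrap a shell command so that exit code 0 counts as a pass (assumed convention)."""
    def run_once() -> bool:
        return subprocess.run(cmd).returncode == 0
    return run_once
```

As a usage example, `passes_streak(command_runner([sys.executable, "-m", "pytest", "tests/test_example.py"]), 20)` would require 20 consecutive green runs of one test file; both the path and the number 20 are placeholders. Running the single repaired test and the full suite as two separate streaks keeps the evidence for each easy to read.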
Structured loop spec
| Field | Value |
|---|---|
| Name | Test Flake Stabilizer |
| Category | Engineering |
| Trigger | When tests are inconsistent |
| Objective | Find the real cause of flakes instead of wallpapering them with sleeps. |
| Allowed inputs | Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop |
| Allowed actions | Define the exact scope, source of truth, and approval boundary; inspect current state and rank the highest-risk gap; make one small, reversible improvement; run the stated verification and record evidence; stop on success, budget, no progress, or approval required. |
| Verification | The repaired test and full suite pass for the required consecutive-run streak. |
| Stop condition | Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required. |
| Budget | Set a time, turn, token, retry, file, or dollar cap before running the loop. |
| Approval boundary | Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments. |
| Safe output | Pull request, patch, report, or evidence log |
| Works with | Claude Code, OpenAI Codex, Cursor, Gemini CLI, any tool-using coding agent |
Runbook steps
- Define the exact scope, source of truth, and approval boundary.
- Inspect current state and rank the highest-risk gap.
- Make one small, reversible improvement.
- Run the stated verification and record evidence.
- Stop on success, budget, no progress, or approval required.
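The steps above amount to a bounded loop with four exits. The sketch below shows one way to encode those exits, under stated assumptions: `StepOutcome`, `LoopResult`, `run_loop`, and the `max_stalled` threshold are hypothetical names introduced for illustration, and the caller supplies a `step` function that performs one inspect, improve, and verify pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepOutcome:
    verified: bool        # the stated verification passed
    made_progress: bool   # this iteration narrowed the cause or changed a result
    needs_approval: bool  # the next action crosses the approval boundary
    note: str             # evidence to record for this iteration


@dataclass
class LoopResult:
    stop_reason: str
    iterations: int
    evidence: List[str] = field(default_factory=list)


def run_loop(step: Callable[[int], StepOutcome],
             max_iterations: int,
             max_stalled: int = 2) -> LoopResult:
    """Run step up to max_iterations times, stopping on the first exit condition."""
    evidence: List[str] = []
    stalled = 0
    for i in range(1, max_iterations + 1):
        outcome = step(i)
        evidence.append(f"iteration {i}: {outcome.note}")
        if outcome.needs_approval:
            return LoopResult("approval_required", i, evidence)
        if outcome.verified:
            return LoopResult("success", i, evidence)
        # Count consecutive iterations that made no progress.
        stalled = 0 if outcome.made_progress else stalled + 1
        if stalled >= max_stalled:
            return LoopResult("no_progress", i, evidence)
    return LoopResult("budget_exhausted", max_iterations, evidence)
```

Here the budget is an iteration cap for simplicity; a time, token, or dollar cap would slot into the same place. Checking for approval before success is a deliberate choice in this sketch, so that a passing result never masks an action that still needs sign-off.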
Prompt
Run the Test Flake Stabilizer loop. Use it when CI fails differently across comparable runs. Work in bounded iterations: inspect current state, choose the highest-risk gap, make one reversible improvement, verify it, and record evidence. Stop when the repaired test and full suite pass for the required consecutive-run streak, or when you are blocked, the budget is exhausted, no progress is being made, or approval is required.