Evaluation
Browser Quality Streak
Run real scenarios until the streak proves the product is stable.
Use when
You need confidence that the product works in realistic flows.
Cadence
Before release
Verification
N realistic scenarios pass consecutively, and earlier failures have regression coverage.
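The verification rule can be sketched as a small tracker. This is a minimal illustration, not a prescribed implementation: the `StreakTracker` class, its method names, and the scenario names are hypothetical, and it assumes that any failure resets the streak to zero.

```python
from dataclasses import dataclass, field


@dataclass
class StreakTracker:
    """Tracks consecutive scenario passes and failures that lack regression tests."""
    required: int                      # N, the streak length you chose (assumption)
    streak: int = 0
    uncovered: set[str] = field(default_factory=set)

    def record(self, scenario: str, passed: bool) -> None:
        if passed:
            self.streak += 1
        else:
            # Assumed policy: any failure resets the streak and must be
            # covered by a regression test before verification can pass.
            self.streak = 0
            self.uncovered.add(scenario)

    def mark_covered(self, scenario: str) -> None:
        # Call once a regression test for the failed scenario exists.
        self.uncovered.discard(scenario)

    def verified(self) -> bool:
        return self.streak >= self.required and not self.uncovered


tracker = StreakTracker(required=3)
for name, ok in [("checkout", True), ("login", False), ("login", True),
                 ("checkout", True), ("search", True)]:
    tracker.record(name, ok)

print(tracker.verified())   # False: "login" failed earlier and is not yet covered
tracker.mark_covered("login")
print(tracker.verified())   # True: 3 consecutive passes, no uncovered failures
```

Keeping the uncovered failures separate from the streak counter means a long run of passes cannot hide an earlier failure that never received a regression test.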
Structured loop spec
| Field | Value |
|---|---|
| Name | Browser Quality Streak |
| Category | Evaluation |
| Trigger | Before release |
| Objective | Run real scenarios until the streak proves the product is stable. |
| Allowed inputs | Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop |
| Allowed actions | Define the exact scope, source of truth, and approval boundary; inspect current state and rank the highest-risk gap; make one small, reversible improvement; run the stated verification and record evidence; stop on success, budget, no progress, or approval required. |
| Verification | N realistic scenarios pass consecutively, and earlier failures have regression coverage. |
| Stop condition | Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required. |
| Budget | Set a time, turn, token, retry, file, or dollar cap before running the loop. |
| Approval boundary | Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments. |
| Safe output | Draft, report, checklist, table, or approval-gated recommendation |
| Works with | Claude, ChatGPT, Gemini, any tool-using AI assistant |
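The stop condition and budget rows in the table above can be checked with one function that is evaluated before each iteration. The sketch below is only an assumption about how that might look: `LoopState`, its fields, the turn-based budget, and the stall threshold are all hypothetical, and checking approval first is a design choice rather than something the spec dictates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    verifier_passed: bool = False
    turns_used: int = 0
    turn_budget: int = 10            # hypothetical cap; the spec also allows time, token, or dollar caps
    turns_without_progress: int = 0
    stall_limit: int = 2             # hypothetical threshold for "no progress"
    blocked: bool = False
    approval_required: bool = False


def stop_reason(state: LoopState) -> Optional[str]:
    """Return the first stop condition that applies, or None to keep iterating."""
    if state.approval_required:
        return "approval required"
    if state.blocked:
        return "blocker"
    if state.verifier_passed:
        return "verifier passed"
    if state.turns_used >= state.turn_budget:
        return "budget exhausted"
    if state.turns_without_progress >= state.stall_limit:
        return "no progress"
    return None


print(stop_reason(LoopState(turns_used=3)))     # None
print(stop_reason(LoopState(turns_used=10)))    # budget exhausted
print(stop_reason(LoopState(verifier_passed=True, approval_required=True)))  # approval required
```

Checking the approval boundary before success means a passing verifier never skips a pending human sign-off.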
Runbook steps
- Define the exact scope, source of truth, and approval boundary.
- Inspect current state and rank the highest-risk gap.
- Make one small, reversible improvement.
- Run the stated verification and record evidence.
- Stop on success, budget, no progress, or approval required.
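The steps above can be sketched as a bounded loop that records evidence on every turn. This is a toy illustration under stated assumptions: `run_loop`, its parameters, and the stand-in `improve` and `verify` callables are hypothetical, the gaps are assumed to be ranked by risk already, and only a turn cap and success are modeled as stop conditions.

```python
from typing import Callable


def run_loop(
    gaps: list[str],
    improve: Callable[[str], None],
    verify: Callable[[], bool],
    max_turns: int,
) -> list[dict]:
    """Work through ranked gaps one at a time, recording evidence for each turn."""
    evidence = []
    for turn, gap in enumerate(gaps[:max_turns], start=1):
        improve(gap)                  # one small, reversible change
        passed = verify()             # run the stated verification
        evidence.append({"turn": turn, "gap": gap, "verified": passed})
        if passed:
            break                     # stop on success
    return evidence


# Toy stand-ins: verification passes once two fixes have been applied.
applied: list[str] = []
log = run_loop(
    gaps=["checkout flow", "login flow", "search flow"],   # assumed ranked by risk
    improve=applied.append,
    verify=lambda: len(applied) >= 2,
    max_turns=5,
)
print(log)
```

Returning the evidence list rather than a bare pass/fail keeps a record of what was changed and verified on each turn, which is what the fourth step asks for.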
Prompt
Run the Browser Quality Streak loop. Use it when you need confidence that the product works in realistic flows. Work in bounded iterations: inspect current state, choose the highest-risk gap, make one reversible improvement, verify it, and record evidence. Stop when N realistic scenarios pass consecutively and earlier failures have regression coverage, or when you are blocked, the budget is exhausted, or approval is required.