Engineering
Reference Oracle Implementation
Give the agent an oracle, not a pep talk.
Use when
You need a coding agent to implement complex behavior that can be checked against a browser, official library, legacy system, API, compiler, or production output.
Cadence
Before implementing tricky behavior with an external source of truth
Verification
Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences.
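One way to keep "tolerances documented for legitimate differences" honest is to store each tolerance next to a written reason and route every comparison through it. The sketch below assumes outputs are flat dictionaries. The field name `width_px`, the 0.5 tolerance, and the helper `matches` are hypothetical and only illustrate the idea.

```python
import math

# Hypothetical tolerance rules: field name -> absolute tolerance plus the reason
# the difference is considered legitimate. Fields not listed must match exactly.
TOLERANCES = {
    "width_px": {"abs_tol": 0.5, "reason": "sub-pixel rounding differs between engines"},
}


def matches(expected: dict, actual: dict, tolerances: dict = TOLERANCES) -> bool:
    """Return True when every field matches the oracle, exactly or within tolerance."""
    if expected.keys() != actual.keys():
        return False
    for key, want in expected.items():
        got = actual[key]
        rule = tolerances.get(key)
        both_numeric = isinstance(want, (int, float)) and isinstance(got, (int, float))
        if rule is not None and both_numeric:
            # Absolute tolerance only; rel_tol=0.0 keeps the rule easy to reason about.
            if not math.isclose(want, got, rel_tol=0.0, abs_tol=rule["abs_tol"]):
                return False
        elif want != got:
            return False
    return True
```

Keeping the reason string beside the number means a reviewer can challenge a tolerance without digging through the harness code.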
Structured loop spec
| Field | Value |
|---|---|
| Name | Reference Oracle Implementation |
| Category | Engineering |
| Trigger | Before implementing tricky behavior with an external source of truth |
| Objective | Give the agent an oracle, not a pep talk. |
| Allowed inputs | Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop |
| Allowed actions | (1) Identify the most trustworthy reference output: browser engine, legacy implementation, official API, golden files, or production traces. (2) Build a small command or fixture harness that compares the new implementation against that oracle. (3) Start with the simplest passing cases, then add one behavior class at a time. (4) After each change, run the parity harness and record failures with inputs, expected output, actual output, and tolerance rules. (5) Stop when the scoped fixture set passes or the remaining differences require product or standards judgment. |
| Verification | Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences. |
| Stop condition | Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required. |
| Budget | Set a time, turn, token, retry, file, or dollar cap before running the loop. |
| Approval boundary | Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments. |
| Safe output | Pull request, patch, report, or evidence log |
| Works with | Claude Code, OpenAI Codex, Cursor, Gemini CLI, any tool-using coding agent |
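The stop condition and budget rows can be read as a small control loop. The following is a minimal sketch, assuming the budget is a turn cap and that "no progress" means the failing-fixture count has not improved for a set number of consecutive turns. The names `run_loop`, `step`, `verify`, `max_turns`, and `max_stalled` are hypothetical, and blockers and approval gates are left out because they need human input.

```python
from typing import Callable, Tuple


def run_loop(
    step: Callable[[], None],
    verify: Callable[[], int],
    max_turns: int = 10,
    max_stalled: int = 2,
) -> Tuple[str, int]:
    """Drive the loop until the verifier passes, the budget runs out, or progress stalls.

    step() makes one change; verify() returns the number of failing fixtures.
    Returns (reason, turns_used).
    """
    best = verify()
    stalled = 0
    for turn in range(1, max_turns + 1):
        if best == 0:
            return "passed", turn - 1
        step()
        remaining = verify()
        if remaining < best:
            best, stalled = remaining, 0
        else:
            stalled += 1
            if stalled >= max_stalled:
                return "no_progress", turn
    return ("passed" if best == 0 else "budget_exhausted"), max_turns
```

Returning the reason alongside the turn count gives the evidence log a plain record of why the loop ended.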
Runbook steps
- Identify the most trustworthy reference output: browser engine, legacy implementation, official API, golden files, or production traces.
- Build a small command or fixture harness that compares the new implementation against that oracle.
- Start with the simplest passing cases, then add one behavior class at a time.
- After each change, run the parity harness and record failures with inputs, expected output, actual output, and tolerance rules.
- Stop when the scoped fixture set passes or the remaining differences require product or standards judgment.
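The steps above can be sketched as a small parity harness. This is an illustration under stated assumptions, not a prescribed implementation: the oracle and the candidate are both plain callables, and `Failure` and `run_parity` are hypothetical names. The demo uses Python's `str.casefold` as a stand-in oracle and `str.lower` as a stand-in candidate because they disagree on some inputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class Failure:
    """One recorded mismatch: the input, what the oracle said, and what we produced."""
    fixture: Any
    expected: Any
    actual: Optional[Any]
    note: str


def run_parity(
    fixtures: Iterable[Any],
    oracle: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    equal: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> List[Failure]:
    """Run every fixture through both implementations and collect the mismatches."""
    failures: List[Failure] = []
    for fixture in fixtures:
        expected = oracle(fixture)
        try:
            actual = candidate(fixture)
        except Exception as exc:  # a crash is recorded as a failure, not a harness abort
            failures.append(Failure(fixture, expected, None, f"candidate raised {exc!r}"))
            continue
        if not equal(expected, actual):
            failures.append(Failure(fixture, expected, actual, "output mismatch"))
    return failures


if __name__ == "__main__":
    # Stand-in oracle and candidate, chosen only because they differ on "ß".
    for failure in run_parity(["ABC", "Straße"], str.casefold, str.lower):
        print(failure)
```

Passing a tolerance-aware function as `equal` lets the same harness cover both exact and approximate comparisons, and starting with a short fixture list matches the "simplest passing cases first" step.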
Prompt
Run the Reference Oracle Implementation loop. First identify the reference oracle for the behavior: browser engine, legacy system, official library, API response, compiler output, or golden files. Build a repeatable parity harness before implementation. Add one behavior class at a time, compare expected vs actual output, document any tolerance rules, and stop when the scoped fixture set passes or remaining differences require human judgment.