Engineering

Reference Oracle Implementation

Give the agent an oracle, not a pep talk.

Use when

You need a coding agent to implement complex behavior that can be checked against a browser, official library, legacy system, API, compiler, or production output.

Cadence

Before implementing tricky behavior with an external source of truth

Verification

Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences.
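One way to make "tolerances documented for legitimate differences" concrete is to keep the tolerance rules as data next to the comparison itself. The sketch below is only an illustration under assumptions: the `ToleranceRule` type, the `mismatches` function, and the idea that outputs are flat dictionaries are all hypothetical, not part of any particular tool.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToleranceRule:
    """A documented, legitimate difference (hypothetical structure)."""
    field: str      # output field the rule applies to
    abs_tol: float  # maximum allowed absolute difference
    reason: str     # why the difference is acceptable


def mismatches(expected: dict, actual: dict, rules: list[ToleranceRule]) -> list[str]:
    """Compare oracle output to candidate output; an empty list means parity."""
    rule_by_field = {rule.field: rule for rule in rules}
    problems = []
    for key in sorted(expected.keys() | actual.keys()):
        if key not in expected or key not in actual:
            problems.append(f"{key}: present on only one side")
            continue
        exp, act = expected[key], actual[key]
        rule = rule_by_field.get(key)
        numeric = isinstance(exp, (int, float)) and isinstance(act, (int, float))
        if rule is not None and numeric:
            # Numeric fields with a documented rule may differ by up to abs_tol.
            if not math.isclose(exp, act, rel_tol=0.0, abs_tol=rule.abs_tol):
                problems.append(
                    f"{key}: {act!r} differs from {exp!r} by more than {rule.abs_tol}"
                )
        elif exp != act:
            # Everything else must match exactly.
            problems.append(f"{key}: expected {exp!r}, got {act!r}")
    return problems
```

Keeping a `reason` on every rule means each accepted difference carries its own justification, so a reviewer can challenge a tolerance instead of discovering it buried in comparison code.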

Advanced spec

Structured loop spec

Field | Value
Name | Reference Oracle Implementation
Category | Engineering
Trigger | Before implementing tricky behavior with an external source of truth
Objective | Give the agent an oracle, not a pep talk.
Allowed inputs | Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop
Allowed actions | (1) Identify the most trustworthy reference output: browser engine, legacy implementation, official API, golden files, or production traces. (2) Build a small command or fixture harness that compares the new implementation against that oracle. (3) Start with the simplest passing cases, then add one behavior class at a time. (4) After each change, run the parity harness and record failures with inputs, expected output, actual output, and tolerance rules. (5) Stop when the scoped fixture set passes or the remaining differences require product or standards judgment.
Verification | Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences.
Stop condition | Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required.
Budget | Set a time, turn, token, retry, file, or dollar cap before running the loop.
Approval boundary | Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments.
Safe output | Pull request, patch, report, or evidence log
Works with | Claude Code, OpenAI Codex, Cursor, Gemini CLI, any tool-using coding agent
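
The stop condition and budget rows can be read as a small control loop. The following is a minimal sketch under assumptions, not a prescribed implementation: `run_loop`, its parameter names, and the default caps are hypothetical, and it models only three of the listed conditions (verifier passes, budget exhausted, no progress). Blockers and approval requirements need human judgment and are left out.

```python
import time
from typing import Callable, Optional


def run_loop(
    attempt: Callable[[], int],
    max_turns: int = 10,
    max_seconds: float = 600.0,
    max_stalled_turns: int = 3,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run attempts until a stop condition fires and return the reason.

    `attempt` is assumed to make one change, run the verifier, and
    return the number of fixtures that still fail.
    """
    start = clock()
    best: Optional[int] = None  # fewest failures seen so far
    stalled = 0                 # consecutive turns without improvement
    for _ in range(max_turns):
        if clock() - start > max_seconds:
            return "budget exhausted (time)"
        failing = attempt()
        if failing == 0:
            return "verifier passed"
        if best is None or failing < best:
            best, stalled = failing, 0
        else:
            stalled += 1
            if stalled >= max_stalled_turns:
                return "no progress"
    return "budget exhausted (turns)"
```

Returning the reason as a string keeps the evidence log honest: the report can state why the loop ended rather than only that it ended.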
Runbook

Steps

  1. Identify the most trustworthy reference output: browser engine, legacy implementation, official API, golden files, or production traces.
  2. Build a small command or fixture harness that compares the new implementation against that oracle.
  3. Start with the simplest passing cases, then add one behavior class at a time.
  4. After each change, run the parity harness and record failures with inputs, expected output, actual output, and tolerance rules.
  5. Stop when the scoped fixture set passes or the remaining differences require product or standards judgment.
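
The steps above can be sketched as a small parity harness. This is a hedged illustration: `run_parity`, the `Failure` record, and the whitespace-only tolerance rule are hypothetical names and choices, and the oracle and candidate are assumed to be plain functions from an input string to an output string. A real oracle might instead be a subprocess, a headless browser, or a directory of golden files.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Failure:
    """One recorded parity failure (step 4): input, expected, actual, rule."""
    fixture: str
    input_data: str
    expected: str
    actual: str
    tolerance_rule: str


def run_parity(
    fixtures: dict[str, str],
    oracle: Callable[[str], str],
    candidate: Callable[[str], str],
    normalize: Callable[[str], str] = str.strip,
    tolerance_rule: str = "ignore leading/trailing whitespace",
) -> list[Failure]:
    """Run every fixture through both implementations and collect mismatches."""
    failures = []
    for name, data in fixtures.items():
        expected = oracle(data)
        try:
            actual = candidate(data)
        except Exception as exc:
            # A crash is recorded as a failure instead of aborting the run.
            actual = f"<raised {type(exc).__name__}: {exc}>"
        # The tolerance rule is applied to both sides before comparing.
        if normalize(expected) != normalize(actual):
            failures.append(Failure(name, data, expected, actual, tolerance_rule))
    return failures
```

For example, `run_parity({"simple": "hello"}, str.upper, my_upper)` would return an empty list once a hypothetical `my_upper` agrees with `str.upper` on that fixture. Adding one behavior class at a time (step 3) then amounts to adding entries to `fixtures` and rerunning.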

Prompt

Run the Reference Oracle Implementation loop. First identify the reference oracle for the behavior: browser engine, legacy system, official library, API response, compiler output, or golden files. Build a repeatable parity harness before implementation. Add one behavior class at a time, compare expected vs actual output, document any tolerance rules, and stop when the scoped fixture set passes or remaining differences require human judgment.
Metadata

Tags

reference implementation, testing, parity, agent coding