Engineering

Reference Oracle Implementation

Give the agent an oracle, not a pep talk.

Use when

You need a coding agent to implement complex behavior that can be checked against a browser, official library, legacy system, API, compiler, or production output.

Cadence

Before implementing tricky behavior with an external source of truth

Verification

Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences.
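
A documented tolerance could be expressed in code along these lines. This is a minimal sketch, assuming the outputs are nested dicts and lists with float measurements; the names `Tolerance` and `outputs_match` are hypothetical and do not come from any tool named here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """A named, documented allowance for a legitimate difference (hypothetical)."""
    name: str
    reason: str
    abs_tol: float = 0.0


def outputs_match(expected, actual, tol: Tolerance) -> bool:
    """Compare oracle output with candidate output.

    Only float-vs-float values get the absolute tolerance; everything
    else (strings, ints, bools, None) must be exactly equal.
    """
    if isinstance(expected, float) and isinstance(actual, float):
        return math.isclose(expected, actual, rel_tol=0.0, abs_tol=tol.abs_tol)
    if isinstance(expected, dict) and isinstance(actual, dict):
        return expected.keys() == actual.keys() and all(
            outputs_match(expected[key], actual[key], tol) for key in expected
        )
    if isinstance(expected, list) and isinstance(actual, list):
        return len(expected) == len(actual) and all(
            outputs_match(e, a, tol) for e, a in zip(expected, actual)
        )
    return expected == actual


# Example tolerance; the reason string is the documentation the loop asks for.
SUBPIXEL = Tolerance(
    name="subpixel",
    reason="assumed example: measurements may differ by up to half a unit",
    abs_tol=0.5,
)
```

Keeping the reason next to the numeric allowance means every accepted difference carries its own justification in the evidence log.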

Advanced spec

Structured loop spec

Field	Value
Name	Reference Oracle Implementation
Category	Engineering
Trigger	Before implementing tricky behavior with an external source of truth
Objective	Give the agent an oracle, not a pep talk.
Allowed inputs	Relevant files, source notes, logs, tests, screenshots, metrics, or task state for this loop
Allowed actions	Identify the most trustworthy reference output: browser engine, legacy implementation, official API, golden files, or production traces; build a small command or fixture harness that compares the new implementation against that oracle; start with the simplest passing cases, then add one behavior class at a time; after each change, run the parity harness and record failures with inputs, expected output, actual output, and tolerance rules; stop when the scoped fixture set passes or the remaining differences require product or standards judgment.
Verification	Generated outputs match the reference oracle across the agreed fixture set, with tolerances documented for legitimate differences.
Stop condition	Stop when the verifier passes, the budget is exhausted, no progress is made, a blocker appears, or approval is required.
Budget	Set a time, turn, token, retry, file, or dollar cap before running the loop.
Approval boundary	Human approval required before publishing, sending, deleting, spending, changing accounts, touching production, or making reputational/legal/financial commitments.
Safe output	Pull request, patch, report, or evidence log
Works with	Claude Code, OpenAI Codex, Cursor, Gemini CLI, any tool-using coding agent
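
The stop condition and budget rows above can be sketched as a small loop controller. This is an illustration under assumptions, not part of the spec: `Stop`, `run_loop`, `step`, and `patience` are hypothetical names, the budget is modelled as a turn cap only, and the blocker and approval stops are left to the human operator.

```python
from enum import Enum
from typing import Callable, Tuple


class Stop(Enum):
    VERIFIED = "verifier passed"
    BUDGET = "budget exhausted"
    NO_PROGRESS = "no progress"


def run_loop(step: Callable[[], int], max_turns: int, patience: int = 2) -> Tuple[Stop, int]:
    """Run `step` until a stop condition fires.

    `step` makes one change, runs the verifier, and returns the number of
    failing fixtures. `patience` is how many consecutive turns without a
    new best result count as "no progress".
    """
    best = None
    stale = 0
    for turn in range(1, max_turns + 1):
        failing = step()
        if failing == 0:
            return Stop.VERIFIED, turn
        if best is None or failing < best:
            best, stale = failing, 0
        else:
            stale += 1
            if stale >= patience:
                return Stop.NO_PROGRESS, turn
    return Stop.BUDGET, max_turns
```

For example, a run whose failing counts go 5, 3, 0 returns `(Stop.VERIFIED, 3)`, while a run stuck at 5 failures stops early with `Stop.NO_PROGRESS` instead of spending the rest of the budget.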

Runbook

Steps

1. Identify the most trustworthy reference output: browser engine, legacy implementation, official API, golden files, or production traces.
2. Build a small command or fixture harness that compares the new implementation against that oracle.
3. Start with the simplest passing cases, then add one behavior class at a time.
4. After each change, run the parity harness and record failures with inputs, expected output, actual output, and tolerance rules.
5. Stop when the scoped fixture set passes or the remaining differences require product or standards judgment.
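
The steps above can be sketched as a small parity harness. This is a minimal sketch under assumptions: the oracle is Python's standard `urllib.parse.quote`, chosen only for illustration, and `candidate_quote`, `Failure`, `run_parity`, `first_failing_class`, and the fixture classes are hypothetical names. Comparison here is exact, so no tolerance rule is recorded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional
from urllib.parse import quote


@dataclass
class Failure:
    behavior_class: str
    fixture: str
    expected: str
    actual: str


def oracle_quote(text: str) -> str:
    # Reference oracle: the standard library, with no extra "safe" characters.
    return quote(text, safe="")


UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-~"
)


def candidate_quote(text: str) -> str:
    # New implementation under test: percent-encode every UTF-8 byte
    # of any character outside the unreserved set.
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


# Ordered from simplest to hardest; dicts preserve insertion order.
FIXTURE_CLASSES: Dict[str, List[str]] = {
    "plain_ascii": ["", "abc", "A-z_0.9~"],
    "reserved": ["a b", "a/b?c=d"],
    "non_ascii": ["\u00e9", "\u65e5\u672c"],  # accented Latin letter, two CJK characters
}


def run_parity(
    oracle: Callable[[str], str],
    candidate: Callable[[str], str],
    behavior_class: str,
    fixtures: List[str],
) -> List[Failure]:
    # Record input, expected output, and actual output for every mismatch.
    failures = []
    for fixture in fixtures:
        expected, actual = oracle(fixture), candidate(fixture)
        if expected != actual:
            failures.append(Failure(behavior_class, fixture, expected, actual))
    return failures


def first_failing_class(
    oracle: Callable[[str], str],
    candidate: Callable[[str], str],
    classes: Dict[str, List[str]],
) -> Optional[List[Failure]]:
    # Check one behavior class at a time; stop at the first class that fails.
    for name, fixtures in classes.items():
        failures = run_parity(oracle, candidate, name, fixtures)
        if failures:
            return failures
    return None
```

Calling `first_failing_class(oracle_quote, candidate_quote, FIXTURE_CLASSES)` returns `None` when every class is at parity. A candidate that encodes spaces as `+` would pass `plain_ascii` and then report a `reserved` failure for the input `"a b"`, which keeps the agent's attention on one behavior class per iteration.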

Prompt

Run the Reference Oracle Implementation loop. First identify the reference oracle for the behavior: browser engine, legacy system, official library, API response, compiler output, or golden files. Build a repeatable parity harness before implementation. Add one behavior class at a time, compare expected vs actual output, document any tolerance rules, and stop when the scoped fixture set passes or remaining differences require human judgment.

Metadata