The Staged-Compiler Pattern: Chaining AI Skills from Strategy to Production


TL;DR: Structure AI-heavy knowledge work as compilers: numbered stages with machine-readable JSON contracts between them, human approval gates at the points of irreversibility, and self-QA as an explicit stage. For strategy→production work, use two compilers joined by one frozen contract — a research compiler (expensive, durable, compounding) and a production compiler (cheap, disposable, regenerable) — because the two layers have different cadence, craft, and economics. Scale comes from parallel agent fan-out off one central briefing file: diversity by design, not by luck.

The pattern in one picture

inputs ──▶ [ RESEARCH COMPILER ]──▶ units.json ──▶ [ PRODUCTION COMPILER ] ──▶ brief packages
            stages 0..6              (frozen          stages 0..5
            gates: scope,             contract,        gates: concept
            confirmation,             stable IDs)      approval, final QA
            sign-off
                  ▲                                          │
                  └────────── field results ◀────────────────┘
                          (IDs map results back)

The worked instance is a paid-social pipeline — marketing/evidence-graded-audience-research as the research compiler, marketing/prescriptive-production-briefs as the production compiler — but nothing in the pattern is marketing-specific. It generalizes to any work that splits into “expensive analysis that compounds” and “cheap synthesis that regenerates.”

Why two compilers instead of one pipeline

The layers differ on three axes, and merging them forces the wrong economics on both:

Axis	Research compiler	Production compiler
Cadence	Weeks–months; revised by field evidence	Hours–days; regenerated per wave
Craft	Evidence discipline, judgment, calibration	Prescriptive serialization, consistency
Economics	Expensive, durable, compounding asset	Cheap, disposable compilation

The operating rule that falls out: research is durable, scenarios regenerate. If a brief is wrong, recompile it from the unit. If a unit is wrong, that’s a research event: it goes back through the research compiler’s feedback stage rather than being patched downstream. (This is the same asymmetry as source code versus build artifacts, hence “compiler.”)

The contract is what makes the split safe. The research compiler’s output is a frozen, machine-readable file (units.json) with stable IDs; the production compiler refuses input that isn’t signed off. Post-sign-off changes append new IDs — never renumber — so every downstream artifact keeps resolving.
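The contract rules above can be sketched in a few lines. This is a minimal illustration, not the shipped skill: the field names (`signed_off`, `units`, `id`), the `U001`-style ID format, and both function names are assumptions made for the example, since the source does not show the schema of units.json.

```python
import json

# Hypothetical contract shape; the real units.json schema is not given in the source.
EXAMPLE = json.dumps({
    "signed_off": True,
    "units": [
        {"id": "U001", "audience": "new parents", "angle": "time saved"},
        # U002 was retired earlier; its ID is never reused.
        {"id": "U003", "audience": "shift workers", "angle": "reliability"},
    ],
})

def load_signed_contract(text: str) -> dict:
    """Parse a contract payload and refuse it unless it is signed off."""
    contract = json.loads(text)
    if not contract.get("signed_off"):
        raise ValueError("units.json is not signed off; refusing to compile")
    ids = [unit["id"] for unit in contract["units"]]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate unit IDs in contract")
    return contract

def append_unit(contract: dict, payload: dict) -> str:
    """Add a unit after sign-off by appending a fresh ID; existing IDs never change."""
    highest = max((int(unit["id"][1:]) for unit in contract["units"]), default=0)
    new_id = f"U{highest + 1:03d}"
    contract["units"].append({"id": new_id, **payload})
    return new_id

contract = load_signed_contract(EXAMPLE)
new_id = append_unit(contract, {"audience": "students", "angle": "price"})
print(new_id)  # U004: one past the highest existing ID, so the U002 gap stays empty
```

Taking the next ID from the highest existing one, rather than from the unit count, is what keeps old references resolving: a retired ID leaves a gap instead of being handed to a different unit.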

Staged-compiler anatomy

Five properties define the pattern:

Numbered stages with typed artifacts. Each stage emits a named file (digest, unit table, load manifest, brief). Stages are resumable and auditable; you can re-run stage 4 without re-paying for stages 0–3.
JSON contracts at the boundaries. Structured artifacts between stages, prose only inside them. This is the workflows-not-agents end of Anthropic’s spectrum (“Building Effective Agents,” Dec 2024): predefined code paths with checks at intermediate steps, chosen deliberately over letting a model dynamically direct the whole process — for work where predictability beats flexibility.
Human gates at the points of irreversibility. Not everywhere — only where a wrong call poisons everything downstream: research scope (before expensive fetching), audience confirmation (before everything built on it), final sign-off (before IDs freeze). Between gates, the machine runs at machine speed.
Self-QA as a stage, not a virtue. An explicit checklist gate (evidence-or-hypothesis grading upstream; a prescriptiveness/compliance checklist downstream; similarity audit in batch mode). The QA artifact is the substance of the human gate — the human reviews a verdict, not raw output.
Feedback as a first-class input. Field results re-enter the research compiler and re-rank the queue. The mechanism is mechanical, not aspirational: output names encode unit IDs, so a results export maps back to units automatically, with failure attribution at the right level (execution vs angle vs audience).
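The feedback property is the easiest of the five to show concretely. The sketch below assumes a naming scheme of `<unit>_<creative>_<variant>` (for example `U007_C03_V2`) and a results export with `asset_name`, `spend`, and `conversions` columns; the source only says that output names encode unit IDs, so the scheme, the columns, and the sample rows are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: <unit>_<creative>_<variant>, e.g. "U007_C03_V2".
NAME_PATTERN = re.compile(r"^(?P<unit>U\d+)_(?P<creative>C\d+)_(?P<variant>V\d+)$")

def merge_results(rows):
    """Roll exported result rows up to unit level using IDs parsed from asset names."""
    per_unit = defaultdict(lambda: {"spend": 0.0, "conversions": 0})
    unmatched = []
    for row in rows:
        match = NAME_PATTERN.match(row["asset_name"])
        if match is None:
            # Names that do not follow the scheme cannot be attributed to a unit.
            unmatched.append(row["asset_name"])
            continue
        totals = per_unit[match["unit"]]
        totals["spend"] += row["spend"]
        totals["conversions"] += row["conversions"]
    return dict(per_unit), unmatched

# Invented sample export rows.
rows = [
    {"asset_name": "U007_C03_V2", "spend": 40.0, "conversions": 4},
    {"asset_name": "U007_C01_V1", "spend": 60.0, "conversions": 1},
    {"asset_name": "U012_C02_V3", "spend": 50.0, "conversions": 0},
    {"asset_name": "summer_promo_final", "spend": 10.0, "conversions": 0},
]
per_unit, unmatched = merge_results(rows)
print(per_unit["U007"])  # {'spend': 100.0, 'conversions': 5}
print(unmatched)         # ['summer_promo_final']
```

The same parsed groups (`creative`, `variant`) could be used to aggregate at finer levels, which is one way the execution-versus-angle-versus-audience attribution might be approached; the sketch stops at unit level.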

Auto-generated claims need a compliance gate

There’s a gate that doesn’t exist when the production compiler stops at a brief but becomes mandatory the moment its terminal stage auto-generates finished copy: a human fact-check and platform-policy pass before any spend.

The reason it’s easy to skip is exactly why it’s dangerous. When a stage emits a brief, a human still writes the words — claims get vetted in the writing. When the stage emits finished copy, the machine confidently produces checkable claims and policy-sensitive framing (statistics, efficacy or outcome language, sensitive-category framing) that look done and trustworthy — so the natural instinct is to ship them. Auto-generation removes the authoring step that used to carry the check; the check has to be re-added explicitly as a gate, or it silently disappears.

Make it explicit in the contract rather than leaving it as tribal knowledge: carry a per-asset compliance field in the production-stage manifest (e.g. none / policy-watch / non-specific) so every checkable or policy-sensitive asset is flagged for the gate by construction. The human then reviews a tagged list, not raw output: the same “review a verdict, not the raw thing” discipline as the self-QA property in the staged-compiler anatomy above.
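A rough sketch of how such a field could drive the gate follows. The three tag values come from the text above; the manifest shape, the `asset_id` and `compliance` key names, and the rule that every tag other than `none` needs human review are assumptions for the example.

```python
# Tag values from the text; everything else here is a hypothetical shape.
ALLOWED_TAGS = {"none", "policy-watch", "non-specific"}

def compliance_queue(manifest):
    """Return asset IDs a human must review before spend.

    An asset with a missing or unknown tag fails loudly, so nothing can
    skip the gate simply by being untagged.
    """
    queue = []
    for asset in manifest:
        tag = asset.get("compliance")
        if tag not in ALLOWED_TAGS:
            raise ValueError(
                f"{asset['asset_id']}: missing or unknown compliance tag {tag!r}"
            )
        if tag != "none":  # assumption: every non-"none" tag needs review
            queue.append(asset["asset_id"])
    return queue

# Invented sample manifest.
manifest = [
    {"asset_id": "U007_C03_V2", "compliance": "policy-watch"},
    {"asset_id": "U007_C01_V1", "compliance": "none"},
    {"asset_id": "U012_C02_V3", "compliance": "non-specific"},
]
print(compliance_queue(manifest))  # ['U007_C03_V2', 'U012_C02_V3']
```

Raising on an untagged asset, instead of defaulting it to `none`, is the "by construction" part: the tagged list the human reviews cannot be incomplete without the run failing first.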

This sharpens rather than contradicts the “creative decision is the moat” thesis in marketing/prescriptive-production-briefs: auto-generation doesn’t remove the human creative decision, it relocates it — from writing the copy to selecting compliance gates, vetting claims, and framing the copy A/B. The judgment stays human; it just moves to a different stage.

Scaling: parallel fan-out with central allocation

Batch mode (the worked instance ran 20 units → 120 creatives → 360 variants) adds four scaling moves:

Parallel workers off one central briefing file. Twenty brief-writing agents, each consuming the same run-level briefing, brand spec, and scene library — the orchestrator-worker shape from Anthropic’s multi-agent research system write-up (June 2025), where parallel subagents with their own context windows beat a single agent decisively on wide tasks.
Central concept allocation = diversity by design, not by luck. Concepts are assigned to units before fan-out, so twenty independent workers can’t converge on the same idea. (They still converge at the copy level — the strongest stat got picked as an opener by four independent workers — which is why the similarity audit stage exists. Allocation prevents concept collisions; only an audit catches copy collisions.)
Scene library for reuse by design. Shared universal scenes mean parallel outputs aggregate into a deduped shotlist instead of 120 bespoke footage requests.
Gate adaptation in batch mode. Per-unit gates would mean 20 approval interruptions; they collapse into central allocation up front plus one batch-level overview gate at the end. The gate count adapts; the gate placement principle (points of irreversibility) doesn’t.
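The two diversity mechanisms above, allocation before fan-out and an audit after it, can be sketched as follows. This is one possible shape under stated assumptions: the function names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are illustrative choices, not the method used in the worked instance, and the sample openers are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations

def allocate_concepts(unit_ids, concepts, per_unit):
    """Hand each unit its own slice of the concept pool before workers start."""
    pool = list(dict.fromkeys(concepts))  # drop duplicates, keep order
    if len(pool) < len(unit_ids) * per_unit:
        raise ValueError("concept pool too small to give every unit distinct concepts")
    return {
        unit_id: pool[i * per_unit:(i + 1) * per_unit]
        for i, unit_id in enumerate(unit_ids)
    }

def similar_openers(openers, threshold=0.8):
    """Flag pairs of worker outputs whose opening lines are near-duplicates."""
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(openers.items()), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((id_a, id_b))
    return flagged

allocation = allocate_concepts(["U001", "U002"], ["a", "b", "c", "d"], per_unit=2)
print(allocation)  # {'U001': ['a', 'b'], 'U002': ['c', 'd']}

# Invented opening lines from three independent workers.
openers = {
    "U001": "Most teams lose a day a week to rework",
    "U002": "Most teams lose a day a week to rework.",
    "U003": "Tired of late deliveries?",
}
print(similar_openers(openers))  # [('U001', 'U002')]
```

Allocation is deterministic and happens once, centrally; the audit is pairwise and runs over finished output. That mirrors the split in the text: collisions at the concept level are prevented up front, while collisions at the copy level can only be caught afterwards.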

Honest framing

None of this is glamorous AI. The pattern’s pitch is “do current work 5× faster with the judgment kept human at the gates” — it’s glossary/automation-eats-execution built into an architecture: stages compress the execution, gates keep the strategy human. Two calibration notes:

The “5×” is an engagement-level observation, not a benchmark. What’s solid: a 20-unit / 360-variant batch with one operator and gates intact, in days not weeks.
A boundary the worked instance had to learn: the research compiler delivers a prioritized testing queue; test design and budget allocation belong to the client/buyer. Compilers compile — deciding what the field experiment is remains a human call outside the pipeline.

When to use it (and not)

Use the pattern when the work has a natural strategy/production split, outputs are many and structured, errors downstream are cheap but errors upstream are poisonous, and you’ll run it more than once (the contract pays for itself on the second run).

Skip it when the task is one-off and small (a single creative, a single analysis: just do the work), or when it is genuinely open-ended exploration where stages can’t be named in advance. That is agent territory, not workflow territory, per the Anthropic distinction.

Key Takeaways

Split AI work pipelines where the economics split: expensive-durable-compounding research vs cheap-disposable-regenerable production, joined by a frozen machine-readable contract with stable IDs.
Gates go at points of irreversibility, not everywhere; between gates the machine runs at machine speed. In batch mode, per-item gates collapse into central allocation + one overview gate.
Self-QA is a stage that emits an artifact — the human gate reviews a verdict, not raw output.
Parallel fan-out needs central allocation for diversity by design — and still needs a similarity audit, because workers collide at the copy level even when concepts are allocated.
Encode IDs in output names so field results merge back mechanically; feedback is an input, not a retrospective.

marketing/evidence-graded-audience-research — the research compiler (the worked upstream instance)
marketing/prescriptive-production-briefs — the production compiler (the worked downstream instance)
marketing/andromeda-era-creative-strategy — the durable-vs-regenerable split applied to paid static: template library (durable) feeding individual ads (regenerable)
automation/multi-agent-patterns — the orchestration patterns this builds on (dispatcher + workers)
glossary/skill — each compiler ships as a reusable skill; the contract is what lets skills chain
glossary/agent-engineering — the discipline this pattern belongs to: coordinating AI work reliably
glossary/automation-eats-execution — the framework the pattern operationalizes (stages compress execution, gates keep strategy human)
glossary/context-engineering — the central briefing file + scene library are context engineering for parallel workers
automation/ai-enablement-levels — where staged compilers sit on the prompting→anticipatory spectrum
tools/target-audience-research — the research compiler as a shipped skill (tool review)
tools/scenario-compiler — the production compiler as a shipped skill (tool review)
tools/ai-email-production-stack — the pattern in email: durable design system + regenerable per-campaign assets
marketing/meta-ad-policy — the policy reference the auto-generated-claims compliance gate checks against before spend
marketing/email-design-system — the email channel’s durable layer, in depth (the human-authored anchor campaigns regenerate against)
automation/knowledge-management — the compounding-wiki pattern is this architecture applied to a knowledge base

Sources

Anthropic — Building Effective Agents (Dec 2024) — workflows vs agents; gates on intermediate steps; the design vocabulary this pattern uses
Anthropic — How we built our multi-agent research system (June 2025) — orchestrator-worker fan-out; parallel subagents beat single-agent on wide tasks (at ~15× token cost)
Engagement artifacts, June 2026 (internal; client anonymized) — the two-skill pipeline, 20-unit batch run, gate-adaptation and similarity-audit learnings