The Staged-Compiler Pattern: Chaining AI Skills from Strategy to Production
TL;DR: Structure AI-heavy knowledge work as compilers: numbered stages with machine-readable JSON contracts between them, human approval gates at the points of irreversibility, and self-QA as an explicit stage. For strategy→production work, use two compilers joined by one frozen contract — a research compiler (expensive, durable, compounding) and a production compiler (cheap, disposable, regenerable) — because the two layers have different cadence, craft, and economics. Scale comes from parallel agent fan-out off one central briefing file: diversity by design, not by luck.
The pattern in one picture
    inputs ──▶ [ RESEARCH COMPILER ] ──▶ units.json ──▶ [ PRODUCTION COMPILER ] ──▶ brief packages
               stages 0..6               (frozen        stages 0..5
               gates: scope,             contract,      gates: concept
               confirmation,             stable IDs)    approval, final QA
               sign-off
                        ▲                                                                 │
                        └────────────────────── field results ◀───────────────────────────┘
                                                (IDs map results back)

The worked instance is a paid-social pipeline (marketing/evidence-graded-audience-research as the research compiler, marketing/prescriptive-production-briefs as the production compiler), but nothing in the pattern is marketing-specific. It generalizes to any work that splits into “expensive analysis that compounds” and “cheap synthesis that regenerates.”
Why two compilers instead of one pipeline
The layers differ on three axes, and merging them forces the wrong economics on both:
| | Research compiler | Production compiler |
|---|---|---|
| Cadence | Weeks–months; revised by field evidence | Hours–days; regenerated per wave |
| Craft | Evidence discipline, judgment, calibration | Prescriptive serialization, consistency |
| Economics | Expensive, durable, compounding asset | Cheap, disposable compilation |
The operating rule that falls out: research is durable, scenarios regenerate. If a brief is wrong, recompile it from the unit. If a unit is wrong, that is a research event: it goes back through the research compiler’s feedback stage rather than being patched downstream. (This is the same asymmetry as source code versus build artifacts, hence “compiler.”)
The contract is what makes the split safe. The research compiler’s output is a frozen, machine-readable file (units.json) with stable IDs; the production compiler refuses input that isn’t signed off. Post-sign-off changes append new IDs — never renumber — so every downstream artifact keeps resolving.
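A minimal Python sketch of the two contract rules described above: refuse input that isn't signed off, and append new IDs rather than renumbering. The source does not specify the schema of units.json, so the `signed_off` flag, the `U-001` ID format, and the function names `load_contract` and `append_unit` are all assumptions made for illustration.

```python
import json

def load_contract(text: str) -> dict:
    """Parse a units.json payload; refuse it unless it is signed off."""
    contract = json.loads(text)
    if not contract.get("signed_off"):  # hypothetical flag, not from the source
        raise ValueError("contract is not signed off; refusing to compile")
    ids = [unit["id"] for unit in contract["units"]]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate unit IDs in contract")
    return contract

def append_unit(contract: dict, fields: dict) -> dict:
    """Return a new contract with one more unit; existing IDs are never renumbered."""
    highest = max((int(u["id"].split("-")[1]) for u in contract["units"]), default=0)
    new_unit = {"id": f"U-{highest + 1:03d}", **fields}
    return {**contract, "units": [*contract["units"], new_unit]}

# Placeholder data. The gap at U-002 is deliberate: a retired ID is not reused.
SAMPLE = json.dumps({
    "signed_off": True,
    "units": [
        {"id": "U-001", "audience": "placeholder audience A"},
        {"id": "U-003", "audience": "placeholder audience B"},
    ],
})

contract = load_contract(SAMPLE)
revised = append_unit(contract, {"audience": "placeholder audience C"})
print([unit["id"] for unit in revised["units"]])  # ['U-001', 'U-003', 'U-004']
```

Taking the next ID from the highest existing number, instead of from the unit count, is what keeps a retired ID from being reissued to a different unit.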
Staged-compiler anatomy
Five properties define the pattern:
- Numbered stages with typed artifacts. Each stage emits a named file (digest, unit table, load manifest, brief). Stages are resumable and auditable; you can re-run stage 4 without re-paying for stages 0–3.
- JSON contracts at the boundaries. Structured artifacts between stages, prose only inside them. This is the workflows-not-agents end of Anthropic’s spectrum (“Building Effective Agents,” Dec 2024): predefined code paths with checks at intermediate steps, chosen deliberately over letting a model dynamically direct the whole process — for work where predictability beats flexibility.
- Human gates at the points of irreversibility. Not everywhere — only where a wrong call poisons everything downstream: research scope (before expensive fetching), audience confirmation (before everything built on it), final sign-off (before IDs freeze). Between gates, the machine runs at machine speed.
- Self-QA as a stage, not a virtue. An explicit checklist gate (evidence-or-hypothesis grading upstream; a prescriptiveness/compliance checklist downstream; similarity audit in batch mode). The QA artifact is the substance of the human gate — the human reviews a verdict, not raw output.
- Feedback as a first-class input. Field results re-enter the research compiler and re-rank the queue. The mechanism is mechanical, not aspirational: output names encode unit IDs, so a results export maps back to units automatically, with failure attribution at the right level (execution vs angle vs audience).
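The first and third properties (typed artifacts per stage, gates at selected stages) can be sketched as a small runner. This is an illustrative assumption, not the worked instance's implementation: the `run_pipeline` function, the `<index>_<name>.json` file naming, and the three toy stages are hypothetical. A stage whose artifact already exists on disk is skipped, which is what makes a re-run of a late stage cheap.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# (artifact name, stage function, requires a human gate)
Stage = tuple[str, Callable[[dict], dict], bool]

def run_pipeline(stages: list[Stage], workdir: Path,
                 approve: Callable[[str, dict], bool]) -> list[str]:
    """Run stages in order, reusing artifacts already on disk.

    Returns the names of the stages that were actually executed.
    """
    executed: list[str] = []
    previous: dict = {}
    for index, (name, fn, gated) in enumerate(stages):
        path = workdir / f"{index}_{name}.json"
        if path.exists():  # artifact already paid for: load it and move on
            previous = json.loads(path.read_text())
            continue
        artifact = fn(previous)
        if gated and not approve(name, artifact):
            raise RuntimeError(f"gate rejected stage {index} ({name})")
        path.write_text(json.dumps(artifact))  # written only after approval
        executed.append(name)
        previous = artifact
    return executed

# Toy stages with placeholder content; real stages would call models or tools.
STAGES: list[Stage] = [
    ("scope", lambda _: {"topic": "placeholder"}, True),
    ("digest", lambda prev: {"notes": [prev["topic"]]}, False),
    ("units", lambda prev: {"units": len(prev["notes"])}, True),
]

workdir = Path(tempfile.mkdtemp())
auto_approve = lambda name, artifact: True  # stand-in for a human reviewer

first_run = run_pipeline(STAGES, workdir, auto_approve)
(workdir / "2_units.json").unlink()  # invalidate only the last artifact
second_run = run_pipeline(STAGES, workdir, auto_approve)
print(first_run, second_run)  # ['scope', 'digest', 'units'] ['units']
```

In this sketch an artifact is written only after its gate passes, so the presence of a file doubles as the record that the stage was approved.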
Scaling: parallel fan-out with central allocation
Batch mode (the worked instance ran 20 units → 120 creatives → 360 variants) adds the scale moves:
- Parallel workers off one central briefing file. Twenty brief-writing agents, each consuming the same run-level briefing, brand spec, and scene library — the orchestrator-worker shape from Anthropic’s multi-agent research system write-up (June 2025), where parallel subagents with their own context windows beat a single agent decisively on wide tasks.
- Central concept allocation = diversity by design, not by luck. Concepts are assigned to units before fan-out, so twenty independent workers can’t converge on the same idea. (They still converge at the copy level — the strongest stat got picked as an opener by four independent workers — which is why the similarity audit stage exists. Allocation prevents concept collisions; only an audit catches copy collisions.)
- Scene library for reuse by design. Shared universal scenes mean parallel outputs aggregate into a deduped shotlist instead of 120 bespoke footage requests.
- Gate adaptation in batch mode. Per-unit gates would mean 20 approval interruptions; they collapse into central allocation up front plus one batch-level overview gate at the end. The gate count adapts; the gate placement principle (points of irreversibility) doesn’t.
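The allocation-then-audit sequence above might look like the following sketch. The worker is a stub, the `allocate`, `write_brief`, and `audit_openers` names are hypothetical, and the audit here only catches exact-match openers after normalization; a production similarity audit would presumably use a fuzzier comparison. The stub deliberately picks the first stat every time, to reproduce the copy-level collision the text describes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def allocate(unit_ids: list[str], concepts: list[str]) -> dict[str, str]:
    """Assign one distinct concept per unit before any worker starts."""
    if len(set(concepts)) < len(unit_ids):
        raise ValueError("need at least one distinct concept per unit")
    return dict(zip(unit_ids, dict.fromkeys(concepts)))

def write_brief(unit_id: str, concept: str, briefing: dict) -> dict:
    """Stub for a brief-writing agent; a real worker would call a model here."""
    return {"unit": unit_id, "concept": concept, "opener": briefing["stats"][0]}

def audit_openers(briefs: list[dict]) -> dict[str, list[str]]:
    """Return openers used by more than one unit (copy-level collisions)."""
    seen: dict[str, list[str]] = defaultdict(list)
    for brief in briefs:
        seen[brief["opener"].strip().lower()].append(brief["unit"])
    return {opener: units for opener, units in seen.items() if len(units) > 1}

# One central briefing shared by every worker (placeholder content).
BRIEFING = {"stats": ["placeholder stat A", "placeholder stat B"]}
UNITS = ["U-001", "U-002", "U-003"]
allocation = allocate(UNITS, ["concept-1", "concept-2", "concept-3"])

# Fan out: each worker sees the same briefing but its own allocated concept.
with ThreadPoolExecutor(max_workers=3) as pool:
    briefs = list(pool.map(lambda uid: write_brief(uid, allocation[uid], BRIEFING), UNITS))

collisions = audit_openers(briefs)
print(len({b["concept"] for b in briefs}), collisions)
# 3 {'placeholder stat a': ['U-001', 'U-002', 'U-003']}
```

The output shows both halves of the point: three distinct concepts thanks to allocation, and one shared opener that only the audit surfaces.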
Honest framing
None of this is glamorous AI. The pattern’s pitch is “do current work 5× faster with the judgment kept human at the gates” — it’s glossary/automation-eats-execution built into an architecture: stages compress the execution, gates keep the strategy human. Two calibration notes:
- The “5×” is an engagement-level observation, not a benchmark. What’s solid: a 20-unit / 360-variant batch with one operator and gates intact, in days not weeks.
- A boundary the worked instance had to learn: the research compiler delivers a prioritized testing queue; test design and budget allocation belong to the client/buyer. Compilers compile — deciding what the field experiment is remains a human call outside the pipeline.
When to use it (and not)
Use the pattern when the work has a natural strategy/production split, outputs are many and structured, errors downstream are cheap but errors upstream are poisonous, and you’ll run it more than once (the contract pays for itself on the second run).
Skip it when the task is one-off and small (a single creative, a single analysis: just do the work), or when it is genuinely open-ended exploration where stages can’t be named in advance. That is agent territory, not workflow territory, per the Anthropic distinction.
Key Takeaways
- Split AI work pipelines where the economics split: expensive-durable-compounding research vs cheap-disposable-regenerable production, joined by a frozen machine-readable contract with stable IDs.
- Gates go at points of irreversibility, not everywhere; between gates the machine runs at machine speed. In batch mode, per-item gates collapse into central allocation + one overview gate.
- Self-QA is a stage that emits an artifact — the human gate reviews a verdict, not raw output.
- Parallel fan-out needs central allocation for diversity by design — and still needs a similarity audit, because workers collide at the copy level even when concepts are allocated.
- Encode IDs in output names so field results merge back mechanically; feedback is an input, not a retrospective.
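The last takeaway can be sketched as follows. The naming scheme (`U-001_C01_V1`), the export columns, and the numbers are all placeholders assumed for illustration; the source states only that output names encode unit IDs so a results export maps back to units.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: unit, creative, and variant encoded in the name.
NAME_PATTERN = re.compile(r"^(?P<unit>U-\d{3})_(?P<creative>C\d{2})_(?P<variant>V\d)$")

def output_name(unit: str, creative: int, variant: int) -> str:
    """Build an output name that carries its unit ID with it."""
    return f"{unit}_C{creative:02d}_V{variant}"

def results_by_unit(rows: list[dict]) -> dict[str, dict]:
    """Aggregate a results export back onto unit IDs using only the names."""
    totals: dict[str, dict] = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in rows:
        match = NAME_PATTERN.match(row["name"])
        if match is None:
            raise ValueError(f"name does not encode a unit ID: {row['name']!r}")
        bucket = totals[match["unit"]]
        bucket["impressions"] += row["impressions"]
        bucket["clicks"] += row["clicks"]
    return dict(totals)

# Placeholder export rows; the figures are invented for the example.
EXPORT = [
    {"name": output_name("U-001", 1, 1), "impressions": 1000, "clicks": 10},
    {"name": output_name("U-001", 1, 2), "impressions": 500, "clicks": 20},
    {"name": output_name("U-003", 2, 1), "impressions": 800, "clicks": 4},
]

results = results_by_unit(EXPORT)
print(results)
# {'U-001': {'impressions': 1500, 'clicks': 30}, 'U-003': {'impressions': 800, 'clicks': 4}}
```

Because the creative and variant are captured by the same pattern, grouping on those fields instead of `unit` would give the finer-grained views that failure attribution needs.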
Related
- marketing/evidence-graded-audience-research — the research compiler (the worked upstream instance)
- marketing/prescriptive-production-briefs — the production compiler (the worked downstream instance)
- automation/multi-agent-patterns — the orchestration patterns this builds on (dispatcher + workers)
- glossary/skill — each compiler ships as a reusable skill; the contract is what lets skills chain
- glossary/agent-engineering — the discipline this pattern belongs to: coordinating AI work reliably
- glossary/automation-eats-execution — the framework the pattern operationalizes (stages compress execution, gates keep strategy human)
- glossary/context-engineering — the central briefing file + scene library are context engineering for parallel workers
- automation/ai-enablement-levels — where staged compilers sit on the prompting→anticipatory spectrum
- tools/target-audience-research — the research compiler as a shipped skill (tool review)
- tools/scenario-compiler — the production compiler as a shipped skill (tool review)
- tools/ai-email-production-stack — the pattern in email: durable design system + regenerable per-campaign assets
- marketing/email-design-system — the email channel’s durable layer, in depth (the human-authored anchor campaigns regenerate against)
Sources
- Anthropic — Building Effective Agents (Dec 2024) — workflows vs agents; gates on intermediate steps; the design vocabulary this pattern uses
- Anthropic — How we built our multi-agent research system (June 2025) — orchestrator-worker fan-out; parallel subagents beat single-agent on wide tasks (at ~15× token cost)
- Engagement artifacts, June 2026 (internal; client anonymized) — the two-skill pipeline, 20-unit batch run, gate-adaptation and similarity-audit learnings