Jagged Frontier — Why AI Helps on Some Tasks and Hurts on Others
TL;DR: The “jagged technological frontier” (Dell’Acqua et al. 2023, BCG × Harvard study with n=758 consultants) names the empirical finding that AI capabilities form an irregular border, not a smooth one. On tasks inside the frontier, GPT-4 made consultants 12.2% more productive, 25.1% faster, and over 40% higher in graded quality. On a single task selected to be just outside the frontier, AI users were 19 percentage points less likely to produce correct answers than non-AI users. The frontier is not visible from the task description: even experienced knowledge workers cannot reliably tell which side of it a task sits on.
What it is
The jagged-frontier concept comes from a 2023 randomized field experiment with 758 BCG management consultants — about 7% of the firm’s individual-contributor consultants worldwide — using GPT-4. The headline result is the asymmetry: AI is not uniformly helpful or harmful. It is sharply helpful on tasks inside its capability frontier and sharply harmful on tasks just outside.
The metaphor: imagine a coastline. From a satellite view, AI capability looks like an expanding territory. Up close, the coastline is jagged — peninsulas of capability extend deep into some tasks while inlets of incapability cut through tasks that look superficially similar. Two tasks that appear equally hard to a human can sit on opposite sides of the frontier.
This is the central finding the wiki uses the term to anchor: the question isn’t “does AI help knowledge work” — it’s “which specific tasks fall inside the frontier today, and how do you tell?”
The Study That Coined It
| Element | Detail |
|---|---|
| Sample | 758 BCG consultants, ~7% of individual contributors firm-wide |
| Tool | GPT-4 (mid-2023 version) |
| Design | Three randomized arms: no AI / AI access / AI access + training |
| Inside-frontier tasks | 18 realistic creative + analytical knowledge tasks designed by social scientists with BCG |
| Outside-frontier task | 1 task: a business problem combining quantitative analysis of spreadsheet data with qualitative interview evidence, designed so that the numbers alone pointed to the wrong answer |
Inside-frontier results
- +12.2% more tasks completed
- +25.1% faster on average
- Over 40% higher quality than the control group on blind-graded output
- The bottom half of consultants by skill saw the largest gains; the top-to-bottom skill gap compressed (the skill-leveling effect; see glossary/ai-skill-leveling)
Outside-frontier results
- −19 percentage points in the likelihood of producing a correct solution vs. control
- The AI confidently produced plausible-sounding wrong answers, and consultants over-relied on them
The Frontier Is Invisible From The Outside
The most operationally important part of the finding is that knowledge workers cannot reliably tell, from a task description, whether the task sits inside or outside the frontier. The 18 inside-frontier tasks and the 1 outside-frontier task were drawn from the same general space of consulting work. The frontier separates them, but the separation isn’t legible from the surface.
This has direct implications for how AI is deployed in the wild:
- A team can adopt AI on a portfolio of tasks, see clear average gains, and not realize that one subset of tasks is being silently degraded — because the frontier doesn’t announce itself.
- Confidence calibration fails: AI tends to produce equally confident-sounding answers regardless of whether the task is inside or outside the frontier. Users often cannot distinguish.
- Iterative trial-and-error — try AI on a task class, measure whether output is reliable, decide whether to keep using it — is the only operationally robust way to map the frontier locally.
Two Effective Integration Patterns
The paper identifies two human-AI workflow patterns that both work:
- Centaurs — strict division of labor. The human partitions the work and assigns each piece to whoever (human or AI) handles it better. Boundaries are clean.
- Cyborgs — deep integration. AI is woven into nearly every step. The human steers, edits, prompts iteratively, and shapes outputs.
Both can succeed. The failure mode is neither: using AI as a one-shot oracle for tasks that turn out to be outside the frontier.
The centaur/cyborg framing is useful for automation/ai-implementation-patterns — it gives a vocabulary for thinking about workflow design beyond “use AI” or “don’t.”
Why it matters for the wiki
The jagged-frontier concept is load-bearing for several frames this wiki had previously asserted from softer evidence:
- comparisons/strategy-vs-execution-ai — the paper supplies hard evidence that AI is asymmetric in effect. Not “helps a little everywhere” but “helps a lot on some tasks, hurts on others.” This sharpens the strategy-vs-execution claim from “AI handles execution” to “AI handles execution inside its frontier, and using it outside the frontier is actively value-destroying.”
- glossary/automation-eats-execution — the framework now has a peer-reviewed empirical foundation, not only domain-specific industry data.
- questions/managed-agents-break-even — agents are not uniformly trustworthy across task types. The frontier is the frame for thinking about reliability boundaries.
- automation/finding-ai-use-cases — the TRIPS framework gains a sharper test: “is this task inside or outside the frontier?” beats vague “is this AI-fit?”
What “inside the frontier” looks like — practical signals
Drawing from the Dell’Acqua paper plus corroborating evidence from glossary/ai-skill-leveling and glossary/ai-task-restructuring, tasks tend to be inside the frontier when:
- The task has clear input → output structure (write a memo from these notes, classify these tickets, summarize this document).
- Many similar examples exist in training data.
- Verification is fast — a human can check the output in less time than it took to produce.
- The task does not require integrating context the AI cannot see (organizational politics, unstated client preferences, information not in the prompt).
- Failure modes are visible: a wrong answer looks wrong, not plausible.
Tasks tend to be outside the frontier when:
- The task requires reconciling conflicting information or recognizing when context is misleading (the Dell’Acqua outside-frontier task explicitly tested this).
- The right answer requires knowledge or judgment the AI doesn’t have access to.
- Failure modes are invisible: a wrong answer looks identical to a right one.
- The task is novel enough that there’s no nearby pattern in training data.
Karpathy’s “jagged intelligence” — the model-side cousin
Andrej Karpathy used the term “jagged intelligence” at Sequoia AI Ascent 2026 to describe the model-side version of the structural phenomenon Dell’Acqua documented on the human-pairing side. Models can refactor 100,000 lines of code, find zero-day vulnerabilities, and pass professional engineering exams, yet simultaneously fail to count the letters in “strawberry” or invent plausible-looking functions that don’t exist. Karpathy’s account of the mechanism: labs train models through reinforcement learning where rewards follow verifiable results (code compiles, tests pass, proofs check). Verifiable capabilities improve fast; “common sense” can’t be auto-verified, so it stagnates.
The two terms describe the same insight from different sides:
| Term | What’s jagged | Who feels it |
|---|---|---|
| Jagged frontier (Dell’Acqua 2023, n=758 BCG consultants) | Which tasks a human-AI pair handles well vs. badly | The operator — has to decide whether to delegate this specific task |
| Jagged intelligence (Karpathy 2026, Sequoia AI Ascent) | Which capabilities a model has at all | The engineer building on the model — has to know where the cliffs are |
For the wiki’s glossary/agent-engineering cluster: jagged intelligence is the engineering constraint that makes agent engineering a distinct discipline from vibe coding. The engineering work is the verification the model can’t do for itself.
For the glossary/recognition-primed-decision foundation (Klein-Kahneman): both jagged shapes are predicted by the same conditions for pattern-matching reliability — high-validity environments with rapid feedback produce smooth capability; their absence produces jaggedness. The two terms are different surface phenomena from one underlying constraint on when pattern-matching works.
Honest limits
- Single firm (BCG), single profession (management consulting), single tool (GPT-4, mid-2023). External validity to other knowledge work is an extrapolation, though the qualitative pattern (jaggedness) replicates in glossary/ai-skill-leveling and glossary/ai-task-restructuring.
- The specific frontier location moves with model capability. Newer models will pull the frontier outward — but jaggedness as a property is more durable than any specific frontier line.
- Researcher-designed tasks, not naturally-occurring work. Real consulting work may be messier; the inside/outside boundaries may be even harder to spot.
- Single experimental session. Long-horizon effects (does AI use atrophy human skill over months?) not measured.
- The “over 40% higher quality on inside-frontier tasks” finding rests on blind grading by experienced graders. As with all quality-of-knowledge-work measurements, grading rubrics encode assumptions that may not match real-world value.
Related
- glossary/automation-eats-execution — synthesis page; the jagged-frontier finding is one of three empirical anchors
- glossary/ai-skill-leveling — the bottom-half-of-skill consultants gained most; this finding is shared across Dell’Acqua, Brynjolfsson, and Noy-Zhang
- glossary/ai-task-restructuring — Noy-Zhang complement to Dell’Acqua: where humans add value when AI takes drafting
- comparisons/strategy-vs-execution-ai — strategy-vs-execution synthesis; jagged-frontier is the empirical mechanism
- questions/managed-agents-break-even — agent reliability is task-dependent in exactly the way jagged-frontier predicts
- automation/ai-implementation-patterns — the centaur/cyborg patterns described here feed that page's workflow-design vocabulary
- glossary/recognition-primed-decision — Klein-Kahneman frame for why the frontier exists where it does (high- vs low-validity environments)
- glossary/agent-engineering — Karpathy’s “jagged intelligence” is the model-side cousin of this finding; the agent-engineering discipline exists because of jaggedness
- glossary/vibe-coding — the floor-raising complement; vibe coding works inside the frontier and breaks down outside it
- glossary/ai-agent-behavior — agent purchasing decisions inherit the same asymmetry at the decision layer
- glossary/agent-adoption-frictions — the user-side response to jaggedness; users can’t see the frontier either, and they calibrate delegation accordingly. The “perceived competence” friction is partly a calibration response to this finding
- glossary/hallucination — the mechanism behind the −19pp accuracy collapse on outside-frontier tasks. Hallucination at scale is what makes the frontier jagged rather than smooth
Key takeaways
- AI capability is jagged, not smooth — two seemingly similar tasks can sit on opposite sides of the frontier.
- Inside the frontier: GPT-4 made BCG consultants 12.2% more productive, 25.1% faster, and over 40% higher in graded quality.
- Outside the frontier: AI users were 19 percentage points less likely to produce correct answers — AI actively misleads when over-extended.
- The frontier is invisible from a task description. Workers and managers cannot reliably tell which side a task is on without testing.
- Two effective integration patterns: centaurs (clean division of labor) and cyborgs (deep integration). Failure mode is neither — using AI as a one-shot oracle.
- Skill-leveling effect: bottom-half consultants gained most. The skill premium compressed inside the frontier.
- Karpathy’s “jagged intelligence” (Sequoia AI Ascent 2026) is the model-side cousin: same structural phenomenon, viewed from the capability side rather than the human-pairing side.
Sources
- Dell’Acqua, F., McFowland III, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper 24-013. Republished in Organization Science (2025). — The foundational paper. n=758 BCG consultants, GPT-4, randomized 3-arm field experiment.
- Mollick, E. (2023). Centaurs and Cyborgs on the Jagged Frontier. One Useful Thing. A co-author’s accessible commentary on the paper and follow-up work.
- Kahneman, D., & Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree. American Psychologist, 64(6), 515–526. The theoretical complement: the conditions under which expert pattern-matching is reliable vs. not. See glossary/recognition-primed-decision.
- Karpathy, A. (May 2026). From Vibe Coding to Agentic Engineering. Sequoia AI Ascent 2026 talk. YouTube. The talk introduced “jagged intelligence” as the model-side framing of the same phenomenon — same structural insight, different vantage point.