
AI Skill Leveling — Why Novices Gain Most From AI Tools


TL;DR: Three independent randomized/quasi-experimental studies — Brynjolfsson, Li & Raymond (2023, n=5,179 customer-support agents), Noy & Zhang (2023, n=444 college-educated professionals on writing tasks), and Dell’Acqua et al. (2023, n=758 BCG consultants) — converge on the same finding: AI raises novice/low-performer productivity substantially more than expert productivity, compressing the skill distribution. Brynjolfsson found +34% for novices, ~0% for top performers. Noy-Zhang found low-ability writers gained most; quality inequality narrowed. Dell’Acqua found bottom-half-skill consultants gained most. Three methods, three settings, same pattern. The implication: AI is a performance leveler in functions where it works at all.

What it is

The name “AI skill leveling” is shorthand for a robust empirical regularity: when generative AI tools are introduced into knowledge-work settings, the productivity benefits accrue disproportionately to the least skilled workers, not equally across the workforce. The skill distribution compresses. The premium that experts could once charge for outcomes a novice couldn’t produce gets squeezed.

This is not a theoretical claim. It is the same headline finding in three otherwise-different studies published in 2023 — different industries, different tasks, different methodologies — all pointing in the same direction, with the largest gains consistently accruing to the least skilled.

The three datasets

| Study | N | Setting | Tool | Skill-leveling finding |
|---|---|---|---|---|
| Brynjolfsson, Li & Raymond (2023) | 5,179 | Customer-support agents at a Fortune 500 SaaS firm | Conversational AI assistant suggesting responses | +14% avg, +34% novices, ~0% experts |
| Noy & Zhang (2023, Science) | 444 | College-educated professionals on incentivized writing tasks | ChatGPT (GPT-3.5) | Time −40%, quality +18%; low-ability writers gained most, productivity inequality narrowed |
| Dell’Acqua et al. (2023, BCG × Harvard) | 758 | BCG management consultants | GPT-4 on 18 inside-frontier knowledge tasks | +12.2% tasks completed, +25.1% faster, +40% higher quality; bottom-half-skill consultants gained most, top–bottom gap compressed |

Three different mechanisms (response-suggestion, draft-generation, capability-frontier work), three different cohorts (support agents, writers, consultants), one consistent skill-leveling pattern.
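To make the compression concrete, here is a minimal arithmetic sketch. The baseline rates are hypothetical; only the percentage gains (+34% for novices, ~0% for experts) come from the Brynjolfsson et al. headline numbers.

```python
# Hypothetical pre-AI productivity (issues resolved per hour); the
# percentage gains are the Brynjolfsson et al. (2023) headline figures.
novice_rate, expert_rate = 2.0, 4.0
novice_gain, expert_gain = 0.34, 0.00

novice_ai = novice_rate * (1 + novice_gain)  # 2.68
expert_ai = expert_rate * (1 + expert_gain)  # 4.0

print(expert_rate / novice_rate)  # 2.0  : pre-AI expert/novice ratio
print(expert_ai / novice_ai)      # ~1.49: post-AI ratio; the gap compresses
```

Whatever the true baselines, applying a larger multiplier at the bottom of the distribution than at the top necessarily shrinks the expert/novice ratio.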

The mechanism (Brynjolfsson’s interpretation)

Brynjolfsson, Li & Raymond’s analysis points to a clean explanation for the customer-support case:

The AI tool functions by disseminating the best practices of high-performing workers. Top performers had already internalized effective response patterns through years of experience. The AI extracted those patterns and surfaced them to others.

If the AI is in effect a fast distillation of expert practice, then:

  • Novices, who lacked the expert pattern library, gain access to it instantly. Productivity rises sharply.
  • Experts, who already had the pattern library internalized, have nothing new to learn from a tool reflecting their own practices. Productivity barely moves.

This interpretation explains why the leveling finding replicates across studies: any tool trained on human-produced text ends up encoding average-to-best practice. Workers below that level gain; workers at or above gain little.
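The best-practice-diffusion story can be captured in a toy model (my illustration, not a model from the paper): if the AI encodes practice at some fixed level, a worker's gain is the gap between that level and their own skill, floored at zero.

```python
# Toy model of best-practice diffusion (illustrative assumption, not
# from Brynjolfsson et al.): the AI encodes practice at ai_level, and
# a worker gains only the portion of that level they lack.
def ai_gain(worker_skill: float, ai_level: float = 0.8) -> float:
    """Gain from an AI encoding average-to-best practice at ai_level."""
    return max(0.0, ai_level - worker_skill)

# Skill on a 0-1 scale, novice through expert: the gain shrinks
# monotonically and hits zero once skill reaches the AI's level.
for skill in (0.2, 0.5, 0.8, 1.0):
    print(skill, ai_gain(skill))
```

The model reproduces the qualitative pattern in one line: large gains below the AI-encoded level, nothing at or above it.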

A second mechanism (Noy-Zhang’s task-restructuring complement)

Noy-Zhang point to a related but distinct mechanism: AI substitutes for effort, not skill.

  • The bottleneck for low-ability writers is often the rough-drafting phase — getting something down. AI removes that bottleneck entirely.
  • High-ability writers were already past the rough-drafting bottleneck — for them, the bottleneck was idea generation and editing, both of which AI helps less with.

So the leveling effect is also partly a matter of which sub-tasks the AI absorbs. See glossary/ai-task-restructuring for the full task-shift finding.
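The task-restructuring mechanism can be sketched the same way, with hypothetical minutes per phase (not data from the paper): if AI absorbs the drafting phase, the worker whose time is dominated by drafting saves the larger fraction.

```python
# Toy decomposition of the Noy-Zhang mechanism. Minutes per phase are
# hypothetical; the assumption that AI removes drafting entirely is a
# deliberate simplification of "AI substitutes for effort".
def time_saved(draft_min: float, edit_min: float) -> float:
    """Fraction of total task time saved if AI absorbs drafting."""
    total = draft_min + edit_min
    with_ai = edit_min  # only ideation/editing time remains
    return 1 - with_ai / total

print(time_saved(draft_min=40, edit_min=10))  # low-ability:  0.8 saved
print(time_saved(draft_min=10, edit_min=30))  # high-ability: 0.25 saved
```

Same tool, same sub-task absorbed, very different gains, purely because of where each worker's time was going before AI.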

The two mechanisms are complementary

  • Brynjolfsson: AI levels because it distributes expert-level patterns to non-experts.
  • Noy-Zhang: AI levels because it absorbs the sub-tasks where novices lag most.

Both are likely operating simultaneously in different proportions across different domains. The two mechanisms predict the same leveling outcome, which is why the empirical finding replicates so reliably.

Why it matters for the wiki

The skill-leveling finding has direct, concrete implications:

For hiring and staffing

The legacy assumption — “more experienced staff produce more reliable output” — partially breaks under AI. If a novice with AI matches a mid-level performer’s productivity in a domain, the marginal value of mid-level hires falls. Senior strategists (whom AI doesn’t replace) become more valuable; mid-tier execution staff become harder to justify.

This is the labor-economics half of glossary/automation-eats-execution: the bifurcation isn’t only “execution vs. strategy” — it’s also “novice + AI vs. senior strategist,” with the middle of the org chart caught between.

For agency pricing

Agencies pricing on execution volume face a structural problem: clients can produce execution output themselves with AI at near-novice quality. Pricing pegged to strategic outcomes (campaign performance, attributed revenue) faces upward pressure; pricing pegged to execution deliverables (per-asset, per-hour) faces downward pressure. See comparisons/strategy-vs-execution-ai for the agency-pricing synthesis.

For training investment

Conventional career development assumed: junior → mid → senior, with each level adding incrementally. The skill-leveling finding suggests a different curve: AI-augmented junior performance leapfrogs much of mid-level work directly. The investments that matter are at the strategic end (judgment, taste, integration) — what AI doesn’t level.

For Modash 2026 and the influencer-marketing data

The Modash 2026 +$14,830 strategy premium documented in marketing/influencer-marketing-task-overload is exactly what the skill-leveling finding predicts at scale: the execution layer compresses (because AI levels novice and mid-level execution), while the strategy layer (which AI doesn’t level) commands a growing premium.

Honest limits

  • All three studies are 2023-vintage research with then-current models (GPT-3.5/4). Headline numbers will shift with capability; the pattern (leveling) appears more durable than any specific magnitude.
  • Brynjolfsson is a single firm and a single function (customer support): the most externally valid of the three, given its large N and real production setting, but the least transferable to creative or strategic knowledge work.
  • Noy-Zhang uses one-shot experimental writing tasks; it doesn’t capture multi-day, multi-stakeholder work or long-term skill effects.
  • Dell’Acqua is a single firm (BCG), a single profession (management consulting), and researcher-designed tasks.
  • Why don’t experts gain? The interpretation that “AI = best-practice diffusion” is consistent with the data but not directly causally identified. An alternative: experts may already be at a measurement ceiling on issues-per-hour or words-per-minute, so the flat expert result could reflect a ceiling in the metric rather than a genuine absence of benefit.
  • The leveling finding does not say AI is uniformly helpful — see glossary/jagged-frontier for the asymmetric counterpart. Skill leveling applies inside the frontier, not outside.
  • The leveling finding does not predict long-term skill development. If novices rely on AI to produce expert-pattern output, the open question is whether they ever develop the pattern library themselves, or remain dependent on the tool. Not measured in any of the three studies.
  • Skill-leveling assumes the AI tool is well-aligned with the task. Misaligned tools (wrong domain, wrong output style) can hurt all skill levels, not level them.

Key takeaways

  • Three independent studies (n=5,179 + 444 + 758) converge: AI raises low-performer productivity substantially more than expert productivity. The skill premium compresses.
  • Mechanism 1 (Brynjolfsson): AI disseminates expert patterns to non-experts. Novices gain access to a pattern library they hadn’t built; experts gain little from a tool reflecting their own practices.
  • Mechanism 2 (Noy-Zhang): AI substitutes for effort on the rough-drafting bottleneck. Novices lag more on this phase; experts had already passed it.
  • The two mechanisms are complementary; both predict skill leveling.
  • Direct implications for hiring (senior strategists more valuable, mid-tier execution staff under pressure), agency pricing (shift from execution-volume to strategic outcomes), and training (invest at the strategic end where AI doesn’t level).
  • The leveling finding holds inside the AI capability frontier. Outside it (see glossary/jagged-frontier) AI can hurt all skill levels.
  • Long-term skill-development effects of novice AI reliance are not yet measured.

Sources

  • Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at Work. NBER Working Paper No. 31161. — n=5,179 customer-support agents, staggered rollout. Headline: +14% issues/hour, +34% novices, ~0% experts. The cleanest field-evidence anchor.
  • Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192. — n=444 college-educated professionals, preregistered online experiment. Time −40%, quality +18%, low-ability gained most.
  • Dell’Acqua, F. et al. (2023). Navigating the Jagged Technological Frontier. HBS Working Paper 24-013. — n=758 BCG consultants. Bottom-half skill consultants gained most on the 18 inside-frontier tasks.
  • Modash (2026). State of Influencer Marketing Salaries 2026. n=499. Field-data analog: strategy premium +$14,830 in a function with mature AI execution tooling.