Question: What AI Tools Actually Deliver ROI for Small Businesses?

Status: 🔍 Exploring

The Question

With hundreds of AI tools launching every week, which ones actually save time/money or generate revenue for small businesses? What’s real vs. hype?

Why This Matters

Small businesses can’t afford to waste time on tools that don’t work
AI marketing is full of hype — need to cut through it
Finding genuinely useful tools = competitive advantage
This is core to Primores.org’s value proposition

What We Know So Far

From peer-reviewed studies (added May 2026)

Three 2023 randomized or quasi-experimental studies, plus a 2025 field RCT, provide direct ROI numbers rather than vendor claims or testimonials:

Domain	Study	N	Headline ROI
Customer support	Brynjolfsson, Li & Raymond (2023) NBER WP 31161	5,179	+14% issues/hour avg; +34% for novices; ~0% for experts. Real production setting, staggered rollout.
Writing tasks	Noy & Zhang (2023) Science 381(6654)	453	−40% time, +18% quality. Low-ability workers gained most. Preregistered experiment.
Knowledge work	Dell’Acqua et al. (2023) HBS WP 24-013	758	Inside frontier: +12.2% tasks, +25.1% faster, +40% quality. Outside frontier: −19pp accuracy.
Knowledge work (field, 2025)	Dillon, Jaffe, Immorlica & Stanton (2025) NBER WP 33795	7,137	Microsoft Copilot RCT across 66 firms: users saved ~2 hrs/week on email but showed no change in output (same email threads, meetings, documents completed). The time saved did not translate into more output.

The unified finding across the three 2023 studies: AI delivers meaningful productivity gains inside its capability frontier, with low-skill workers gaining most. ROI is not uniformly distributed across tasks or workers. It depends on whether the specific task is inside the frontier (see glossary/jagged-frontier) and whether the user was skill-bottlenecked on a phase AI can absorb (see glossary/ai-task-restructuring).

The 2025 reality check (Stanford HAI AI Index 2025): generative-AI use in at least one business function jumped from 33% to 71% of organizations in a single year, yet financial returns stay modest — most organizations report cost savings under 10% and revenue gains under 5%. Adoption raced ahead of measured ROI. Paired with the Copilot RCT above (time saved ≠ output), the lesson is that ROI depends on redeploying the freed time, not just deploying the tool. See glossary/appropriate-reliance — uncritical adoption captures the cost but not the value.
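
The redeployment point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the redeployed_fraction parameter, the four-weeks-per-month approximation, and every number in the usage lines are assumptions, not figures from the studies above. It treats saved hours as worth nothing until they are spent on other valued work.

```python
def net_monthly_value(hours_saved_per_week: float,
                      hourly_value: float,
                      redeployed_fraction: float,
                      tool_cost_per_month: float,
                      weeks_per_month: float = 4.0) -> float:
    """Net monthly value of a tool, counting only saved time that is redeployed.

    All parameters are hypothetical inputs you would measure yourself.
    redeployed_fraction is the share (0 to 1) of saved hours that actually
    goes into other productive work. weeks_per_month is a rough approximation.
    """
    if not 0.0 <= redeployed_fraction <= 1.0:
        raise ValueError("redeployed_fraction must be between 0 and 1")
    realized_hours = hours_saved_per_week * weeks_per_month * redeployed_fraction
    return realized_hours * hourly_value - tool_cost_per_month


# Hypothetical inputs: 2 saved hours/week, $40/hour, $30/month tool.
print(net_monthly_value(2, 40, 0.0, 30))  # nothing redeployed: the tool is pure cost
print(net_monthly_value(2, 40, 0.5, 30))  # half of the saved time redeployed
```

With nothing redeployed, the result is just the negative of the subscription cost, which is the "time saved ≠ output" case in miniature.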

This sharpens the original framing: the question isn’t “does AI deliver ROI?” — it’s “for which task × worker combinations?”

From Experience

ChatGPT/Claude for drafting content: High ROI — saves hours per week. Consistent with Noy-Zhang’s 40% time savings on writing tasks.
AI image generation for social: Medium ROI — saves money vs. stock photos
AI transcription (Otter, etc.): High ROI — saves hours on meeting notes. Inside-the-frontier task with abundant training data.
AI “agents” for complex tasks: Low ROI so far — promising but unreliable. Consistent with Dell’Acqua’s outside-frontier finding (−19pp accuracy when capability frontier is exceeded).

Initial Hypotheses

“Boring” productivity AI (writing, transcription) delivers more ROI than “exciting” AI. Partially supported: drafting and transcription are inside-frontier; agent orchestration is at or beyond the current frontier.
Tools that augment humans beat tools that try to replace humans. Supported: in Brynjolfsson’s study, the AI assistant spread best practices to less-experienced workers, which is exactly this pattern.
Free tiers often provide 80% of the value. Partially reasoned (see “Free vs. paid tiers” below): not directly addressed by the academic literature, but the frontier and skill-leveling findings let us reason about it.

ROI by business function — the two-axis model (synthesis, May 2026)

The “for which task × worker combinations?” reframing above can be sharpened into a model that predicts ROI by function. Synthesizing the three academic anchors with the wiki’s glossary/jagged-frontier and the severity gate from glossary/guardrails and glossary/customer-perception-moments yields two axes that determine ROI:

Frontier position — is the task inside the AI’s capability frontier (abundant training data, pattern-matchable, verifiable output) or outside it (novel, judgment-heavy, no clear right answer)? (Dell’Acqua 2023: +40% quality inside; −19pp accuracy outside.)
Error cost — what does a confident-but-wrong output cost? Low for a discardable first draft; catastrophic for an unsupervised legal, medical, or financial decision.

	Low error cost	High error cost
Inside frontier	🟢 Highest ROI — deploy now. Content drafting, transcription, tier-1 support, image generation, summarization-for-humans. (Noy-Zhang −40% time / +18% quality; Brynjolfsson +14%, novices +34%.)	🟡 Conditional ROI — ROI realized only with a verification layer. Code generation (review before merge), data analysis (check the numbers), customer-facing copy (human edit). The verification cost is real; net ROI is positive but smaller than the headline.
Outside frontier	🟠 Low ROI, low risk — cheap to try, often disappointing. Open-ended ideation, “be creative” tasks. Discard the misses; the cost is just the tokens.	🔴 Negative ROI — the −19pp zone. Unsupervised agents on novel multi-step tasks, autonomous decisions in regulated/high-severity domains. The cost of wrong-with-high-confidence output exceeds any labor saved.

The organizing insight: the highest-ROI AI work is not the most impressive-sounding. It clusters in the top-left cell — boring, inside-frontier, low-error-cost tasks where AI augments a human who still owns the outcome. This is the empirical backbone of the wiki’s glossary/automation-eats-execution pattern: AI compresses the execution layer (top-left) first, while strategy and judgment (bottom-right) stay human.
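
The four cells of the table above can be written as a small lookup. This is a minimal sketch, assuming each task can be reduced to two yes/no judgments; the Cell and classify names are hypothetical, and the example placements simply restate the page's own qualitative calls rather than anything measured.

```python
from enum import Enum


class Cell(Enum):
    DEPLOY_NOW = "highest ROI: deploy now"
    CONDITIONAL = "conditional ROI: add a verification layer"
    CHEAP_TO_TRY = "low ROI, low risk: cheap to try"
    AVOID = "negative ROI: avoid unsupervised use"


def classify(inside_frontier: bool, high_error_cost: bool) -> Cell:
    """Map the two axes (frontier position, error cost) onto the four cells."""
    if inside_frontier:
        return Cell.CONDITIONAL if high_error_cost else Cell.DEPLOY_NOW
    return Cell.AVOID if high_error_cost else Cell.CHEAP_TO_TRY


# Example placements taken from the table: (inside_frontier, high_error_cost).
examples = {
    "transcription": (True, False),
    "code generation": (True, True),
    "open-ended ideation": (False, False),
    "unsupervised multi-step agent": (False, True),
}
for task, (inside, costly) in examples.items():
    print(f"{task}: {classify(inside, costly).value}")
```

Real tasks rarely sit cleanly on either side of an axis, so the booleans are a simplification; the point is only that the answer depends on both inputs at once.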

Function-by-function map

Function	Typical cell	Wiki evidence
Content / copywriting	🟢 / 🟡 (edit before publish)	marketing/ai-marketing-case-studies, marketing/ai-human-voice-prompting
Customer support (tier-1)	🟢 / 🟡	automation/ai-customer-service-cases, cases/intercom-fin-support — and the perception caveat: glossary/customer-perception-moments
Transcription / meeting notes	🟢	High-ROI, deep inside frontier
Software development	🟡 (review-gated)	automation/ai-developer-tools-cases, glossary/vibe-coding — inside for boilerplate, jagged at architecture
Data analysis / BI	🟡	Inside for summarization; outside for causal reasoning
Legal / healthcare / finance	🔴 without supervision	automation/ai-legal-cases, automation/ai-healthcare-cases, automation/ai-finance-banking-cases — high error cost dominates
Autonomous multi-step agents	🔴 / 🟠 today	questions/managed-agents-break-even, glossary/jagged-frontier

Free vs. paid tiers

The academic literature doesn’t test this directly, but the frontier model predicts the answer. Free tiers capture ~80% of the value for tasks that are clearly inside the frontier — drafting, transcription, summarization — because the task is well within reach of even a mid-tier model. The marginal value of a paid tier concentrates at the frontier edge: a stronger model (and higher rate limits, integrations, longer context) is exactly what pulls a borderline task from “outside” to “inside” the frontier. So the paid-tier decision rule: pay when your highest-value tasks sit at the frontier edge or you run high volume; stay free when your tasks are comfortably inside the frontier and intermittent. This refines hypothesis 3 from “untested” to “true for inside-frontier tasks; the exception is frontier-edge work where the better model changes the answer.”
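
The paid-tier decision rule above is simple enough to state as code. This is a sketch only: the function and parameter names are hypothetical, and both inputs are judgment calls by the business owner, not measurements.

```python
def recommend_tier(top_tasks_at_frontier_edge: bool, high_volume: bool) -> str:
    """Return "paid" or "free" following the decision rule stated above."""
    if top_tasks_at_frontier_edge or high_volume:
        # A stronger model or higher limits changes the outcome here.
        return "paid"
    # Tasks comfortably inside the frontier and used intermittently.
    return "free"


print(recommend_tier(top_tasks_at_frontier_edge=False, high_volume=False))  # free
print(recommend_tier(top_tasks_at_frontier_edge=True, high_volume=False))   # paid
```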

Current Thinking

The highest-ROI AI tools seem to be those that:

Automate genuinely tedious tasks (not interesting work)
Require minimal setup/learning
Integrate with existing workflows
Have clear, measurable outputs

Open Threads

Things still to explore:

Survey actual small business owners on what they use
Test 10 popular AI tools and document real time savings
Compare free vs. paid tiers — is paid worth it?
Look at AI tools by business function — done via the two-axis model + function map above (May 2026)
Find case studies with actual numbers

Sources to Review

Academic/research studies on AI productivity — three core anchors ingested (Brynjolfsson 2023, Noy-Zhang 2023, Dell’Acqua 2023). See glossary/ai-skill-leveling, glossary/ai-task-restructuring, glossary/jagged-frontier.
AI tool comparison sites (need to evaluate their credibility)
Small business forums/Reddit for real user experiences
Vendor case studies (with skepticism)

Related

automation/finding-ai-use-cases — TRIPS framework for prioritizing AI opportunities
automation/ai-enablement-levels — Understanding the 5 levels of AI adoption
glossary/llm-evals — How to evaluate if AI is actually working
questions/ai-as-personal-advisor — Related exploration on AI for productivity
glossary/jagged-frontier — AI is asymmetric: helps inside frontier, hurts outside (Dell’Acqua 2023, n=758)
glossary/ai-skill-leveling — Three studies confirming AI raises low-performer productivity disproportionately
glossary/ai-task-restructuring — Where the time savings come from (drafting compresses; framing and editing remain)
comparisons/strategy-vs-execution-ai — Synthesis: what kinds of work get the ROI
glossary/automation-eats-execution — The cross-domain pattern these findings anchor
glossary/guardrails — The error-cost axis operationalized: pair high-error-cost AI use with a verification gate
glossary/customer-perception-moments — Where the customer-support ROI cell intersects perception (forgiveness, recovery copy) — ROI isn’t just resolution rate

Last updated: 2026-05-29 (added the two-axis ROI-by-function model + free-vs-paid reasoning; upgraded 🌱→🌿). Prior: 2026-05-05 academic ingest wave (Brynjolfsson, Noy-Zhang, Dell’Acqua).
