Question: What AI Tools Actually Deliver ROI for Small Businesses?

Status: 🔍 Exploring

The Question

With hundreds of AI tools launching every week, which ones actually save time/money or generate revenue for small businesses? What’s real vs. hype?

Why This Matters

  • Small businesses can’t afford to waste time on tools that don’t work
  • AI marketing is full of hype — need to cut through it
  • Finding genuinely useful tools = competitive advantage
  • This is core to Primores.org’s value proposition

What We Know So Far

From peer-reviewed studies (added May 2026)

Four randomized or quasi-experimental studies now provide direct ROI numbers, not vendor claims or testimonials:

| Domain | Study | N | Headline ROI |
| --- | --- | --- | --- |
| Customer support | Brynjolfsson, Li & Raymond (2023), NBER WP 31161 | 5,179 | +14% issues/hour on average; +34% for novices; ~0% for experts. Real production setting, staggered rollout. |
| Writing tasks | Noy & Zhang (2023), Science 381(6654) | 444 | −40% time, +18% quality. Low-ability workers gained most. Preregistered experiment. |
| Knowledge work | Dell’Acqua et al. (2023), HBS WP 24-013 | 758 | Inside frontier: +12.2% tasks, +25.1% faster, +40% quality. Outside frontier: −19pp accuracy. |
| Knowledge work (field, 2025) | Dillon, Jaffe, Immorlica & Stanton, NBER w33795 | 7,137 | Microsoft Copilot RCT across 66 firms: users saved ~2 hrs/week on email but showed no change in output (same email threads, meetings, documents completed). Time saved did not become more produced. |

The unified finding across the three 2023 studies: AI delivers meaningful productivity gains inside its capability frontier, with low-skill workers gaining most. ROI is not uniformly distributed across tasks or workers. It depends on whether the specific task is inside the frontier (see glossary/jagged-frontier) and whether the user was skill-bottlenecked on a phase AI can absorb (see glossary/ai-task-restructuring).

The 2025 reality check (Stanford HAI AI Index 2025): generative-AI use in at least one business function jumped from 33% to 71% of organizations in a single year, yet financial returns stay modest — most organizations report cost savings under 10% and revenue gains under 5%. Adoption raced ahead of measured ROI. Paired with the Copilot RCT above (time saved ≠ output), the lesson is that ROI depends on redeploying the freed time, not just deploying the tool. See glossary/appropriate-reliance — uncritical adoption captures the cost but not the value.

This sharpens the original framing: the question isn’t “does AI deliver ROI?” — it’s “for which task × worker combinations?”
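The "time saved ≠ output" point can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not a measured model: the function name `weekly_net_value`, the hourly value of 50, the weekly tool cost of 7, and the idea of a single `redeployed_fraction` are all assumptions introduced here. Only the ~2 hours/week figure comes from the Copilot RCT above.

```python
def weekly_net_value(hours_saved: float, hourly_value: float,
                     redeployed_fraction: float, weekly_tool_cost: float) -> float:
    """Net weekly value of a tool, counting only the saved hours that are
    actually redeployed into productive work (an assumed simplification)."""
    if not 0.0 <= redeployed_fraction <= 1.0:
        raise ValueError("redeployed_fraction must be between 0 and 1")
    realized_hours = hours_saved * redeployed_fraction
    return realized_hours * hourly_value - weekly_tool_cost


# ~2 hrs/week saved (Copilot RCT figure); the other numbers are hypothetical.
for fraction in (0.0, 0.5, 1.0):
    net = weekly_net_value(hours_saved=2.0, hourly_value=50.0,
                           redeployed_fraction=fraction, weekly_tool_cost=7.0)
    print(f"redeployed {fraction:.0%}: net weekly value = {net:+.2f}")
```

With none of the saved time redeployed, the tool is a pure cost. The same two hours only turn positive once some share of them goes into work that produces value.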

From Experience

  • ChatGPT/Claude for drafting content: High ROI — saves hours per week. Consistent with Noy-Zhang’s 40% time savings on writing tasks.
  • AI image generation for social: Medium ROI — saves money vs. stock photos
  • AI transcription (Otter, etc.): High ROI — saves hours on meeting notes. Inside-the-frontier task with abundant training data.
  • AI “agents” for complex tasks: Low ROI so far — promising but unreliable. Consistent with Dell’Acqua’s outside-frontier finding (−19pp accuracy when capability frontier is exceeded).

Initial Hypotheses

  1. “Boring” productivity AI (writing, transcription) delivers more ROI than “exciting” AI — partially supported: drafting and transcription are inside-frontier; agent orchestration is at or beyond the current frontier.
  2. Tools that augment humans beat tools that try to replace humans — supported: Brynjolfsson’s finding that the AI spreads the practices of top performers to less-experienced workers is exactly this pattern.
  3. Free tiers often provide 80% of the value — partially reasoned (see “Free vs. paid” below): not directly addressed by the academic literature, but the frontier/skill-leveling findings let us reason about it.

ROI by business function — the two-axis model (synthesis, May 2026)

The “for which task × worker combinations?” reframing above can be sharpened into a model that predicts ROI by function. Synthesizing the three academic anchors with the wiki’s glossary/jagged-frontier and the severity gate from glossary/guardrails / glossary/customer-perception-moments, ROI is a function of two axes:

  1. Frontier position — is the task inside the AI’s capability frontier (abundant training data, pattern-matchable, verifiable output) or outside it (novel, judgment-heavy, no clear right answer)? (Dell’Acqua 2023: +40% quality inside; −19pp accuracy outside.)
  2. Error cost — what does a confident-but-wrong output cost? Low for a discardable first draft; catastrophic for an unsupervised legal, medical, or financial decision.

| | Low error cost | High error cost |
| --- | --- | --- |
| Inside frontier | 🟢 Highest ROI: deploy now. Content drafting, transcription, tier-1 support, image generation, summarization-for-humans. (Noy-Zhang −40% time / +18% quality; Brynjolfsson +14%, novices +34%.) | 🟡 Conditional ROI: ROI realized only with a verification layer. Code generation (review before merge), data analysis (check the numbers), customer-facing copy (human edit). The verification cost is real; net ROI is positive but smaller than the headline. |
| Outside frontier | 🟠 Low ROI, low risk: cheap to try, often disappointing. Open-ended ideation, “be creative” tasks. Discard the misses; the cost is just the tokens. | 🔴 Negative ROI: the −19pp zone. Unsupervised agents on novel multi-step tasks, autonomous decisions in regulated/high-severity domains. The cost of wrong-with-high-confidence output exceeds any labor saved. |

The organizing insight: the highest-ROI AI work is not the most impressive-sounding. It clusters in the top-left cell — boring, inside-frontier, low-error-cost tasks where AI augments a human who still owns the outcome. This is the empirical backbone of the wiki’s glossary/automation-eats-execution pattern: AI compresses the execution layer (top-left) first, while strategy and judgment (bottom-right) stay human.
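The two-axis model can be sketched as a small lookup. This is an illustrative simplification, assuming each axis can be reduced to a yes/no answer; in practice the frontier is jagged and error cost is a matter of degree. The names `Cell` and `classify` are hypothetical, and the example tasks are taken from the cells described above.

```python
from enum import Enum


class Cell(Enum):
    DEPLOY_NOW = "highest ROI: deploy now"
    CONDITIONAL = "conditional ROI: add a verification layer"
    CHEAP_EXPERIMENT = "low ROI, low risk: cheap to try"
    AVOID = "negative ROI: avoid unsupervised use"


def classify(inside_frontier: bool, high_error_cost: bool) -> Cell:
    """Map the two axes (frontier position, error cost) to one of four cells."""
    if inside_frontier:
        return Cell.CONDITIONAL if high_error_cost else Cell.DEPLOY_NOW
    return Cell.AVOID if high_error_cost else Cell.CHEAP_EXPERIMENT


# Example tasks; the boolean judgments are assumptions for illustration.
examples = {
    "meeting transcription": (True, False),
    "code generation": (True, True),
    "open-ended ideation": (False, False),
    "unsupervised multi-step agent": (False, True),
}
for task, (inside, costly) in examples.items():
    print(f"{task}: {classify(inside, costly).value}")
```

Treating both axes as booleans keeps the sketch readable. A fuller version would probably score each axis on a scale and let the human decide borderline cases.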

Function-by-function map

| Function | Typical cell | Wiki evidence |
| --- | --- | --- |
| Content / copywriting | 🟢 / 🟡 (edit before publish) | marketing/ai-marketing-case-studies, marketing/ai-human-voice-prompting |
| Customer support (tier-1) | 🟢 / 🟡 | automation/ai-customer-service-cases, cases/intercom-fin-support; perception caveat: glossary/customer-perception-moments |
| Transcription / meeting notes | 🟢 | High-ROI, deep inside frontier |
| Software development | 🟡 (review-gated) | automation/ai-developer-tools-cases, glossary/vibe-coding (inside for boilerplate, jagged at architecture) |
| Data analysis / BI | 🟡 | Inside for summarization; outside for causal reasoning |
| Legal / healthcare / finance | 🔴 without supervision | automation/ai-legal-cases, automation/ai-healthcare-cases, automation/ai-finance-banking-cases (high error cost dominates) |
| Autonomous multi-step agents | 🔴 / 🟠 today | questions/managed-agents-break-even, glossary/jagged-frontier |

Free vs. paid tiers

The academic literature doesn’t test this directly, but the frontier model predicts the answer. Free tiers capture ~80% of the value for tasks that are clearly inside the frontier — drafting, transcription, summarization — because the task is well within reach of even a mid-tier model. The marginal value of a paid tier concentrates at the frontier edge: a stronger model (and higher rate limits, integrations, longer context) is exactly what pulls a borderline task from “outside” to “inside” the frontier. So the paid-tier decision rule: pay when your highest-value tasks sit at the frontier edge or you run high volume; stay free when your tasks are comfortably inside the frontier and intermittent. This refines hypothesis 3 from “untested” to “true for inside-frontier tasks; the exception is frontier-edge work where the better model changes the answer.”
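The paid-tier decision rule above can be written down as a short function. This is a rough sketch under stated assumptions: `Task`, `should_pay`, the `monthly_value` estimates, and the choice to look only at the single highest-value task are all hypothetical simplifications, not something the studies measure.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    monthly_value: float    # hypothetical estimate of the value the task produces
    at_frontier_edge: bool  # True if a stronger model plausibly changes the outcome


def should_pay(tasks: list[Task], high_volume: bool) -> bool:
    """Pay when volume is high or the highest-value task sits at the frontier edge."""
    if high_volume:
        return True
    if not tasks:
        return False
    top_task = max(tasks, key=lambda t: t.monthly_value)
    return top_task.at_frontier_edge


# Illustrative workloads; all values are made up for the example.
comfortable = [Task("blog drafts", 400.0, False), Task("meeting notes", 200.0, False)]
edge_heavy = comfortable + [Task("contract analysis", 900.0, True)]

print(should_pay(comfortable, high_volume=False))  # stay on the free tier
print(should_pay(edge_heavy, high_volume=False))   # frontier-edge work dominates
```

Keying the decision on the highest-value task reflects the rule's emphasis: a paid tier earns its cost where the better model changes the answer on the work that matters most.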

Current Thinking

The highest-ROI AI tools seem to be those that:

  • Automate genuinely tedious tasks (not interesting work)
  • Require minimal setup/learning
  • Integrate with existing workflows
  • Have clear, measurable outputs

Open Threads

Things still to explore:

  • Survey actual small business owners on what they use
  • Test 10 popular AI tools and document real time savings
  • Compare free vs. paid tiers — is paid worth it?
  • Look at AI tools by business function — done via the two-axis model + function map above (May 2026)
  • Find case studies with actual numbers

Last updated: 2026-05-29 (added the two-axis ROI-by-function model + free-vs-paid reasoning; upgraded 🌱→🌿). Prior: 2026-05-05 academic ingest wave (Brynjolfsson, Noy-Zhang, Dell’Acqua).