Hallucination — When AI Confidently Invents Things
TL;DR: AI hallucination is when a language model generates content that is plausible-sounding but false — fake citations, invented statistics, made-up product features, fabricated court cases. The structural cause is that LLMs predict probable next tokens rather than retrieve true facts. Hallucinations don’t look different from accurate output, which is why human verification is non-optional for any AI-generated content that touches reality. Modern models (2025–2026) hallucinate less than 2023-generation models on common topics but still hallucinate reliably on uncommon entities, recent events, and specific quantitative claims.
Simple explanation
Imagine asking a student who’s read every textbook to answer a question about a specific recent paper they haven’t seen. The student knows the shape of what an answer should look like, so they produce one — confidently, in the right style, with the right vocabulary. The answer just isn’t true.
LLMs work the same way. They predict the most probable next token given everything they’ve seen. When the question lands inside their training distribution (well-documented topics, common facts, established frameworks), prediction is reliable. When it lands outside (specific recent events, uncommon named entities, particular quantitative claims they haven’t been trained on), prediction continues — but now produces plausible text rather than true text. The model has no internal flag for “I don’t actually know this.”
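A minimal sketch of that mechanism in plain Python. The four-token vocabulary and the logit values are invented for illustration (real models score on the order of 100k tokens); the point is structural: the decoding step always emits some token, and “I’m not sure” is at best just another token, never a built-in abstain branch.

```python
import math
import random

# Toy vocabulary and invented "logits" for the next token after
# "The paper was published in ...". The tokens and scores are made up.
vocab = ["2019", "2021", "2023", "I'm not sure"]
logits = [2.1, 2.3, 1.9, 0.2]  # uncertainty is just another low-scoring token

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
print({t: round(p, 3) for t, p in zip(vocab, probs)})

# Sampling always returns *some* token; there is no abstain branch.
answer = random.choices(vocab, weights=probs, k=1)[0]
print("model answers:", answer)
```

Run it a few times: the sampled year varies, but the loop never declines to answer, and nothing in `probs` marks any of the answers as fabricated.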
Why it matters for business
Hallucination is the single highest-stakes operational risk in AI-assisted work. Specific failure modes:
- Made-up citations and sources. Lawyers have been sanctioned for filing AI-generated briefs containing fabricated case law. The cases looked real (proper names, plausible jurisdictions, realistic citation format) but didn’t exist.
- Invented statistics. AI confidently produces specific percentages and dollar figures that have no source. A 2024 study finding that “73% of marketers report X” may be entirely fabricated; the AI generates it because that’s the shape of the sentence the surrounding context calls for.
- Wrong product features. AI describing a competitor’s product can invent capabilities the product doesn’t have, leading to wrong analysis and embarrassing client-facing claims.
- Confident wrong answers in unfamiliar domains. This is the glossary/jagged-frontier phenomenon at decision time — outside the model’s capability frontier, accuracy drops 19 percentage points (Dell’Acqua 2023, n=758 BCG consultants).
The honest framing: hallucination is the mechanism that makes AI outputs unverifiable without a human in the loop. Every AI-generated factual claim is a hypothesis, not a fact. The hypothesis is right most of the time, but the failure mode is silent — wrong answers look identical to right ones.
Where hallucination is common (and where it isn’t)
Common (verify everything):
- Specific named entities (people, companies, papers) the model may not have seen
- Recent events past the training cutoff
- Specific quantitative claims (percentages, dollar amounts, dates)
- Niche or low-resource topics
- Citations and references to sources
- Detailed factual claims about competitors, products, or markets
Less common (still verify, but lower base rate):
- Well-established frameworks and conceptual content
- Code in popular languages and libraries
- Common knowledge questions
- Restructuring or summarizing content the model is given in-context
Rare (low base rate but still possible):
- Tasks where the model is given source material and asked to extract or transform it (RAG patterns) — though the model can still hallucinate by misreading or confabulating connections
Why models hallucinate even when they “know better”
This is the surprising part of the modern research. Even when the correct answer is in the training data, models can hallucinate because:
- The probabilistic mechanism has no “no, I don’t know” mode by default. Producing some answer is what next-token prediction does; producing no answer requires explicit training (e.g., RLHF for refusal behaviors).
- Confidence is decoupled from accuracy. The model can be highly confident in a wrong answer because confidence is encoded in the same probability distribution as the answer itself.
- Inside-frontier patterns can pull wrong specifics in. The shape of “lawyers cite cases like this” can produce a fabricated case that fits the shape but doesn’t exist. The pattern-match works at the structure level and fails at the specific level.
The 2025–2026 generation of models (Claude Opus 4.7, GPT-5.1, Gemini 2.5 Pro) hallucinates substantially less than 2023-era models on common topics. The Allouah et al. study at Columbia + Yale (referenced in glossary/ai-agent-behavior) documented that Sonnet 3.5 → Opus 4.5 dropped the failure rate on “obvious better deal” tests from 63.7% to 4.3%; a similar improvement curve applies to base-rate hallucination on common facts. But hallucination on uncommon entities, recent events, and specific quantitative claims has not improved at the same rate, because those failures aren’t visible to the RLHF training signal.
What reduces hallucination
- Retrieval-augmented generation (glossary/rag). Give the model the source material in-context; let it cite from what’s provided rather than from training. This shifts the failure mode from “made up a fact” to “misread a fact”: still possible, but a different and lower-rate failure. A prompt sketch combining this with the next two mitigations follows this list.
- Explicit “I don’t know” permission. Telling the model “if you’re not sure, say so” measurably reduces confident wrong answers, though not to zero.
- Citation requirements. “For every factual claim, cite the source.” Forces the model into RAG mode even without external retrieval, and exposes the cases where it can’t.
- Tool use. Calling out to search, calculators, or databases for factual claims rather than generating them from the model’s weights.
- Human verification. The non-negotiable step. Every AI-generated factual claim that gets published needs human review. This is operationalized in marketing/ai-tells-in-sales-copy (factual overreach in service of rhythm is one of the 11 patterns) and glossary/honest-assessment (acknowledging uncertainty is a positive trust signal).
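A hedged sketch of how the first three mitigations combine in a single prompt. Everything here is illustrative: `retrieve()` stands in for whatever search or vector-store call you actually use, the document snippet it returns is invented, and the final model call is deliberately left as a placeholder rather than a specific vendor API.

```python
# Sketch: in-context sources (RAG) + explicit uncertainty permission
# + a per-claim citation requirement, assembled into one prompt.

def retrieve(query: str) -> list[str]:
    # Placeholder: a real implementation would query a vector store
    # or search index. The snippet below is invented.
    return ["[doc-1] Q3 revenue was $4.2M (10-Q filing, p. 12)."]

def build_prompt(question: str) -> str:
    sources = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the sources below.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n\n"
        "Rules:\n"
        "- Cite a source id like [doc-1] for every factual claim.\n"
        "- If the sources don't contain the answer, say \"I don't know.\"\n"
    )

print(build_prompt("What was Q3 revenue?"))
# Pass the result to whatever model call you use; human review of the
# model's cited claims is still the final step.
```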
Connection to wiki frameworks
- glossary/jagged-frontier — Hallucination is the mechanism that makes the outside-frontier territory dangerous. The −19pp accuracy on outside-frontier tasks (Dell’Acqua 2023) is hallucination at scale. Inside the frontier, hallucination is rare; outside, it’s reliable.
- glossary/recognition-primed-decision — Klein-Kahneman: pattern-matching is reliable only in high-validity environments with rapid feedback. Hallucination is what happens when the model is not in such an environment but continues to pattern-match anyway.
- glossary/honest-assessment — The positive-trust-signal counterpart. Content that acknowledges uncertainty performs better in AI-search citation and human-reader trust; content that confabulates with confidence performs worse over time as readers detect the pattern.
- marketing/ai-tells-in-sales-copy — “Factual overreach in service of rhythm” (pattern #9) is the marketing-copy manifestation of hallucination. The fix is the same: verify every factual claim, accept the rhythm hit.
- glossary/llm — The underlying technology and why probabilistic prediction produces this failure mode.
Honest limits
- Hallucination rates are not directly comparable across models. Different benchmarks measure different things; vendor-published rates often use favorable test sets.
- “It said something true once” is not evidence the model doesn’t hallucinate on that topic. Stochasticity means the same query may produce true output one time and hallucinated output the next. Verify each instance.
- Reducing temperature doesn’t eliminate hallucination. Lower temperature reduces variance but doesn’t change the underlying mechanism: the model is still pattern-matching from the training distribution (see the sketch after this list).
- RAG reduces but doesn’t eliminate hallucination. The model can still misread, confabulate connections between retrieved chunks, or interpolate. RAG shifts the failure mode rather than eliminating it.
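A small sketch of the temperature point, with invented logits. Dividing logits by a temperature below 1 concentrates probability on the same top-ranked token; if that token is a hallucination, sampling it more deterministically doesn’t make it true.

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits where the top-scoring token happens to be wrong.
logits = [2.3, 2.1, 0.2]

for temperature in (1.0, 0.5, 0.1):
    scaled = [s / temperature for s in logits]
    print(f"T={temperature}:", [round(p, 3) for p in softmax(scaled)])

# Lower T concentrates probability on the same top token. It makes the
# wrong answer more repeatable, not more correct.
```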
Related
- glossary/llm — Why LLMs hallucinate at all (probabilistic prediction)
- glossary/jagged-frontier — The asymmetry hallucination produces at the task-pairing layer
- glossary/recognition-primed-decision — Klein-Kahneman conditions for when pattern-matching is reliable; predicts where hallucination is rare vs. common
- glossary/rag — The most common mitigation pattern
- glossary/honest-assessment — The trust-signal counterpart
- marketing/ai-tells-in-sales-copy — Factual overreach in copy is hallucination in production
- glossary/llm-evals — How hallucination is measured and benchmarked
- glossary/agent-engineering — Why production agent work requires verification of outputs the agent itself cannot reliably verify
Key Takeaways
- Hallucination is plausible-sounding but false AI output. The structural cause is probabilistic next-token prediction without an internal “I don’t know” mode.
- Wrong answers look identical to right ones. This is what makes hallucination dangerous — the failure mode is silent.
- Most common in: specific named entities, recent events past training cutoff, specific quantitative claims, niche topics, fabricated citations.
- Less common in: well-established frameworks, common-knowledge questions, in-context transformation tasks.
- What reduces it: RAG, explicit uncertainty permission, citation requirements, tool use for factual lookups, and the non-negotiable human verification step.
- 2025–2026 models hallucinate less than 2023 models on common topics, but not uniformly — uncommon entities, recent events, and specific quantitative claims haven’t improved at the same rate.
Sources
- Practitioner consensus across glossary/llm-evals sources on hallucination measurement
- Multiple 2024–2026 documented legal cases of AI-fabricated citations (lawyers sanctioned, briefs withdrawn) — well-publicized failure mode
- Dell’Acqua, F. et al. (2023). Navigating the Jagged Technological Frontier. HBS WP 24-013 — the empirical anchor for outside-frontier accuracy collapse
- Allouah, A. et al. (2025). What is your AI Agent Buying? Columbia + Yale Working Paper, December 2025 — model improvement curves on objective-recognition failures
- Anthropic, OpenAI, and Google published reduction claims for 2025-generation models — directional rather than precise, as cross-vendor benchmarks vary