AI Humor and Forgiveness — Self-Deprecating Humor as a Service-Failure Recovery Tactic
TL;DR: When an AI agent makes a mistake, humorous responses make users more forgiving, and self-deprecating humor outperforms positive humor by a wide margin. Xie et al. 2025 (Journal of Business Research, n=1,919, 4 experiments) found that self-deprecating humor produces a +47.8% forgiveness uplift vs. no humor for wrong-recommendation errors and +25.6% for after-sales failures. Positive humor works too (+33.9% / +15.7%), but less. Two boundary conditions are load-bearing. First, the severity gate: the effect is strong for low-severity mistakes (wrong color recommendation) and disappears for high-severity failures (refusing a rightful refund). Second, the focal-customer gate surfaced by the 2025 Don’t Humor Me! counter-finding (Honora et al., Journal of Business Ethics): humor that works for observers can invert when delivered to the customer directly burned by the failure. Focal customers read it as sarcasm, perceive reduced company morality, and forgive less. Independent corroboration from the 2025 Nature Scientific Reports study (n=780, 3 studies) adds a consumer-motivation moderator: hedonic-motivation (entertainment-driven) consumers respond to humor more strongly than functional-motivation consumers, mediated by perceived warmth. 2026 practitioner gate: low severity + non-focal-burn + after the failure has been resolved + hedonic context = humor helps. Otherwise, stay sincere. Self-deprecation is the safer humor type when in doubt.
Why this matters
The wiki’s glossary/agent-adoption-frictions page captures Wharton’s framing: AI agent adoption is blocked by psychology, not technology. One of the loudest psychological barriers is the AI-error perception asymmetry — users systematically misjudge AI partners by focusing on where the system fails on easy cases, with those failures looming larger psychologically than the harder cases where the AI adds real value.
Hosanagar, in the Wharton Human-AI Research context, names the asymmetry directly:
“People are not as forgiving of AI errors as they are of human errors. They systematically misjudge AI partners by focusing on where the system fails on ‘easy’ cases. Those failures loom larger psychologically than the harder cases where the AI adds real value.”
The practical consequence: even a 98% accurate AI is judged harshly for its 2% of mistakes, because mistakes are weighted more heavily than wins. A junior human employee with the same error rate would be considered excellent. The forgiveness gap is the load-bearing adoption barrier most teams miss.
Humor — specifically the right kind of humor at the right severity level for the right audience — is one of the few empirically-supported tactics that partially closes this gap.
The primary study — Xie et al. 2025
Citation: Xie, Y., Zhou, P., Liang, C., Zhao, S., & Lu, W. (2025). Affiliative or self-deprecating? Exploring the effect of humor types on customer forgiveness in the context of AI agents’ service failure. Journal of Business Research, vol. 194. School of Management, Hefei University of Technology.
Method: 4 experiments with 1,919 total participants. Manipulated the AI chatbot’s response to a service failure across three conditions: no humor, positive (affiliative) humor, self-deprecating humor. Measured forgiveness as the dependent variable across two service contexts (product recommendation, after-sales service) and two severity levels (low, high).
Findings (the four effect sizes the practitioner case rests on):
| Context | Humor type | Forgiveness uplift vs. no humor |
|---|---|---|
| Wrong product recommendation | Positive humor | +33.9% |
| Wrong product recommendation | Self-deprecating humor | +47.8% |
| After-sales failure | Positive humor | +15.7% |
| After-sales failure | Self-deprecating humor | +25.6% |
The severity gate (the boundary condition the four numbers above hide):
The forgiveness uplifts above apply to low-severity failures. The effect:
- Stays strong for low-severity mistakes (wrong color recommendation for a jumper; minor service delay)
- Disappears entirely for high-severity failures (refusing a refund the customer was entitled to; sending the wrong order; missing a critical deadline)
The severity gate is not a soft moderator. The effect doesn’t weaken — it vanishes. A self-deprecating quip after refusing a customer’s legitimate refund doesn’t produce +25.6% forgiveness; it produces zero or negative effect.
Examples of the two humor types tested:
- Positive (affiliative) humor: “AI needs coffee too, let me try again.” Friendly, lighthearted, mood-lifting. The humor target is the situation, not the AI itself.
- Self-deprecating humor: “I apologize — turns out I’m a bit more ‘artificial’ than ‘intelligent’ today.” The humor target is the AI’s own capability. The AI is laughing at itself.
The self-deprecating version outperforms by ~14pp in the recommendation context (47.8 vs. 33.9) and ~10pp in the after-sales context (25.6 vs. 15.7): a roughly 1.4× to 1.6× advantage, in the same direction across both contexts.
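The gap and ratio between the two humor types can be checked directly from the four uplifts in the findings table. This is a minimal arithmetic sketch, assuming the reported percentages are directly comparable uplifts over the same no-humor baseline; the dictionary layout and the `compare` helper are illustrative names, not anything from the paper.

```python
# Forgiveness uplifts vs. no humor, as reported in the findings table (percent).
uplift = {
    "recommendation": {"positive": 33.9, "self_deprecating": 47.8},
    "after_sales": {"positive": 15.7, "self_deprecating": 25.6},
}

def compare(context: str) -> tuple[float, float]:
    """Return (gap in percentage points, ratio) of self-deprecating over positive humor."""
    row = uplift[context]
    gap = round(row["self_deprecating"] - row["positive"], 1)
    ratio = round(row["self_deprecating"] / row["positive"], 2)
    return gap, ratio

for context in uplift:
    gap, ratio = compare(context)
    print(f"{context}: +{gap}pp, {ratio}x")
```

Run as written, the gaps come out to 13.9pp (recommendation) and 9.9pp (after-sales), with ratios of about 1.41 and 1.63.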
Why self-deprecation specifically
The mechanism the authors propose, supported by adjacent literature:
- Humor regulates emotion (Yam et al. 2014) — making fun of a difficult situation reduces felt frustration in the moment.
- In service contexts, humor reduces anger toward the failed party (Söderlund 2021) — the same pattern carries over to AI service failures.
- Self-deprecation specifically signals humility and self-awareness. The AI saying “I messed up” is heard as a competence-relevant admission, not a mood-lift attempt. This connects to the glossary/honest-assessment mechanism: stating real limitations builds trust faster than hiding them. Self-deprecating humor is honest-assessment with a charm coat.
- Positive humor signals a deflection. “AI needs coffee!” lands as mood-management; it doesn’t address the failure. Self-deprecation does.
The third mechanism is the load-bearing one. A user who reads “I’m a bit more artificial than intelligent today” hears two things simultaneously: an admission that the AI made a mistake (the honest-assessment trust signal) and a low-stakes framing of that admission (the humor emotion-regulation effect). The combination is stronger than either alone.
Independent corroboration — Nature Scientific Reports 2025
A second 2025 study published in Nature Scientific Reports (n=780, 3 studies) examined humorous response strategies in AI service robots specifically.
The corroborating finding: consumers driven by hedonic motivations (seeking entertainment, enjoyment, or experience) exhibit significantly stronger forgiveness toward humorous AI responses, mediated by enhanced perceived warmth. Consumers driven by functional motivations show weaker effects.
The practitioner translation:
- Hedonic contexts where humor works best: entertainment platforms, social commerce, lifestyle e-commerce, gaming, casual content discovery
- Functional contexts where humor’s effect is weaker: banking, healthcare, B2B SaaS, mission-critical workflows, regulatory and compliance
This nuance refines the Xie et al. findings: even when the severity is low, humor’s payoff is highest in hedonic contexts. A wrong color recommendation on a fashion site (hedonic) responds to humor better than a wrong cell-format suggestion in a spreadsheet AI (functional).
Plus an additional 2024 paper (Bharadia & colleagues, American Journal of Management Science and Engineering) finds a user-side moderating effect: users who score higher on dispositional humor receptivity respond more strongly to humorous AI service-failure responses. Users low in humor receptivity respond weakly or negatively.
The direction is consistent across three independent research efforts: humor works for AI service failure recovery, with moderators (severity, motivation type, user disposition) that are all in the expected direction.
The counter-finding — Honora, Japutra, Septianto 2025
The most important nuance for practitioner deployment comes from a counter-finding worth taking seriously.
Citation: Honora, A., Japutra, A., & Septianto, F. (2025). Don’t Humor Me! Customers’ Moral Perceptions Toward Companies’ Humorous Responses in Social Media Service Recovery. Journal of Business Ethics. October 2025.
The pivot the paper introduces: prior research on humor in service recovery (including the Xie et al. line of work) primarily measured reactions from observers — people watching the service interaction unfold but not directly burned by the failure. Honora et al. measured reactions from focal customers — the people who actually experienced the service failure.
The finding flips: for focal customers, humorous recovery responses reduce perceived company morality, are interpreted as sarcastic, heighten negative affect, and reduce forgiveness. The same response that delights observers feels like mockery to the customer who got burned.
The mechanism the authors propose: focal customers are in a state of emotional load and unresolved harm. A humorous response from the company is read as “they’re not taking my problem seriously” — exactly the wrong signal at exactly the wrong moment. Observers don’t carry that emotional load, so they read the humor as charming.
The boundary conditions Honora et al. surface for safer humor deployment:
- Use humor only when the failure is less relevant to the company’s core business. A pizza delivery service can joke about a misplaced pepperoni; an airline cannot joke about a missed connection. Core-business failures activate strong moral perceptions; peripheral failures don’t.
- Timing matters. Humor is more morally acceptable after the failure has been resolved, not during the apology-and-recovery phase. The sequence: address the harm sincerely → resolve the problem → then deploy humor (e.g., in a follow-up touchpoint).
- The audience determines acceptability. Public-facing social-media replies that observers will see can use humor more freely; direct one-to-one responses to the affected customer should default to sincere.
Reconciling the two findings: the Xie et al. lab experiments were structured such that participants experienced the failure themselves (in a recommendation context) — yet they still showed positive humor effects. The likely reconciliation: failure severity, emotional load, and core-business relevance all interact. In low-severity, low-emotional-load, peripheral-relevance contexts (Xie et al.’s wrong-jumper-color scenario), the focal-victim penalty is small. In high-severity, high-emotional-load, core-business contexts (Honora et al.’s service-recovery scenarios), the focal-victim penalty dominates.
The practitioner gate that emerges from integrating both findings: humor helps when low-severity + peripheral-relevance + post-resolution + non-emotionally-loaded. Otherwise: stay sincere.
The 2026 practitioner playbook
Synthesizing the three studies + the Hosanagar Wharton framing into operational guidance:
When to deploy humor in AI service responses
| Condition | Humor helps? |
|---|---|
| Low-severity failure (wrong recommendation, minor delay, format issue) | ✅ Yes |
| High-severity failure (refund refusal, wrong order, missed deadline, data loss) | ❌ No — effect vanishes; risks negative |
| Peripheral to core business (extra feature, secondary suggestion) | ✅ Yes |
| Core-business failure (the thing the company exists to do) | ❌ No — moral perception penalty |
| Hedonic context (entertainment, fashion, lifestyle, social) | ✅ Yes — strongest effect |
| Functional context (banking, healthcare, compliance) | ⚠️ Cautious — weaker effect |
| After resolution (failure already addressed) | ✅ Yes |
| During the apology phase (failure not yet resolved) | ❌ No — risks reading as dismissive |
| Observer audience (public reply, comment thread) | ✅ Yes — strongest effect |
| Focal customer 1:1 (direct support DM) | ⚠️ Cautious — sincerity preferred |
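The table can be read as a set of hard gates (the "No" rows) and soft gates (the "Cautious" rows). The sketch below is one possible encoding, not a rule taken from the cited studies: the `FailureContext` fields, the `humor_decision` function, and the choice to let any hard gate override everything else are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureContext:
    low_severity: bool           # wrong recommendation, minor delay, format issue
    peripheral_to_core: bool     # failure is not the company's core service
    resolved: bool               # the failure has already been addressed
    hedonic: bool                # entertainment, fashion, lifestyle, social
    focal_customer_direct: bool  # True for a 1:1 reply to the affected customer

def humor_decision(ctx: FailureContext) -> str:
    """Return 'sincere', 'cautious', or 'humor' (assumed combination of the table rows)."""
    # Hard gates: any "No" row forces a sincere response.
    if not (ctx.low_severity and ctx.peripheral_to_core and ctx.resolved):
        return "sincere"
    # Soft gates: "Cautious" rows mean humor is allowed only with extra care.
    if ctx.focal_customer_direct or not ctx.hedonic:
        return "cautious"
    return "humor"

# Public reply about a resolved, minor, peripheral failure on a fashion site.
print(humor_decision(FailureContext(True, True, True, True, False)))
```

Treating "cautious" as a separate outcome leaves room for a team to map it to sincere-only until they have tested humor in their own context.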
Humor type selection
When the gates above are passed, default to self-deprecating humor. The Xie et al. data shows it outperforms positive humor by ~10–14pp consistently across contexts. Self-deprecation is also the safer choice when the gate decisions are uncertain: it carries the honest-assessment signal even if the humor itself doesn’t fully land.
Copy examples (composable patterns)
Self-deprecating (default when gates pass):
- “Oops — turns out I’m a bit more ‘artificial’ than ‘intelligent’ today. Let me try that again.”
- “My non-brain just glitched on that one — give me one more shot?”
- “I’m powered by algorithms, but I’m definitely on dial-up speed today. Sorry about that.”
- “That recommendation was less helpful than I intended. My pattern-matching got carried away — here’s a better one.”
Positive (acceptable when self-deprecation feels too on-the-nose):
- “Even the best AI needs a second look sometimes — let me reroute.”
- “Caught me being too literal. Let me think about that differently.”
Avoid (the failure modes the literature flags):
- “Oops!” with no acknowledgment of the actual mistake — too casual; reads as dismissive
- Joking responses to high-severity failures of any kind
- Humor that targets the customer (“you might have phrased that strangely”) — never works
- Humor that breaks the brand voice for laughs — costs more in coherence than it gains in forgiveness
Adjacent service-recovery tactics in the same cluster
The humor-on-failure pattern doesn’t operate in isolation. Two adjacent service-recovery language patterns are supported by independent peer-reviewed evidence and combine well with the humor playbook:
“Thank you” instead of “sorry.” A 2022 study in Journal of Travel & Tourism Marketing on AI device service-failure recovery found that chatbots expressing gratitude (vs. apology) are more likely to gain consumers’ forgiveness when the failure involves rejection (the system couldn’t help, denied a request, or routed away) rather than being ignored. The CXPA finding that 68% of customers report decreased trust after repetitive formulaic apologies reinforces the broader pattern: AI chatbots trained on customer-service corpora over-apologize structurally (Allen Institute for AI found that these chatbots use “I’m sorry” 3.7× more often in neutral queries than scientific-domain chatbots do), and the over-apology pattern actively erodes trust. The substitution: replace “I’m sorry, I couldn’t find what you were looking for” with “Thanks for bearing with me. Let me try a different approach.” Gratitude implies the customer’s patience contributed; apology implies the AI failed unilaterally.
Interjections (“Oh no!” / “Ah!”) in failure responses. The broader chatbot-communication-style literature supports natural-language interjections as engagement signals: they act as personification and empathy markers that increase perceived warmth without crossing into the forced-friendly tone trap. “Oh no, that’s not what I meant. Let me retry” outperforms “That answer was incorrect. Let me retry.” Use sparingly. One interjection per recovery response is enough; multiple interjections cross into the inauthentic-warmth pattern that backfires per the agent-adoption-frictions Pratfall-Effect finding.
These three patterns (self-deprecating humor + thanks-not-sorry + sparse interjections) form an integrated service-recovery voice that can be codified in a Voice-Profile Document (technique 1 of marketing/ai-human-voice-prompting). The combined effect compounds: the response acknowledges the failure (self-deprecation), credits the customer’s patience (gratitude), and signals natural engagement (interjection) — all in one response. Stack them; the gates from the humor section above (severity, focal-customer, timing) apply to the combined pattern.
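As a rough illustration of stacking the three patterns, the sketch below assembles one reply from an acknowledgment, a gratitude line, a next step, and at most one interjection. The `compose_recovery` function and its parameter names are hypothetical; a single interjection slot is simply one way to enforce the "one per response" guidance, and the humor gates still have to pass before this is used.

```python
def compose_recovery(acknowledgment: str, gratitude: str, next_step: str,
                     interjection: str = "") -> str:
    """Join the three patterns into one reply with at most one leading interjection."""
    parts = [interjection, acknowledgment, gratitude, next_step]
    # Drop empty parts and normalize surrounding whitespace.
    return " ".join(p.strip() for p in parts if p.strip())

reply = compose_recovery(
    acknowledgment="Turns out I'm a bit more 'artificial' than 'intelligent' today.",
    gratitude="Thanks for bearing with me.",
    next_step="Let me try a different approach.",
    interjection="Oh no!",
)
print(reply)
```

Leaving `interjection` empty yields a reply with only the self-deprecating and gratitude parts, which may suit more functional contexts.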
The severity-classification step
The biggest implementation challenge isn’t writing the humor — it’s classifying the failure as low-severity vs. high-severity at runtime. The 2026 production pattern:
- Tag each interaction class as low- or high-severity at design time (recommendation = low; refund refusal = high; order error = high; format suggestion = low)
- Route the AI response template based on the tag — humor-permitted for low-severity, sincere-only for high-severity
- When uncertain, default to sincere. A false positive on “this is high-severity” costs only the forgone humor uplift; a false negative puts a joke in front of a seriously harmed customer, which costs meaningfully more
This is consistent with the broader glossary/guardrails discipline: pair every powerful pattern (humor as a forgiveness lever) with a corresponding gate (severity classification) that prevents misapplication.
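The tag-then-route pattern can be sketched in a few lines. The interaction-class keys below come from the examples in the list above; the `Severity` enum, the template names, and the `route_template` function are illustrative assumptions rather than a documented production design.

```python
from enum import Enum

class Severity(Enum):
    LOW = "low"
    HIGH = "high"

# Design-time tags, using the example classes named in the list above.
SEVERITY_TAGS = {
    "recommendation": Severity.LOW,
    "format_suggestion": Severity.LOW,
    "refund_refusal": Severity.HIGH,
    "order_error": Severity.HIGH,
}

# Hypothetical template names: one family allows humor, the other does not.
TEMPLATES = {
    Severity.LOW: "humor_permitted",
    Severity.HIGH: "sincere_only",
}

def route_template(interaction_class: str) -> str:
    """Pick a response template; untagged classes are treated as high severity."""
    severity = SEVERITY_TAGS.get(interaction_class, Severity.HIGH)
    return TEMPLATES[severity]

print(route_template("recommendation"))
print(route_template("something_untagged"))
```

Defaulting unknown classes to `Severity.HIGH` is how the "when uncertain, default to sincere" rule is expressed here.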
Connection to wiki frameworks
- glossary/agent-adoption-frictions — Wharton’s three-friction framework. The forgiveness gap (Hosanagar perception asymmetry) is the cleanest evidence yet that trust friction has an asymmetric repair cost — AI mistakes are weighted heavier than human mistakes, so the repair tactics must be stronger. Humor is one of the few empirically-supported repair tactics.
- glossary/honest-assessment — Self-deprecating humor is honest-assessment with a charm coat. Same underlying mechanism (admitting real limits builds trust); different surface presentation (humor reduces emotional load).
- glossary/hallucination — Hallucinations are the failure modes humor-on-failure responses are designed to handle. When an AI hallucinates a wrong product recommendation, the right user-facing response is “my pattern-matching got carried away — here’s a better one” (self-deprecating, low-severity-tagged) — not “sorry” (apologetic without competence signal).
- glossary/guardrails — The severity-classification gate is a guardrail pattern. Pair every powerful response style with the gate that determines when it’s appropriate.
- marketing/ai-tells-in-sales-copy — Humor used carelessly is a positive AI tell (forced-friendly tone) catalogued in the wiki’s 11-pattern audit. Humor used per the gates above is not a tell; it’s a coherent UX move tied to documented psychology.
- marketing/ai-human-voice-prompting — The six techniques for AI human voice. Self-deprecating humor on failure is a candidate seventh technique, scoped to the service-failure context. Different from the existing techniques (which target generation of voice in social posts and outreach) — this targets recovery in conversational interactions.
- automation/ai-customer-service-cases — The case-studies layer where this pattern lives operationally. Production deployments of humor-on-failure should track win-rate metrics (forgiveness proxies: CSAT after resolution, retention 90 days post-failure, return-customer rates) against sincere-only baselines.
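For the tracking step mentioned in the last item, a forgiveness proxy such as post-resolution CSAT can be compared between a humor arm and a sincere-only arm. The sketch below is a hedged example: `relative_uplift` is a hypothetical helper, it assumes both arms are scored on the same scale, and it reports only a difference in means, with no significance testing.

```python
from statistics import mean

def relative_uplift(humor_scores: list[float], sincere_scores: list[float]) -> float:
    """Percent change in the mean score of the humor arm relative to the sincere-only arm."""
    if not humor_scores or not sincere_scores:
        raise ValueError("both arms need at least one score")
    baseline = mean(sincere_scores)
    if baseline == 0:
        raise ValueError("baseline mean is zero; relative uplift is undefined")
    return (mean(humor_scores) - baseline) / baseline * 100.0
```

A positive return value means the humor arm scored higher on average. A real deployment would also need randomized assignment and a significance test before reading anything into the number.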
Honest limits
Six caveats the wiki should preserve:
- Single study at the primary citation. Xie et al. is one paper from one research team at one university. Independent replications exist (Nature Sci Reports 2025; AJMSE 2024) and the direction holds, but effect sizes haven’t been re-measured precisely in independent labs at scale.
- Online chatbot context only. All cited research used online AI chatbots in controlled conditions. The effect may differ in voice agents, video avatars, or embodied robots. The Nature Sci Reports 2025 paper covered “service robots” but in the chatbot-equivalent text-mediated context, not physical-robot deployment.
- The focal-vs-observer split (Honora et al.) is the most actionable nuance and the least replicated. Treat it as a hypothesis worth taking seriously rather than a settled finding. Most teams should default to sincere for the affected customer until they’ve tested humor in their specific context.
- Humor styles tested were limited to affiliative and self-deprecating. Self-enhancing humor (“I’m too smart for this”) and aggressive humor (“the system did it, not me”) were not tested. Don’t generalize the positive findings to all humor — the two tested types both lean humble; aggressive or arrogant humor is likely to backfire.
- Cultural specificity is unknown. All three primary studies used English-speaking participants in Western (US, UK, Australia) or East Asian (China) contexts. Humor sensitivities vary culturally; the same response that lands as charming in one market may land as flippant in another. Localize per market.
- The effect is an average at the population level, not a guarantee at the individual level. A self-deprecating humorous response will not reliably make any individual user more forgiving. It shifts the average forgiveness response. Individual variance is wide; design for the average but expect outliers in both directions.
Related
- glossary/agent-adoption-frictions — The three-friction framework this finding extends; the forgiveness asymmetry is the trust-friction repair cost
- glossary/review-response-strategy — the public-review-response complement: match tone to the justice type violated
- glossary/honest-assessment — Self-deprecation as honest-assessment-with-a-charm-coat; same trust mechanism
- glossary/hallucination — The failure mode the humor-on-failure pattern is designed to recover from
- glossary/guardrails — The severity-classification gate that pairs with the humor pattern
- marketing/ai-tells-in-sales-copy — Where humor goes wrong (forced-friendly tone as positive AI tell); this page is the where-it-goes-right counterpart
- marketing/ai-human-voice-prompting — Generation-side voice techniques; humor-on-failure is the recovery-side counterpart
- automation/ai-customer-service-cases — The case-studies cluster where production deployments live
- cases/intercom-fin-support — High-resolution-rate customer-service AI; the metrics layer where humor-on-failure deployments should be tested
- glossary/ai-agent-behavior — Agent-side biases (what agents choose); this page is about user-side responses to agent failures (a different layer of the same trust system)
- glossary/weekend-review-effect — Adjacent behavioral-evidence research on customer perception. Both pages are about how content-style choices affect customer reactions at moments of judgment
- glossary/customer-perception-moments — The framework hub: this page is the failure-recovery moment anchor in the three-moments customer-perception cluster
Sources
Primary research:
- Xie, Y., Zhou, P., Liang, C., Zhao, S., & Lu, W. (2025). Affiliative or self-deprecating? Exploring the effect of humor types on customer forgiveness in the context of AI agents’ service failure. Journal of Business Research, vol. 194. DOI: 10.1016/j.jbusres.2025.115381. Hefei University of Technology. n=1,919 across 4 experiments.
Independent corroboration:
- The influence of AI service robots’ humorous response strategies on consumer forgiveness following service failure (2025). Nature Scientific Reports. n=780 across 3 studies. Adds the hedonic-vs-functional moderator and the perceived-warmth mechanism.
- Research on the Influence of Humor Response in the Context of Artificial Intelligence Service Failure: Moderating Effect Based on User Humor (2024). American Journal of Management Science and Engineering. Adds the user-disposition moderator (humor receptivity).
Counter-finding (the focal-customer gate):
- Honora, A., Japutra, A., & Septianto, F. (2025). Don’t Humor Me! Customers’ Moral Perceptions Toward Companies’ Humorous Responses in Social Media Service Recovery. Journal of Business Ethics, October 2025. DOI: 10.1007/s10551-025-06154-y. Surfaces the focal-customer-vs-observer distinction; identifies sarcasm-misreading as the mechanism; recommends timing (post-resolution) and relevance (peripheral-only) gates.
The Wharton framing:
- Hosanagar, K. Wharton Human-AI Research. The AI-error perception asymmetry quote sourced from the Science Says newsletter coverage of the Wharton Blueprint for AI Agent Adoption (Spring 2026). Wharton Blueprint landing page.
- See glossary/agent-adoption-frictions for the full Wharton Blueprint framing.
Mechanism literature:
- Yam, K. C., et al. (2014). Humor as emotion regulation. JSTP. DOI: 10.1108/JSTP-09-2014-0187.
- Söderlund, M. (2021). Service-context humor and anger reduction. Journal of Business Research. DOI: 10.1016/j.jbusres.2021.04.034.
Adjacent service-recovery tactics:
- “Thank you” not “sorry”: Apology or gratitude? The effect of communication recovery strategies for service failures of AI devices (Journal of Travel & Tourism Marketing, 2022, vol. 39, no. 6). Gratitude beats apology specifically for rejection-type failures. CXPA finding: 68% of customers report decreased trust after repetitive formulaic apologies. Allen Institute for AI finding: chatbots trained on customer-service corpora use “I’m sorry” 3.7× more often in neutral queries than scientific-domain chatbots.
- Interjections in chatbot responses: Broader chatbot-communication-style literature supports interjection use as personification + empathy markers. See also Enhancing customer satisfaction with chatbots: communication styles and consumer attachment (Frontiers in Psychology, 2022). Use sparingly to avoid the inauthentic-warmth backfire.
Ingest provenance:
- Humor makes it easier to forgive AI mistakes — Science Says newsletter (Thomas McKinlay), May 26, 2026 issue. The newsletter summarized Xie et al. 2025 and provided the original-research signal; this wiki page extends the framing with the Honora et al. counter-finding and the Nature Sci Reports corroboration. Email archived to raw/articles/_ingested_2026-05-26_humor-makes-it-easier-to-forgive-AI-mistakes.eml.