Appropriate Reliance — Trusting AI the Right Amount, Not the Most

TL;DR: The right amount of trust in AI is calibrated, not maximal. Both over-reliance (accepting wrong AI output) and under-reliance (ignoring correct AI output) degrade decisions, and Schemmer et al. (JAIR 2025) show formally that maximizing AI adherence is suboptimal. The behavioral evidence is in tension. Merely labeling advice “AI” causes people to over-rely on it even against their own interest (Klingbeil et al. 2024, Computers in Human Behavior), and higher confidence in an AI tool is associated with less critical thinking (Lee et al. 2025, CHI, n=319). Yet a study of 529 chess players found the opposite: no over-reliance, a preference for self-reliance, and more expert users trusting AI less (Journal of Decision Systems 2025). The reconciliation: reliance is moderated by expertise and stakes. Novices on low-stakes, in-frontier tasks over-rely; experts on high-stakes tasks under-rely. The design goal is neither blind trust nor blanket skepticism; it’s appropriate reliance, engineered. And there’s a twist for marketers: disclosing AI use erodes trust via reduced legitimacy (Schilke & Reimann 2025, 13 experiments), so calibration has a communication dimension, not just a decision one.

What it means

Appropriate reliance (also “calibrated reliance”) is the state where a person accepts AI output exactly when it’s correct and overrides it exactly when it’s wrong. It sits between two failure modes:

Over-reliance — accepting AI output that’s wrong (automation bias). The dominant concern in 2026.
Under-reliance — rejecting AI output that’s right (algorithm aversion). The quieter, equally costly failure.
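
The four outcomes above can be tallied from a decision log. Below is a minimal sketch, assuming a hypothetical `Decision` record with two fields: whether the AI output was correct (which presumes ground truth is available) and whether the person accepted it. `classify` and `reliance_report` are illustrative names, not taken from any cited study. The report puts the calibration rate next to raw adherence so the two can be compared.

```python
from collections import Counter
from dataclasses import dataclass

CELLS = ("appropriate_accept", "appropriate_override", "over_reliance", "under_reliance")


@dataclass(frozen=True)
class Decision:
    """One human decision about one AI output (hypothetical log record)."""
    ai_correct: bool  # was the AI output right? Needs ground truth to fill in.
    accepted: bool    # did the person go with the AI output?


def classify(d: Decision) -> str:
    """Place a decision in one of the four reliance cells."""
    if d.accepted:
        # Accepting a wrong output is over-reliance (automation bias).
        return "appropriate_accept" if d.ai_correct else "over_reliance"
    # Rejecting a right output is under-reliance (algorithm aversion).
    return "under_reliance" if d.ai_correct else "appropriate_override"


def reliance_report(decisions: list[Decision]) -> dict[str, float]:
    """Share of decisions per cell, plus the calibration and adherence rates."""
    if not decisions:
        raise ValueError("need at least one decision")
    n = len(decisions)
    counts = Counter(classify(d) for d in decisions)
    report = {cell: counts[cell] / n for cell in CELLS}
    # Calibration metric: accepted when right or overrode when wrong.
    report["appropriate"] = report["appropriate_accept"] + report["appropriate_override"]
    # Adherence: what "drive adoption" optimizes; it counts wrong accepts as wins.
    report["adherence"] = sum(d.accepted for d in decisions) / n
    return report


log = [
    Decision(ai_correct=True, accepted=True),
    Decision(ai_correct=True, accepted=True),
    Decision(ai_correct=False, accepted=True),   # over-reliance
    Decision(ai_correct=False, accepted=False),  # appropriate override
]
print(reliance_report(log))
```

In this sample log both rates happen to be 0.75, but they move in opposite directions whenever a wrong output is accepted rather than overridden: adherence rises while the appropriate rate falls.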

The key reframing from the 2024–2025 research: more reliance is not better. The target is a calibration curve, not a maximization. This is the operational complement to the jagged frontier (glossary/jagged-frontier), where AI tends to be right inside its frontier and wrong outside it. Appropriate reliance is the human response to a jagged tool: trust it where it’s strong, override it where it’s weak, and build the judgment to tell the difference.
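
A toy calculation makes the “not maximization” point concrete. It is not Schemmer et al.’s formal result, only an illustration with invented numbers: a hypothetical workload where 70% of tasks sit inside the AI’s frontier (AI 95% accurate, unaided human 75%) and 30% sit outside it (AI 40%, human 70%). `TaskRegion` and `expected_accuracy` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TaskRegion:
    """A slice of the workload with its own AI and human accuracy (made-up numbers)."""
    name: str
    share: float           # fraction of all tasks in this region
    ai_accuracy: float     # probability the AI output is correct here
    human_accuracy: float  # probability the unaided human is correct here


def expected_accuracy(regions: Sequence[TaskRegion],
                      follow_ai: Callable[[TaskRegion], bool]) -> float:
    """Expected accuracy when the person follows the AI wherever follow_ai(region)
    is True and decides alone everywhere else."""
    return sum(
        r.share * (r.ai_accuracy if follow_ai(r) else r.human_accuracy)
        for r in regions
    )


# Hypothetical jagged tool: strong inside its frontier, weak outside it.
REGIONS = [
    TaskRegion("inside frontier", share=0.7, ai_accuracy=0.95, human_accuracy=0.75),
    TaskRegion("outside frontier", share=0.3, ai_accuracy=0.40, human_accuracy=0.70),
]

always_follow = expected_accuracy(REGIONS, lambda r: True)   # maximal adherence
never_follow = expected_accuracy(REGIONS, lambda r: False)   # pure self-reliance
# Calibrated: follow the AI only where it beats the human. This assumes the person
# can tell which region a task is in, which is the judgment the text says to build.
calibrated = expected_accuracy(REGIONS, lambda r: r.ai_accuracy > r.human_accuracy)

for label, value in [("always follow", always_follow),
                     ("never follow", never_follow),
                     ("calibrated", calibrated)]:
    print(f"{label:<14} {value:.3f}")
```

With these invented numbers the expected accuracies work out to 0.785 for always following the AI, 0.735 for never following it, and 0.875 for the calibrated policy. Maximal adherence beats pure self-reliance here but still loses to calibration, because it imports the AI’s out-of-frontier errors.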

Why it matters

Most AI-deployment advice optimizes the wrong variable — “drive adoption,” “increase trust,” “get people to use it.” The research says that’s a category error. Schemmer et al. (JAIR 2025) show that an interface maximizing AI adherence produces worse decisions than one supporting calibrated reliance. For any business putting AI in front of staff or customers, the design goal is the right amount of trust, task by task — which means building in the friction, verification, and judgment that calibration requires, not stripping it away in the name of “seamless” AI.

The evidence — and the tension

Over-reliance is real, and it’s triggered by a label

Klingbeil, Grützner & Schreck (2024, Computers in Human Behavior 160:108352) ran an incentivized six-round trust game and found that the mere knowledge that advice was AI-generated caused people to over-rely on it — following AI advice that conflicted with available context and ran against their own financial interest. Worse, the over-reliance imposed costs on third parties, degrading human cooperation. The label alone — not the advice quality — moved behavior.

Confidence in the tool suppresses critical thinking

Lee et al. (2025, CHI, Microsoft Research + Carnegie Mellon) surveyed 319 knowledge workers across 936 real task examples and found a dual confidence effect: higher confidence in the GenAI tool is associated with less critical thinking, while higher confidence in one’s own ability is associated with more. AI tools “reduce the perceived effort of critical thinking while encouraging over-reliance.” (Caveat: correlational and self-reported — directional, not causal.)

But experts under-rely — the counterpoint

The tension: a study of 529 chess players (Journal of Decision Systems 2025, analyzed with structural equation modeling, SEM) found no over-reliance at all. Participants preferred self-reliance even though AI support genuinely improved their performance (from 5.98 to 8.05 of 12 puzzles correct). Domain expertise was positively related to self-confidence and negatively related to trust in AI: more expert users took a more critical stance. This is the mirror image of Klingbeil and Lee.
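
To put the chess result in proportion, the two reported figures convert to success rates as follows (plain arithmetic on the numbers above, nothing more):

```latex
\frac{5.98}{12} \approx 49.8\%, \qquad
\frac{8.05}{12} \approx 67.1\%, \qquad
\frac{8.05 - 5.98}{12} = \frac{2.07}{12} = 17.25 \text{ percentage points}, \qquad
\frac{8.05}{5.98} \approx 1.346
```

So AI support meant roughly 35% more puzzles solved, which gives a rough sense of the benefit a preference for self-reliance puts at risk in this sample.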

The resolution: expertise × stakes moderation

The findings aren’t contradictory — they describe different people on different tasks:

	Low stakes / in-frontier task	High stakes / out-of-frontier task
Novice	🔴 Over-relies (Klingbeil, Lee) — accepts the label, offloads thinking	🟠 Over-relies dangerously — can’t catch the error
Expert	🟡 Mild reliance — low cost either way	🟢 Under-relies / self-relies (chess study) — critical stance, catches errors

The unifying principle Schemmer et al. formalize: the goal is appropriate reliance, and it’s a moving target set by who is using AI on what. Over-reliance dominates among novices on routine tasks; under-reliance appears among experts on high-stakes tasks. Designing for “more trust” helps one group and harms the other.
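
The table can also be read as a lookup keyed on who is using the AI and on what, which makes the last point checkable. This is a minimal sketch: the `Expertise` and `Task` enums, the `PATTERN` and `CORRECTION` tables, and `effect_of_global_trust_push` are hypothetical names. The cell contents are transcribed from the table, and “mild” is treated as needing no correction because the table calls it low cost either way.

```python
from enum import Enum


class Expertise(Enum):
    NOVICE = "novice"
    EXPERT = "expert"


class Task(Enum):
    LOW_STAKES_IN_FRONTIER = "low stakes / in-frontier"
    HIGH_STAKES_OUT_OF_FRONTIER = "high stakes / out-of-frontier"


# Expected reliance pattern per cell, transcribed from the expertise x stakes table.
PATTERN = {
    (Expertise.NOVICE, Task.LOW_STAKES_IN_FRONTIER): "over",
    (Expertise.NOVICE, Task.HIGH_STAKES_OUT_OF_FRONTIER): "over",
    (Expertise.EXPERT, Task.LOW_STAKES_IN_FRONTIER): "mild",
    (Expertise.EXPERT, Task.HIGH_STAKES_OUT_OF_FRONTIER): "under",
}

# Direction each pattern must move to reach appropriate reliance.
CORRECTION = {"over": "reduce reliance", "under": "increase reliance", "mild": "no change"}


def effect_of_global_trust_push(pattern: str) -> str:
    """What a one-size 'increase trust' intervention does to a cell."""
    correction = CORRECTION[pattern]
    if correction == "increase reliance":
        return "helps"
    if correction == "reduce reliance":
        return "harms"
    return "neutral"


for (expertise, task), pattern in PATTERN.items():
    print(f"{expertise.value:>6} | {task.value:<29} | {pattern:>5} | "
          f"{CORRECTION[pattern]:<17} | more trust {effect_of_global_trust_push(pattern)}")
```

Running it shows a single “increase trust” push helping only the expert, high-stakes cell and harming both novice cells, which is the asymmetry the paragraph describes.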

The disclosure paradox (the marketer’s version)

There’s a second-order finding that turns appropriate reliance into a communication problem, not just a decision one. Schilke & Reimann (2025, Organizational Behavior and Human Decision Processes, 13 experiments, 5,000+ participants) found that actors who disclose using AI are trusted significantly less — a 16% drop in a professor-grading scenario, ~18% for AI-disclosing ads, ~20% on a designer-rehire decision — and the mechanism is reduced perceived legitimacy (typicality, commitment, authenticity), not simple algorithm aversion.

The paradox: hiding AI use is worse if you’re caught (third-party exposure damages trust more than voluntary disclosure), but disclosing it costs you trust up front. This connects directly to the wiki’s glossary/honest-assessment spine: the long-run answer is honesty, but the short-run trust tax is real and must be managed (lead with the human judgment and outcome, not the tool). The effect may attenuate as AI use normalizes.

The practitioner playbook — engineering appropriate reliance

Stop optimizing for adoption; optimize for calibration. The metric isn’t “% who used the AI”; it’s “% who accepted it when right and overrode it when wrong.”
Add friction where stakes are high. A verification step (the glossary/guardrails pattern) protects against novice over-reliance exactly where it’s most costly: novices on high-stakes, out-of-frontier tasks, the top-right cell of the expertise × stakes table.
Pair AI with the judgment layer; don’t replace it. Lee et al.’s dual-confidence finding suggests building user self-confidence (training, transparency about AI limits), not just tool confidence: in their data, people more confident in their own judgment reported more critical thinking about AI output.
Surface uncertainty. AI often delivers wrong answers as confidently as right ones. Anything that signals “the model is unsure here” can push users toward appropriate override.
Manage the disclosure tax. Disclose AI use (honesty wins in the long run, and being caught is worse), but frame it around human ownership of the outcome, which speaks to the perceived legitimacy that Schilke & Reimann found disclosure erodes.
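
The friction and uncertainty items above can be combined into one presentation rule. This is a minimal sketch under stated assumptions: the hypothetical `AIOutput` type carries a confidence score in [0, 1], the stakes label comes from the caller, and the 0.8 threshold is invented. It also assumes the confidence score is itself reasonably calibrated, which raw model output often is not (the “Surface uncertainty” item makes the same point), so in practice the score may need its own calibration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIOutput:
    text: str
    confidence: float  # assumed confidence estimate in [0, 1] (hypothetical field)


def present(output: AIOutput, high_stakes: bool,
            min_confidence: float = 0.8) -> dict:
    """Decide the friction and uncertainty cues shown alongside an AI output.

    - High-stakes outputs always require a human verification step.
    - Outputs below min_confidence carry a visible "unsure" flag.
    The 0.8 default is a hypothetical threshold, not a value from the research.
    """
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    flag: Optional[str] = None
    if output.confidence < min_confidence:
        flag = "The model is unsure here. Check this before accepting."
    return {
        "text": output.text,
        "require_verification": high_stakes,  # friction where stakes are high
        "uncertainty_flag": flag,             # surfaced uncertainty, or None
    }


print(present(AIOutput("Approve the refund.", confidence=0.55), high_stakes=True))
print(present(AIOutput("Draft subject line.", confidence=0.95), high_stakes=False))
```

Keeping the two cues separate is a design choice: stakes decide whether a verification step is mandatory, while confidence only decides whether the user is warned, so a confident but high-stakes output still gets checked.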

Honest limits

Lee et al. is correlational and self-reported — it shows association, not that AI causes reduced critical thinking. Don’t overstate.
The chess no-over-reliance result is one domain (a high-validity, clear-feedback game) and uses SEM on a single sample; the “experts under-rely” reading holds within that study but is domain-bounded.
Klingbeil’s setting is a lab economic game — high internal validity, limited external validity to real workflows.
The disclosure effect may be transient — as AI use becomes the norm, the legitimacy penalty for disclosure should shrink. Treat the magnitudes as 2025-vintage.
“Appropriate reliance” is easy to state, hard to measure — operationalizing it requires knowing ground truth, which production settings rarely have in real time.

Related

glossary/jagged-frontier — the tool-side asymmetry; appropriate reliance is the human-side response to it
glossary/ai-skill-leveling — novices gain most and over-rely most; the two findings sit together
glossary/agent-adoption-frictions — the trust/competence/control frictions; calibration is the resolution, not “more trust”
glossary/honest-assessment — the disclosure paradox is honest-assessment at the AI-transparency layer
glossary/hallucination — what appropriate reliance defends against: confident, wrong output
glossary/guardrails — the friction/verification pattern that engineers calibration
glossary/customer-perception-moments — the disclosure-trust effect is a customer-perception finding at the AI-transparency moment
glossary/ai-competitive-analysis — appropriate reliance applied to a domain: how to AI-augment competitive/strategic analysis without over-relying on a single biased run
questions/ai-as-personal-advisor — appropriate reliance applied at the personal scale: the calibration layer of the personal-AI-advisor reliability framework

Sources

Klingbeil, A., Grützner, C., & Schreck, P. (2024). Trust and reliance on AI — An experimental study on the extent and costs of overreliance on AI. Computers in Human Behavior, 160, 108352. DOI: 10.1016/j.chb.2024.108352. Mere AI labeling → over-reliance against self-interest; costs to third parties. [verified 3-0]
Lee, H.-P., et al. (2025). The Impact of Generative AI on Critical Thinking. CHI 2025 (Microsoft Research + Carnegie Mellon). n=319 workers, 936 tasks; dual confidence effect. Correlational/self-reported. [verified 3-0]
Schilke, O., & Reimann, M. (2025). The Transparency Dilemma: How AI Disclosure Erodes Trust. Organizational Behavior and Human Decision Processes, 188. SSRN 5205850. 13 experiments, 5,000+ participants; disclosure → −16–20% trust via reduced legitimacy. [verified 3-0]
Journal of Decision Systems (2025). Chess-player study (n=529, SEM). DOI: 10.1080/12460125.2025.2593251. Appropriate reliance is the target; both over- and under-reliance hurt; experts self-rely. [verified 3-0]
Schemmer, M., et al. (2025). Appropriate Reliance on AI Advice. Journal of Artificial Intelligence Research, 82. arXiv:2304.08804. Formal result: maximizing AI adherence is suboptimal.

By Andrej Ruckij