Do AI Crawlers Respect robots.txt?

Some do, many don't. Reputable AI crawlers like GPTBot, ClaudeBot, and PerplexityBot honor robots.txt; non-compliant scrapers ignore it. robots.txt is a request, not enforcement.

By Andrej Ruckij · June 16, 2026

TL;DR: Some do, many don’t. Reputable AI crawlers from OpenAI, Anthropic, Google, and Perplexity honor robots.txt directives. But robots.txt is a voluntary request, not a barrier — non-compliant scrapers simply ignore it, and only a firewall can stop those.

The direct answer

AI crawlers respect robots.txt only if their operator chooses to. robots.txt is a voluntary standard: the spec (RFC 9309) says outright that its rules “are not a form of access authorization.” So the answer splits cleanly in two:

  • Compliant crawlers honor it. OpenAI’s GPTBot, Anthropic’s ClaudeBot, Google’s bots, and PerplexityBot are documented by their operators as reading robots.txt and following Disallow rules.
  • Non-compliant scrapers ignore it. Crawlers that spoof user-agents, rotate IPs, or simply don’t fetch the file will crawl regardless of what it says.
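
To make the "operator's choice" point concrete, here is a minimal sketch of the check a compliant crawler runs before fetching a page, using Python's standard urllib.robotparser. The robots.txt content, the example URL, the SomeSearchBot name, and the is_allowed helper are all assumptions made up for illustration; they are not taken from any real crawler's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow three AI training crawlers,
# leave the site open to everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/post"
    for agent in ("GPTBot", "CCBot", "ClaudeBot", "SomeSearchBot"):
        # A compliant crawler skips the fetch when this prints False.
        print(agent, is_allowed(agent, url))
```

Nothing on the server side runs this check. A scraper that never calls the equivalent of is_allowed, or that calls it and ignores the result, fetches the page anyway.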

Why this matters

Most “block AI bots” guides imply robots.txt is a lock. It isn’t — it’s a sign asking polite visitors to stay out. That’s fine for the reputable AI companies, which do behave. The problem is that the crawlers you’d most want to block are often the ones least likely to comply. If your goal is “fewer of my words in training sets, from companies that play fair,” robots.txt delivers. If your goal is “keep this content out, no exceptions,” it can’t do that alone.

What to do about it

  1. Use robots.txt for the compliant majority — block training bots (GPTBot, CCBot, ClaudeBot), allow search and user-fetch bots.
  2. Add a WAF / firewall layer for the rest — non-compliant scrapers are only stopped at the edge with a 403, plus IP/ASN blocking.
  3. Verify, don’t trust the name — user-agent strings are spoofable, so confirm a bot against its operator’s published IP ranges.
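
Steps 2 and 3 can be sketched together as a small edge-style check: if a request claims a known bot name but arrives from an IP outside that bot's published ranges, answer with a 403. This is a sketch under stated assumptions, not a WAF configuration. The CIDR ranges below are placeholder documentation addresses, not real crawler ranges, and PUBLISHED_RANGES, claimed_bot, is_verified, and edge_status are hypothetical names chosen for this example.

```python
import ipaddress
from typing import Optional

# Placeholder ranges from the reserved documentation blocks, NOT real
# crawler ranges. In practice, load each operator's published list.
PUBLISHED_RANGES = {
    "gptbot": ["192.0.2.0/24"],
    "claudebot": ["198.51.100.0/24"],
}


def claimed_bot(user_agent: str) -> Optional[str]:
    """Return the known bot token found in the user-agent string, if any."""
    ua = user_agent.lower()
    for token in PUBLISHED_RANGES:
        if token in ua:
            return token
    return None


def is_verified(user_agent: str, remote_ip: str) -> bool:
    """True if remote_ip lies inside the ranges listed for the claimed bot."""
    token = claimed_bot(user_agent)
    if token is None:
        return False
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in PUBLISHED_RANGES[token])


def edge_status(user_agent: str, remote_ip: str) -> int:
    """403 when a known bot name comes from an unlisted IP, otherwise 200."""
    if claimed_bot(user_agent) is not None and not is_verified(user_agent, remote_ip):
        return 403
    return 200


if __name__ == "__main__":
    print(edge_status("Mozilla/5.0 (compatible; GPTBot/1.0)", "192.0.2.10"))
    print(edge_status("Mozilla/5.0 (compatible; GPTBot/1.0)", "203.0.113.5"))
```

This only catches scrapers that borrow a reputable bot's name. Traffic that presents an ordinary browser user-agent passes this check untouched, which is why the IP/ASN blocking in step 2 is still needed alongside it.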

A notable real case: in August 2025 Cloudflare reported that Perplexity was crawling sites that had blocked it, using undeclared user-agents and rotating its source IPs and ASNs to avoid detection. It is a reminder that even a named, reputable bot’s compliance isn’t guaranteed, and that enforcement is a firewall job.

Key takeaways

  • robots.txt is advisory; compliance is the bot operator’s choice.
  • Reputable AI crawlers honor it; non-compliant scrapers don’t.
  • For real enforcement, pair robots.txt with a WAF and IP-range verification.
