Can robots.txt Stop AI Scrapers?

No. robots.txt only asks compliant bots to stay away — non-compliant AI scrapers ignore it. To actually stop them you need a WAF, IP/ASN blocking, and bot verification at the edge.

By Andrej Ruckij · June 16, 2026

TL;DR: No. robots.txt only asks compliant bots to stay away. Non-compliant AI scrapers — the ones that spoof user-agents and rotate IPs — ignore it entirely. To actually stop them you need a firewall (WAF), IP/ASN blocking, and bot verification at the edge.

The direct answer

robots.txt cannot stop a determined scraper. It’s a voluntary standard: the spec itself (RFC 9309) says its rules “are not a form of access authorization.” A reputable crawler reads it and obeys; a scraper built to harvest content reads it and ignores it, or never requests it at all. Nothing about the file prevents a request from reaching your server. It’s a sign, not a lock.
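To make the "sign, not a lock" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the URL, and both helper functions are hypothetical and exist only for illustration. The thing to notice is that the check runs inside the crawler's own code, so a client that never calls it is not restricted at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one (made-up) bot to stay away.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""


def compliant_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """A well-behaved crawler parses robots.txt and checks before requesting."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def scraper_may_fetch(user_agent: str, url: str) -> bool:
    """A non-compliant scraper simply skips the check.

    Nothing on the server side forces it to consult robots.txt.
    """
    return True


if __name__ == "__main__":
    url = "https://example.com/article"
    print(compliant_crawler_may_fetch("ExampleAIBot", url))
    print(scraper_may_fetch("ExampleAIBot", url))
```

The server plays no part in either function: the only difference between the two clients is whether they chose to ask.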

Why people ask this

Because almost every “block AI bots” tutorial ends at a robots.txt snippet, leaving the impression that the job is done. It isn’t. The scrapers most worth stopping — aggressive, unbranded, or impersonating other bots (the Bytespider archetype) — are precisely the ones that don’t honor robots.txt. Believing the file is keeping them out is worse than knowing it isn’t, because you stop looking for the real fix.

What actually stops a scraper

  1. A WAF / firewall rule — returns a 403 at the edge before the request reaches your origin. This is enforcement; robots.txt is not.
  2. IP / ASN blocking — blocks the networks scrapers crawl from, catching them even when they fake their user-agent.
  3. Bot verification — confirms legitimate bots against the operator’s published IP ranges and treats unverifiable claimants as suspect.
  4. Behavioral defenses — rate limits, and for the aggressive end, tarpits/honeypots that waste a scraper’s resources.
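The four layers above can be sketched as a single decision function. This is a toy model under stated assumptions, not a real WAF: the networks are reserved documentation ranges, `examplebot` and its "published" range are made up, the ASN block is approximated by a list of prefixes (a real deployment would look up the ASN from a routing database), and rejecting an unverified claimant with a 403 is one possible policy among several.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical data for illustration (reserved documentation ranges).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
VERIFIED_BOT_RANGES = {
    # bot name as it appears in the user-agent -> operator's published ranges
    "examplebot": [ipaddress.ip_network("192.0.2.0/28")],
}

RATE_LIMIT = 5          # max requests per client IP...
WINDOW_SECONDS = 10.0   # ...within this sliding window
_recent_hits = defaultdict(deque)  # client IP -> timestamps of recent requests


def _in_any(ip, networks) -> bool:
    return any(ip in network for network in networks)


def decide(client_ip: str, user_agent: str, now: Optional[float] = None) -> int:
    """Return the HTTP status an edge rule might send: 200, 403, or 429."""
    ip = ipaddress.ip_address(client_ip)
    now = time.monotonic() if now is None else now

    # Layers 1 and 2: refuse blocked networks no matter what the
    # user-agent header claims.
    if _in_any(ip, BLOCKED_NETWORKS):
        return 403

    # Layer 3: a client claiming to be a known bot must come from that
    # bot's published ranges; otherwise treat the claim as spoofed.
    ua = user_agent.lower()
    for bot_name, ranges in VERIFIED_BOT_RANGES.items():
        if bot_name in ua and not _in_any(ip, ranges):
            return 403

    # Layer 4: sliding-window rate limit per client IP.
    hits = _recent_hits[client_ip]
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    if len(hits) >= RATE_LIMIT:
        return 429
    hits.append(now)
    return 200
```

For example, `decide("198.51.100.9", "ExampleBot/1.0")` is refused because the address is outside the range the hypothetical operator publishes, even though the user-agent looks legitimate. Checking the network before the user-agent is deliberate: the header is whatever the client chooses to send, while the source address is much harder to fake.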

A managed option like Cloudflare’s AI Crawl Control bundles much of this at the CDN layer — which is increasingly where this control lives.

Key takeaways

  • robots.txt governs only bots that choose to comply — scrapers don’t.
  • Real enforcement happens at the firewall/edge, not in a text file.
  • Verify bots by IP range, since user-agent names are easily spoofed.
