Blocking AI Crawlers by IP and ASN (for Stealth Scrapers)
When scrapers spoof user-agents, block by IP and ASN instead. How to use network-level blocking and rate limiting to stop stealth AI crawlers that ignore robots.txt and fake their identity.
By Andrej Ruckij · June 17, 2026
TL;DR: When a scraper fakes its user-agent, name-based rules can’t catch it, so block at the network level instead. IP blocking stops a specific address; ASN blocking stops an entire network (a hosting provider, cloud platform, or ISP) that a scraper operates from; rate limiting throttles anything crawling too fast. These controls run at the firewall/WAF and work regardless of what the bot calls itself.
This article is part of the cluster under the enforcement guide. It covers the layer that handles the crawlers robots.txt and user-agent rules can’t: the ones that lie about who they are.
Why go below the user-agent
Stealth scrapers spoof their user-agent and rotate identities to bypass name-based blocks (verification is how you catch this). Once you know a request isn’t who it claims to be, you need a control that doesn’t rely on the name. The network layer — IP and ASN — is that control: it acts on where the request comes from, which is much harder to fake convincingly at scale than a header string.
IP blocking
The most direct tool: block the specific IP addresses a scraper uses. Effective for a fixed, identifiable source, and the right move for a verified-bad address. Its limit is rotation — a scraper drawing from a large residential or cloud IP pool can cycle addresses faster than you can list them. IP blocking is necessary but rarely sufficient on its own against a determined crawler.
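As a rough illustration of what an IP block does, here is a minimal Python sketch using the standard-library `ipaddress` module. The blocklist entries are reserved documentation addresses chosen for the example, and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names; in production this check normally lives in the firewall or WAF rather than in application code.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# used here only as stand-ins for confirmed-bad sources.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),    # one confirmed-bad address
    ipaddress.ip_network("198.51.100.0/24"),   # a small range tied to one scraper
    ipaddress.ip_network("2001:db8:bad::/48"), # an IPv6 range
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed address: no match here, leave it to other layers.
        return False
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )
```

The list only helps for addresses you have already seen, which is exactly the rotation problem: every new address in the scraper's pool passes until someone adds it.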
ASN blocking — the wider net
An ASN (Autonomous System Number) identifies the network a range of IPs belongs to, such as a hosting provider, cloud platform, or ISP. Blocking or challenging traffic from an ASN catches every address a scraper uses within that network, which defeats IP rotation that stays inside a single provider. This is powerful but blunt:
- Good for: scrapers running out of a specific datacenter/cloud ASN with no legitimate human traffic.
- Dangerous for: ASNs that also carry real users (major ISPs, big clouds) — block those and you block customers. Prefer challenge (CAPTCHA/JS) over hard-block on mixed ASNs, and reserve hard ASN blocks for networks that are clearly all-bot.
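The block-versus-challenge split above can be sketched as a small policy table. This is an assumption-laden example: the prefix-to-ASN mapping, the AS numbers (taken from the range reserved for documentation), and the names `PREFIX_TO_ASN`, `ASN_POLICY`, `lookup_asn`, and `asn_action` are all hypothetical. A real deployment would get ASN data from its WAF or an ASN database.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-ASN table using reserved documentation values.
PREFIX_TO_ASN = {
    ipaddress.ip_network("198.51.100.0/24"): 64500,  # pretend all-bot datacenter
    ipaddress.ip_network("203.0.113.0/24"): 64501,   # pretend mixed cloud/ISP
}

# Hard-block only networks that are clearly all-bot; challenge mixed ones.
ASN_POLICY = {
    64500: "block",
    64501: "challenge",
}


def lookup_asn(client_ip: str) -> Optional[int]:
    """Return the ASN of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(client_ip)
    best = None
    for net, asn in PREFIX_TO_ASN.items():
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, asn)
    return best[1] if best else None


def asn_action(client_ip: str) -> str:
    """Map a client IP to 'block', 'challenge', or 'allow' via its ASN."""
    return ASN_POLICY.get(lookup_asn(client_ip), "allow")
```

Defaulting unknown networks to "allow" keeps the collateral damage low: an ASN only gets a stricter action once someone has deliberately classified it.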
Rate limiting and behavioral throttling
Often the best first move, because it targets behavior, not identity. Cap requests per IP per time window; anything crawling far faster than a human gets throttled or challenged. This naturally catches aggressive crawlers (including the ones driving your costs) without needing to identify them, and it rarely harms real users, who don’t make hundreds of requests a minute.
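To make the requests-per-IP-per-window idea concrete, here is a minimal in-memory sliding-window limiter in Python. It is a sketch under assumptions: the class name `SlidingWindowLimiter` and the default of 120 requests per 60 seconds are illustrative, not recommended values, and a real setup would usually rely on the WAF's rate rules or a shared store rather than per-process memory.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int = 120, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of allowed hits

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        """Record a request and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the cap: throttle or challenge this request
        hits.append(now)
        return True
```

A caller would check `limiter.allow(ip)` on each request and respond with a throttle or challenge when it returns `False`. The optional `now` argument exists only so the behavior can be exercised with fixed timestamps.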
How to combine them
A practical escalation ladder:
- Rate-limit everything — cheap, broad, low collateral.
- Verify suspicious traffic by IP range / reverse DNS (verify-ai-bots) to confirm whether it is the genuine crawler or an impostor using its name.
- IP-block confirmed-bad addresses.
- ASN-block or challenge if the scraper rotates within a clearly all-bot network.
- Escalate to a tarpit only under persistent pressure (ai-crawler-tarpits).
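The ladder above can be sketched as a single decision function. This is one possible reading, not a definitive policy: the signal names in `RequestSignals`, the action strings, and the choice to check the most severe condition first are all assumptions made for the example, and each signal is presumed to be computed elsewhere (rate limiter, verification, ASN classification).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSignals:
    """Hypothetical per-client signals produced by the earlier steps."""
    over_rate_limit: bool          # exceeded the per-IP request cap
    failed_verification: bool      # claims a known bot, but IP range / reverse DNS does not match
    rotating_in_all_bot_asn: bool  # rotates addresses inside a clearly all-bot network
    persistent_pressure: bool      # keeps coming back despite earlier controls


def decide(signals: RequestSignals) -> str:
    """Pick the strongest action the signals justify."""
    confirmed_bad = signals.failed_verification or signals.rotating_in_all_bot_asn
    if confirmed_bad and signals.persistent_pressure:
        return "tarpit"
    if signals.rotating_in_all_bot_asn:
        return "asn_block_or_challenge"
    if signals.failed_verification:
        return "ip_block"
    if signals.over_rate_limit:
        return "throttle"
    return "allow"
```

Note that a client that is merely fast only ever gets throttled here; the heavier actions require a confirmed-bad signal, which keeps the costly steps away from ordinary traffic.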
Managed WAFs (Cloudflare and others) bundle much of this — bot scoring, ASN intelligence, rate rules, verified-bot lists — which is why the infrastructure layer is increasingly where this lives (does-cloudflare-block-ai-crawlers).
A reminder on the limits
Network blocking is enforcement, not magic. Sophisticated scrapers using large residential proxy pools are genuinely hard to stop completely; the realistic goal is to make crawling your site expensive and slow enough that it isn’t worth the effort. And none of this is aimed at compliant bots; for those, robots.txt is the right, simpler tool (which-ai-bots-to-block).
Key takeaways
- When user-agents are spoofed, block by IP/ASN and rate-limit — controls that don’t rely on the bot’s name.
- IP blocking is precise but beaten by rotation; ASN blocking is wider but blunt (mind mixed-traffic networks — prefer challenges there).
- Rate limiting targets behavior, catches aggressive crawlers, and rarely harms real users — make it your first move.
- Escalate IP → ASN → tarpit; managed WAFs bundle most of it. Compliant bots still belong in robots.txt.
Related articles
- how-to-block-ai-scrapers — the parent enforcement guide
- verify-ai-bots — confirm identity before you block
- ai-crawler-tarpits — the escalation tier
- can-robots-txt-stop-ai-scrapers — why network blocking is needed at all
- robots-txt-vs-waf-ai-bots — the firewall-enforces principle
Sources
- seo/ai-crawler-access — internal synthesis on non-compliant-crawler enforcement
- Cloudflare — Control content use for AI training