#waf
11 posts tagged with waf.
-
AI Crawler Access Control: The Complete Guide (2026)
Everything site owners need on AI crawler access: the training/retrieval/user-fetch taxonomy, the block-or-allow decision, how to set up robots.txt, why a WAF enforces where robots.txt asks, llms.txt, costs, and regulation.
-
AI Crawler Access Checklist: 8 Steps for Site Owners
A practical checklist to get AI crawler access right: audit current access, set a robots.txt policy, reconcile your CDN/WAF, verify by IP range, decide on llms.txt, monitor, and refresh quarterly.
-
AI Crawler Tarpits and Honeypots: Nepenthes, Anubis, and Cloudflare AI Labyrinth
When blocking isn't enough, tarpits waste a crawler's resources instead. A look at Nepenthes (infinite maze), Anubis (proof-of-work), and Cloudflare AI Labyrinth — what they do, and their real tradeoffs.
-
Blocking AI Crawlers by IP and ASN (for Stealth Scrapers)
When scrapers spoof user-agents, block by IP and ASN instead. How to use network-level blocking and rate limiting to stop stealth AI crawlers that ignore robots.txt and fake their identity.
-
How to Block AI Scrapers: The Complete Enforcement Guide (2026)
robots.txt won't stop scrapers that ignore it. This is the enforcement layer: WAF rules, bot verification by IP range, IP/ASN blocking, rate limiting, and tarpits — how to actually keep non-compliant AI crawlers out.
-
Perplexity's Crawlers: PerplexityBot, Perplexity-User, and the Stealth-Crawling Controversy
PerplexityBot indexes for citations; Perplexity-User fetches pages users ask about (and ignores robots.txt by design). Plus the August 2025 Cloudflare report that Perplexity crawled sites that blocked it.
-
How to Verify a Real AI Bot (IP Ranges, Reverse DNS)
User-agent strings are spoofable, so verify AI bots by their published IP-range files and reverse DNS — not by name. Here's how to confirm a request really is GPTBot, ClaudeBot, or PerplexityBot.
-
Can robots.txt Stop AI Scrapers?
No. robots.txt only asks compliant bots to stay away — non-compliant AI scrapers ignore it. To actually stop them you need a WAF, IP/ASN blocking, and bot verification at the edge.
-
Does Cloudflare Block AI Crawlers by Default?
Yes. Since July 2025, Cloudflare has blocked AI crawlers by default for new sites, and it offers one-click blocking plus pay-per-crawl. If you're on Cloudflare, check this setting, because it can override your robots.txt.
-
Why robots.txt Won't Block AI Bots (and What Actually Does)
robots.txt only asks AI crawlers to stay away; a WAF enforces. Here's why a firewall rule beats robots.txt, why non-compliant scrapers ignore your robots.txt file, and the layered setup that actually controls AI bot access.
-
Which AI Bots Should You Block? (And Why robots.txt Won't Stop Them)
A plain-English guide to AI crawler access: the training vs. retrieval vs. user-fetch bot taxonomy, which to allow or block, and why a firewall enforces where robots.txt only asks.