#waf

11 posts tagged with waf.

June 17, 2026

AI Crawler Access Control: The Complete Guide (2026)

Everything site owners need on AI crawler access: the training/retrieval/user-fetch taxonomy, the block-or-allow decision, how to set up robots.txt, why a WAF enforces where robots.txt asks, llms.txt, costs, and regulation.
June 17, 2026

AI Crawler Access Checklist: 8 Steps for Site Owners

A practical checklist to get AI crawler access right: audit current access, set a robots.txt policy, reconcile your CDN/WAF, verify by IP range, decide on llms.txt, monitor, and refresh quarterly.
June 17, 2026

AI Crawler Tarpits and Honeypots: Nepenthes, Anubis, and Cloudflare AI Labyrinth

When blocking isn't enough, tarpits waste a crawler's resources instead. A look at Nepenthes (infinite maze), Anubis (proof-of-work), and Cloudflare AI Labyrinth — what they do, and their real tradeoffs.
June 17, 2026

Blocking AI Crawlers by IP and ASN (for Stealth Scrapers)

When scrapers spoof user-agents, block by IP and ASN instead. How to use network-level blocking and rate limiting to stop stealth AI crawlers that ignore robots.txt and fake their identity.
June 17, 2026

How to Block AI Scrapers: The Complete Enforcement Guide (2026)

robots.txt won't stop scrapers that ignore it. This is the enforcement layer: WAF rules, bot verification by IP range, IP/ASN blocking, rate limiting, and tarpits — how to actually keep non-compliant AI crawlers out.
June 17, 2026

Perplexity's Crawlers: PerplexityBot, Perplexity-User, and the Stealth-Crawling Controversy

PerplexityBot indexes for citations; Perplexity-User fetches pages users ask about (and ignores robots.txt by design). Also covers the August 2025 Cloudflare report alleging that Perplexity crawled sites that had blocked it.
June 17, 2026

How to Verify a Real AI Bot (IP Ranges, Reverse DNS)

User-agent strings are spoofable, so verify AI bots by their published IP-range files and reverse DNS — not by name. Here's how to confirm a request really is GPTBot, ClaudeBot, or PerplexityBot.
June 16, 2026

Can robots.txt Stop AI Scrapers?

No. robots.txt only asks compliant bots to stay away — non-compliant AI scrapers ignore it. To actually stop them you need a WAF, IP/ASN blocking, and bot verification at the edge.
June 16, 2026

Does Cloudflare Block AI Crawlers by Default?

Yes. Since July 2025, Cloudflare has blocked AI crawlers by default for new sites, and it offers one-click blocking plus pay-per-crawl. If you're on Cloudflare, check this setting, because it can override your robots.txt.
June 16, 2026

Why robots.txt Won't Block AI Bots (and What Actually Does)

robots.txt only asks AI crawlers to stay away; a WAF enforces. Here's why a firewall rule beats robots.txt, why non-compliant scrapers ignore the file, and the layered setup that actually controls AI bot access.
June 16, 2026

Which AI Bots Should You Block? (And Why robots.txt Won't Stop Them)

A plain-English guide to AI crawler access: the training vs. retrieval vs. user-fetch bot taxonomy, which to allow or block, and why a firewall enforces where robots.txt only asks.