#bot-management
13 posts tagged with bot-management.
-
AI Crawler Access Control: The Complete Guide (2026)
Everything site owners need to know about AI crawler access: the training/retrieval/user-fetch taxonomy, the block-or-allow decision, how to set up robots.txt, why a WAF enforces what robots.txt only requests, llms.txt, costs, and regulation.
-
The AI Crawler Directory: Every User-Agent, What It Does, Allow or Block (2026)
A complete reference table of AI crawler user-agents — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider and more — with each bot's job, robots.txt compliance, IP-range file, and an allow/block recommendation.
-
Anthropic's Crawlers: ClaudeBot, Claude-SearchBot, Claude-User
Anthropic runs ClaudeBot (training), Claude-SearchBot (search), and Claude-User (user-fetch). All three respect robots.txt. (Updated 2026: Anthropic now publishes IP ranges for all three.)
-
Blocking AI Crawlers by IP and ASN (for Stealth Scrapers)
When scrapers spoof user-agents, block by IP and ASN instead. How to use network-level blocking and rate limiting to stop stealth AI crawlers that ignore robots.txt and fake their identity.
-
How to Block AI Scrapers: The Complete Enforcement Guide (2026)
robots.txt won't stop scrapers that ignore it. This is the enforcement layer: WAF rules, bot verification by IP range, IP/ASN blocking, rate limiting, and tarpits — how to actually keep non-compliant AI crawlers out.
-
Meta and Amazon AI Crawlers: Meta-ExternalAgent, Meta-ExternalFetcher, Amazonbot
Meta runs Meta-ExternalAgent (training) and Meta-ExternalFetcher (user-fetch); Amazon runs Amazonbot, with a search-only sibling Amzn-SearchBot that's easy to confuse. How to handle each.
-
OpenAI's Crawlers: GPTBot, OAI-SearchBot, ChatGPT-User (and OAI-AdsBot)
OpenAI runs separate bots for separate jobs: GPTBot (training), OAI-SearchBot (search), ChatGPT-User (user-fetch), and OAI-AdsBot. Here's what each does, whether to block it, and how to verify it by IP range.
-
Perplexity's Crawlers: PerplexityBot, Perplexity-User, and the Stealth-Crawling Controversy
PerplexityBot indexes for citations; Perplexity-User fetches pages users ask about (and ignores robots.txt by design). Plus the August 2025 Cloudflare report that Perplexity crawled sites that blocked it.
-
How to Verify a Real AI Bot (IP Ranges, Reverse DNS)
User-agent strings are spoofable, so verify AI bots by their published IP-range files and reverse DNS — not by name. Here's how to confirm a request really is GPTBot, ClaudeBot, or PerplexityBot.
-
Can robots.txt Stop AI Scrapers?
No. robots.txt only asks compliant bots to stay away — non-compliant AI scrapers ignore it. To actually stop them you need a WAF, IP/ASN blocking, and bot verification at the edge.
-
Do AI Crawlers Respect robots.txt?
Some do, many don't. Reputable AI crawlers like GPTBot, ClaudeBot, and PerplexityBot honor robots.txt; non-compliant scrapers ignore it. robots.txt is a request, not enforcement.
-
Why robots.txt Won't Block AI Bots (and What Actually Does)
robots.txt only asks AI crawlers to stay away; a WAF enforces the block. Here's why a firewall rule beats robots.txt, why non-compliant scrapers ignore your robots.txt file, and the layered setup that actually controls AI bot access.
-
Which AI Bots Should You Block? (And Why robots.txt Won't Stop Them)
A plain-English guide to AI crawler access: the training vs. retrieval vs. user-fetch bot taxonomy, which to allow or block, and why a firewall enforces where robots.txt only asks.