#bot-management

13 posts tagged with bot-management.

June 17, 2026

AI Crawler Access Control: The Complete Guide (2026)

Everything site owners need on AI crawler access: the training/retrieval/user-fetch taxonomy, the block-or-allow decision, how to set up robots.txt, why a WAF enforces where robots.txt asks, llms.txt, costs, and regulation.

June 17, 2026

The AI Crawler Directory: Every User-Agent, What It Does, Allow or Block (2026)

A complete reference table of AI crawler user-agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider, and more) with each bot's job, robots.txt compliance, IP-range file, and an allow/block recommendation.

June 17, 2026

Anthropic's Crawlers: ClaudeBot, Claude-SearchBot, Claude-User

Anthropic runs ClaudeBot (training), Claude-SearchBot (search), and Claude-User (user-fetch). All respect robots.txt, and (updated in 2026) Anthropic now publishes IP ranges for all three.

June 17, 2026

Blocking AI Crawlers by IP and ASN (for Stealth Scrapers)

When scrapers spoof user-agents, block by IP and ASN instead. How to use network-level blocking and rate limiting to stop stealth AI crawlers that ignore robots.txt and fake their identity.

June 17, 2026

How to Block AI Scrapers: The Complete Enforcement Guide (2026)

robots.txt won't stop scrapers that ignore it. This is the enforcement layer: WAF rules, bot verification by IP range, IP/ASN blocking, rate limiting, and tarpits. It covers how to actually keep non-compliant AI crawlers out.

June 17, 2026

Meta and Amazon AI Crawlers: Meta-ExternalAgent, Meta-ExternalFetcher, Amazonbot

Meta runs Meta-ExternalAgent (training) and Meta-ExternalFetcher (user-fetch); Amazon runs Amazonbot, with a search-only sibling Amzn-SearchBot that's easy to confuse. How to handle each.

June 17, 2026

OpenAI's Crawlers: GPTBot, OAI-SearchBot, ChatGPT-User (and OAI-AdsBot)

OpenAI runs separate bots for separate jobs: GPTBot (training), OAI-SearchBot (search), ChatGPT-User (user-fetch), and OAI-AdsBot. Here's what each does, whether to block it, and how to verify it by IP range.

June 17, 2026

Perplexity's Crawlers: PerplexityBot, Perplexity-User, and the Stealth-Crawling Controversy

PerplexityBot indexes for citations; Perplexity-User fetches pages users ask about (and ignores robots.txt by design). Plus the August 2025 Cloudflare report that Perplexity crawled sites that blocked it.

June 17, 2026

How to Verify a Real AI Bot (IP Ranges, Reverse DNS)

User-agent strings are spoofable, so verify AI bots by their published IP-range files and reverse DNS, not by name. Here's how to confirm a request really is GPTBot, ClaudeBot, or PerplexityBot.

June 16, 2026

Can robots.txt Stop AI Scrapers?

No. robots.txt only asks compliant bots to stay away, and non-compliant AI scrapers ignore it. To actually stop them you need a WAF, IP/ASN blocking, and bot verification at the edge.

June 16, 2026

Do AI Crawlers Respect robots.txt?

Some do, many don't. Reputable AI crawlers like GPTBot, ClaudeBot, and PerplexityBot honor robots.txt; non-compliant scrapers ignore it. robots.txt is a request, not enforcement.

June 16, 2026

Why robots.txt Won't Block AI Bots (and What Actually Does)

robots.txt only asks AI crawlers to stay away; a WAF enforces. Here's why a firewall rule beats robots.txt, why non-compliant scrapers ignore the file, and the layered setup that actually controls AI bot access.

June 16, 2026

Which AI Bots Should You Block? (And Why robots.txt Won't Stop Them)

A plain-English guide to AI crawler access: the training vs. retrieval vs. user-fetch bot taxonomy, which to allow or block, and why a firewall enforces where robots.txt only asks.