How to Verify a Real AI Bot (IP Ranges, Reverse DNS)
User-agent strings are spoofable, so verify AI bots by their published IP-range files and reverse DNS — not by name. Here's how to confirm a request really is GPTBot, ClaudeBot, or PerplexityBot.
By Andrej Ruckij · June 17, 2026
TL;DR: A bot’s user-agent string is self-reported and trivially faked, so never trust the name alone. Verify against the operator’s published IP-range file (OpenAI, Anthropic, and Perplexity each publish one) or via reverse DNS. A request claiming to be GPTBot from outside OpenAI’s published ranges is not GPTBot and should be treated as a scraper.
This article is part of the enforcement guide cluster. Verification is the foundation of every other enforcement step: block and allow rules are only as trustworthy as your ability to confirm who’s actually knocking.
Why the user-agent isn’t enough
The user-agent string is just an HTTP header the client sets itself. Any scraper can send a User-Agent header containing GPTBot while having nothing to do with OpenAI, and stealth crawlers exploit the same weakness, changing their user-agent to slip past name-based rules (the Perplexity stealth-crawl case is the textbook example). So a robots.txt group for User-agent: GPTBot with Disallow: /, or a firewall rule keyed on the name, stops honest bots and misses dishonest ones. Verification closes that gap.
Method 1: published IP-range files (best)
The major AI operators publish machine-readable lists of the IP ranges their bots crawl from. Check the requesting IP against the relevant file:
| Operator | IP-range file |
|---|---|
| OpenAI (GPTBot, OAI-SearchBot, ChatGPT-User) | openai.com/gptbot.json, openai.com/searchbot.json, openai.com/chatgpt-user.json |
| Anthropic (ClaudeBot, Claude-SearchBot, Claude-User) | claude.com/crawling/bots.json |
| Perplexity (PerplexityBot, Perplexity-User) | perplexity.com/perplexitybot.json, perplexity.com/perplexity-user.json |
If a request claims a bot’s user-agent but its source IP isn’t in that operator’s published range, it’s not the real bot. Full token reference: ai-crawler-user-agents-directory.
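A minimal Python sketch of the range check follows. It assumes the range file parses to a JSON object with a `prefixes` array whose entries carry an `ipv4Prefix` or `ipv6Prefix` key (the layout Google uses for its crawler lists); confirm the actual schema of each operator's file before relying on it. The function names are illustrative, and the sample document uses reserved documentation addresses, not real crawler ranges.

```python
import ipaddress


def load_networks(ranges_doc):
    """Collect CIDR networks from a parsed range file.

    Assumed layout (verify against the real file):
    {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    """
    networks = []
    for entry in ranges_doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_published_ranges(ip, networks):
    """Return True if the request IP falls inside any published network."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to scan in one pass.
    return any(addr in net for net in networks)


# Hypothetical sample standing in for a downloaded range file.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

networks = load_networks(SAMPLE_RANGES)
claimed_gptbot_is_genuine = ip_in_published_ranges("192.0.2.17", networks)
```

In practice the document would be downloaded from the operator's URL and cached, with a periodic refresh, since published ranges can change.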
Method 2: reverse DNS (where ranges aren’t published)
For operators that don’t publish IP ranges (e.g. Common Crawl’s CCBot), use a forward-confirmed reverse DNS check in three steps: run a reverse DNS (PTR) lookup on the requesting IP, confirm the returned hostname falls under the operator’s documented domain, then forward-resolve that hostname and confirm the requesting IP is among the results. This is the same technique used to verify Googlebot. It’s more work than an IP-range lookup but defeats simple user-agent spoofing.
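The forward-confirmed check can be sketched in Python with the standard `socket` module. This is an illustration under assumptions, not a hardened implementation: the function names are made up for this example, the allowed domain suffixes must come from each operator's own documentation, and the resolver functions are injectable so the logic can be exercised without live DNS.

```python
import ipaddress
import socket


def _reverse_lookup(ip):
    # PTR lookup: returns the primary hostname recorded for the address.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    # A/AAAA lookup: returns every address the hostname resolves to.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def _same_address(a, b):
    # Compare parsed addresses so differing IPv6 spellings still match.
    try:
        return ipaddress.ip_address(a) == ipaddress.ip_address(b)
    except ValueError:
        return False


def forward_confirmed_rdns(ip, allowed_suffixes,
                           reverse=_reverse_lookup, forward=_forward_lookup):
    """Return True only if ip -> hostname -> ip round-trips under an allowed domain."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or resolver failure
        return False
    # Match the exact domain or a true subdomain. A bare endswith() on the
    # suffix would also accept look-alikes such as "notbot.example.com".
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        resolved = forward(hostname)
    except OSError:
        return False
    return any(_same_address(ip, candidate) for candidate in resolved)
```

The forward step matters because whoever controls an IP block also controls its PTR records and can make them say anything. Only the forward lookup is answered by the domain's real owner.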
Method 3: behavioral signals (the backstop)
When identity can’t be confirmed, behavior gives it away — request velocity far above a human, sequential crawling of deep URLs, ignoring robots.txt, rotating user-agents or IPs within a session. These don’t prove who a bot is, but they flag that something automated and uncooperative is happening, which is enough to rate-limit or challenge it.
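Two of these signals, request velocity and user-agent rotation, can be tracked with a small sliding window per IP. The sketch below is illustrative only: the class name, flag names, and thresholds are hypothetical and would need tuning against real traffic. Callers pass in timestamps so the logic stays deterministic.

```python
from collections import defaultdict, deque


class BehaviorTracker:
    """Flag IPs whose recent requests look automated (hypothetical thresholds)."""

    def __init__(self, window_seconds=60.0, max_requests=120, max_user_agents=3):
        self.window = window_seconds
        self.max_requests = max_requests
        self.max_user_agents = max_user_agents
        self._hits = defaultdict(deque)  # ip -> deque of (timestamp, user_agent)

    def record(self, ip, user_agent, now):
        """Record one request and return the behavioral flags it triggers."""
        hits = self._hits[ip]
        hits.append((now, user_agent))
        # Drop requests that have aged out of the sliding window.
        while hits and now - hits[0][0] > self.window:
            hits.popleft()
        flags = []
        if len(hits) > self.max_requests:
            flags.append("high_velocity")
        if len({ua for _, ua in hits}) > self.max_user_agents:
            flags.append("rotating_user_agents")
        return flags
```

A non-empty result does not identify the client. It only marks the traffic as a candidate for rate limiting or a challenge.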
Putting it together
- For bots you allow (search/user-fetch): verify by IP range so a scraper can’t impersonate them to bypass other rules.
- For bots you block: name-based robots.txt handles the honest ones; IP/behavioral enforcement handles the rest (block-ai-crawler-ip-asn).
- Managed shortcut: CDNs like Cloudflare maintain verified-bot lists and do this verification for you (does-cloudflare-block-ai-crawlers) — though they’ve also de-listed operators caught spoofing.
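The rules above can be condensed into one decision function. This is a sketch of one possible policy, not a prescribed one: the `Action` values, the `decide` signature, and the choice to block failed verifications outright (rather than challenge them) are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CHALLENGE = "challenge"


def decide(claimed_bot, verified, allowed_bots, suspicious):
    """Pick an action for one request.

    claimed_bot:  bot token found in the user-agent, or None if no bot is claimed
    verified:     True if the source IP passed the IP-range or reverse DNS check
    allowed_bots: set of bot tokens the site has chosen to allow
    suspicious:   True if behavioral signals flagged this client
    """
    if claimed_bot is not None:
        if not verified:
            # Claims a bot identity but fails verification: treat as a scraper.
            return Action.BLOCK
        # Genuine bot: allow or block according to site policy.
        return Action.ALLOW if claimed_bot in allowed_bots else Action.BLOCK
    # No bot identity claimed: fall back to behavioral signals.
    return Action.CHALLENGE if suspicious else Action.ALLOW
```

For example, `decide("GPTBot", False, {"GPTBot"}, False)` returns `Action.BLOCK`: the name is on the allow list, but the request failed verification, so the allow rule never applies.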
Key takeaways
- Never trust the user-agent name — it’s spoofable.
- Verify by published IP-range file first (OpenAI, Anthropic, Perplexity publish them).
- Use forward-confirmed reverse DNS where ranges aren’t published.
- Behavioral signals are the backstop for unverifiable bots; CDNs can do verification for you.
Related articles
- how-to-block-ai-scrapers — the parent enforcement guide
- ai-crawler-user-agents-directory — the IP-range files per vendor
- block-ai-crawler-ip-asn — acting on verification with IP/ASN rules
- robots-txt-vs-waf-ai-bots — why name-based rules aren’t enforcement
- perplexity-crawlers — the stealth/spoofing case study
Sources
- OpenAI — Bots / Crawlers documentation
- Anthropic — crawler documentation
- seo/ai-crawler-access — internal synthesis