How to Verify a Real AI Bot (IP Ranges, Reverse DNS)

User-agent strings are spoofable, so verify AI bots by their published IP-range files and reverse DNS — not by name. Here's how to confirm a request really is GPTBot, ClaudeBot, or PerplexityBot.

By Andrej Ruckij · June 17, 2026 · 3 min read

TL;DR: A bot’s user-agent string is self-reported and trivially faked, so never trust the name alone. Verify against the operator’s published IP-range file (OpenAI, Anthropic, and Perplexity each publish one) or via reverse DNS. A request claiming to be GPTBot from outside OpenAI’s published ranges is not GPTBot — and should be treated as a scraper.

This article is part of the enforcement guide. Verification underpins every other enforcement step: block and allow rules are only as trustworthy as your ability to confirm who is actually making the request.

Why the user-agent isn’t enough

The user-agent string is just an HTTP header (User-Agent) that the client sets itself. Any scraper can send a user-agent containing GPTBot while having nothing to do with OpenAI. Stealth crawlers exploit the same weakness in the other direction: they present a generic browser user-agent to slip past name-based rules (the Perplexity stealth-crawl case is the textbook example). So a robots.txt group of User-agent: GPTBot followed by Disallow: /, or a firewall rule keyed on the name, only works on bots that identify themselves honestly. A crawler that lies about its name walks straight past it. Verification closes that gap.
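To see why a name-keyed rule only binds honest clients, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the URL, and the browser-style user-agent string are illustrative assumptions, not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks GPTBot by name: the bot is named in a
# User-agent line and the blocked paths are listed with Disallow.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

url = "https://example.com/articles/some-post"

# Rules are matched against whatever name the client *claims* to be.
honest_gptbot = parser.can_fetch("GPTBot", url)

# A crawler presenting a generic browser user-agent matches no group,
# so nothing is disallowed for it.
disguised_crawler = parser.can_fetch(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0", url
)

print(honest_gptbot)      # False: the GPTBot group applies
print(disguised_crawler)  # True: no matching group, so access is allowed
```

The parser can only compare rules against the name the client reports. A compliant GPTBot reads the group and stays out; a crawler with a browser user-agent never matches it, and a scraper that ignores robots.txt is not bound by it at all.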

Method 1: published IP-range files (best)

The major AI operators publish machine-readable lists of the IP ranges their bots crawl from. Check the requesting IP against the relevant file:

| Operator (bots) | IP-range files |
|---|---|
| OpenAI (GPTBot, OAI-SearchBot, ChatGPT-User) | openai.com/gptbot.json, openai.com/searchbot.json, openai.com/chatgpt-user.json |
| Anthropic (ClaudeBot, Claude-SearchBot, Claude-User) | claude.com/crawling/bots.json |
| Perplexity (PerplexityBot, Perplexity-User) | perplexity.com/perplexitybot.json, perplexity.com/perplexity-user.json |

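If a request claims a bot’s user-agent but its source IP isn’t inside any of that operator’s published ranges, it’s not the real bot. The files change over time, so refresh them on a schedule rather than hard-coding prefixes. For the full list of user-agent tokens, see ai-crawler-user-agents-directory.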
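A minimal Python sketch of the range check, using only the standard library. It assumes the file is JSON with a top-level prefixes list whose entries carry ipv4Prefix or ipv6Prefix keys. The key names are an assumption, so confirm each operator's actual schema before relying on them. The sample document uses reserved documentation prefixes, not real crawler ranges.

```python
import ipaddress
import json
import urllib.request

# ASSUMPTION: the range file looks like
#   {"prefixes": [{"ipv4Prefix": "a.b.c.d/n"}, {"ipv6Prefix": "x:y::/n"}]}
# Confirm each operator's real schema before relying on these key names.


def load_networks(doc):
    """Parse a range-file JSON document into a list of ip_network objects."""
    data = json.loads(doc)
    networks = []
    for entry in data.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                # strict=False tolerates prefixes written with host bits set.
                networks.append(ipaddress.ip_network(entry[key], strict=False))
    return networks


def fetch_networks(url, timeout=10.0):
    """Download and parse a published range file (cache it; don't fetch per request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return load_networks(resp.read().decode("utf-8"))


def ip_in_ranges(ip, networks):
    """True if the request's source IP falls inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (or vice versa): False, no error.
    return any(addr in net for net in networks)


# Placeholder document using reserved documentation prefixes, not real crawler ranges.
SAMPLE = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
nets = load_networks(SAMPLE)
print(ip_in_ranges("192.0.2.45", nets))    # True
print(ip_in_ranges("198.51.100.7", nets))  # False

# Against a live file (network access required):
# gptbot_nets = fetch_networks("https://openai.com/gptbot.json")
```

Parsing into ip_network objects once and reusing them keeps the per-request check cheap. In production the fetched lists would be cached and refreshed periodically.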

Method 2: reverse DNS (where ranges aren’t published)

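For operators that don’t publish IP ranges (e.g. Common Crawl’s CCBot), use forward-confirmed reverse DNS. First, look up the PTR (reverse DNS) record for the requesting IP and confirm the hostname falls under the operator’s expected domain. Then forward-resolve that hostname and confirm it returns the original IP. The forward step matters because whoever controls an IP block can set its PTR record to any name, but cannot create DNS records under someone else’s domain. This is the same technique Google documents for verifying Googlebot. It’s more work than an IP-range lookup, and it only works for operators that set PTR records under their own domain, but it defeats simple spoofing.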
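A sketch of forward-confirmed reverse DNS using Python's socket module. The resolver functions are injectable, so the logic can be tested without network access. The suffixes in the usage comment are the domains Google documents for Googlebot. For any other operator, the expected domain has to come from that operator's own documentation.

```python
import ipaddress
import socket


def _reverse_lookup(ip):
    """PTR lookup: the primary hostname registered for this IP."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    """A/AAAA lookup: every address the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def hostname_matches(hostname, suffixes):
    """Match whole DNS labels, so 'evilgooglebot.com' doesn't pass as 'googlebot.com'."""
    host = hostname.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in suffixes)


def fcrdns_verify(ip, suffixes, reverse=_reverse_lookup, forward=_forward_lookup):
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    try:
        hostname = reverse(ip)
    except OSError:  # socket.herror / gaierror: no PTR record or lookup failure
        return False
    # Step 1: the PTR name must sit under the operator's domain.
    if not hostname_matches(hostname, suffixes):
        return False
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    # Step 2: the hostname must resolve back to the same IP. A PTR record can
    # claim any name, but only the domain owner controls its forward records.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == target for a in addresses)


# Usage (live DNS; suffixes per Google's Googlebot documentation):
# fcrdns_verify(client_ip, ["googlebot.com", "google.com"])
```

socket.gethostbyaddr takes no timeout argument and blocks. In a real deployment, these lookups would run off the request path and the results would be cached per IP.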

Method 3: behavioral signals (the backstop)

When identity can’t be confirmed, behavior gives it away: request rates far above what a human generates, sequential crawling of deep URLs, ignoring robots.txt, and rotating user-agents or IPs within a session. These signals don’t prove who a bot is, but they show that something automated and uncooperative is happening, which is enough to justify rate-limiting or challenging it.
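A minimal sketch of two of these signals, request rate and user-agent rotation per source IP, using a sliding time window. The class name and every threshold are hypothetical, so tune them against your own traffic. The class flags requests rather than identifying bots, which matches the backstop role described above.

```python
from collections import defaultdict, deque


class BehaviorFlagger:
    """Flag source IPs whose request pattern looks automated.

    HYPOTHETICAL thresholds: illustrative defaults, not recommended values.
    """

    def __init__(self, max_requests=60, window_seconds=10.0, max_user_agents=3):
        self.max_requests = max_requests
        self.window = window_seconds
        self.max_user_agents = max_user_agents
        self._times = defaultdict(deque)  # source IP -> timestamps inside the window
        self._agents = defaultdict(set)   # source IP -> user-agents seen (never expires in this sketch)

    def observe(self, ip, user_agent, now):
        """Record one request and return the reasons (if any) to flag it."""
        times = self._times[ip]
        times.append(now)
        # Drop timestamps that have slid out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        self._agents[ip].add(user_agent)

        reasons = []
        if len(times) > self.max_requests:
            reasons.append("rate")
        if len(self._agents[ip]) > self.max_user_agents:
            reasons.append("ua-rotation")
        return reasons


flagger = BehaviorFlagger(max_requests=5, window_seconds=1.0)
for i in range(6):
    reasons = flagger.observe("192.0.2.1", "ExampleBot/1.0", now=i * 0.1)
print(reasons)  # ['rate']
```

A non-empty result would feed a rate limit or challenge, not an outright identity verdict.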

Putting it together

  • For bots you allow (search/user-fetch): verify by IP range so a scraper can’t impersonate them to bypass other rules.
  • For bots you block: name-based robots.txt handles the honest ones; IP/behavioral enforcement handles the rest (block-ai-crawler-ip-asn).
  • Managed shortcut: CDNs like Cloudflare maintain verified-bot lists and do this verification for you (does-cloudflare-block-ai-crawlers) — though they’ve also de-listed operators caught spoofing.
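The decision logic behind the first bullet can be sketched as a small classifier: a request that names a known bot must come from that bot's published ranges. The token-to-range table and the classify function are hypothetical and use documentation prefixes. In practice, the table would be populated from the operators' range files.

```python
import ipaddress

# HYPOTHETICAL table: user-agent token -> that operator's published prefixes.
# Placeholders from documentation ranges; load real ones from the range files.
VERIFIED_RANGES = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
    "ClaudeBot": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify(user_agent, ip):
    """Return 'verified', 'impostor', or 'unclaimed' for one request."""
    addr = ipaddress.ip_address(ip)
    for token, networks in VERIFIED_RANGES.items():
        if token.lower() in user_agent.lower():
            # The client claims to be this bot, so its IP must back that up.
            if any(addr in net for net in networks):
                return "verified"
            return "impostor"
    # No known bot name claimed: leave it to behavioral checks.
    return "unclaimed"


print(classify("Mozilla/5.0 (compatible; GPTBot/1.1)", "192.0.2.8"))          # verified
print(classify("Mozilla/5.0 (compatible; GPTBot/1.1)", "203.0.113.4"))        # impostor
print(classify("Mozilla/5.0 (Windows NT 10.0) Chrome/124.0", "203.0.113.4"))  # unclaimed
```

A verified request then gets whatever allow or block rule you have for that token. An impostor can be blocked outright, and an unclaimed request falls through to the behavioral checks.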

Key takeaways

  • Never trust the user-agent name — it’s spoofable.
  • Verify by published IP-range file first (OpenAI, Anthropic, Perplexity publish them).
  • Use forward-confirmed reverse DNS where ranges aren’t published.
  • Behavioral signals are the backstop for unverifiable bots; CDNs can do verification for you.

Sources