Should I Allow AI Crawlers?

Allow AI search and user-fetch crawlers — they cite you in AI answers and bring real visitors. Consider blocking only training crawlers, which take content for model training with nothing back.

By Andrej Ruckij · June 16, 2026 · 2 min read

TL;DR: Yes — allow AI search/retrieval and user-fetch crawlers; they cite you in AI answers and bring real visitors. The only crawlers worth considering blocking are training bots, which take your content to train models with nothing in return.

The direct answer

For most businesses, the answer is “allow, with one exception.” The three crawler types break down clearly:

  • Allow search/retrieval bots (OAI-SearchBot, PerplexityBot) — they’re how you get cited in ChatGPT and Perplexity answers, an increasingly important discovery channel. Blocking them makes you invisible there.
  • Allow user-fetch bots (ChatGPT-User, Perplexity-User) — these fire when a real person asks an AI to open your page. They’re visitors. Never block them.
  • Decide on training bots (GPTBot, CCBot) — the only genuine question. Block if you object to free model training; allow if you don’t mind.
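
The three categories above can be captured as a small lookup, which is handy when sorting server-log entries by crawler purpose. This is a minimal sketch: the crawler tokens come from the list above, while `CRAWLER_CATEGORIES`, `DEFAULT_POLICY`, and `classify_user_agent` are hypothetical names introduced for illustration. It assumes a simple case-insensitive substring match on the User-Agent header.

```python
from typing import Optional

# Crawler tokens named in this article, grouped by purpose.
CRAWLER_CATEGORIES = {
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "ChatGPT-User": "user-fetch",
    "Perplexity-User": "user-fetch",
    "GPTBot": "training",
    "CCBot": "training",
}

# The article's default stance per category; "decide" marks the one
# deliberate opt-out choice.
DEFAULT_POLICY = {
    "search": "allow",
    "user-fetch": "allow",
    "training": "decide",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the crawler category for a User-Agent string, or None.

    Matching is a case-insensitive substring test, which is an
    assumption made for this sketch.
    """
    ua = user_agent.lower()
    for token, category in CRAWLER_CATEGORIES.items():
        if token.lower() in ua:
            return category
    return None
```

For example, calling `classify_user_agent` on a header containing `GPTBot` yields `"training"`, and `DEFAULT_POLICY["training"]` then flags it as the case that needs a decision. A User-Agent header can be spoofed, so treat the result as a label for log analysis, not as proof of who is crawling.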

Why people ask this

After headlines about AI scraping, the instinct is to block everything. But a blanket block gives up real, growing value: AI-driven citations and referral traffic, which convert well because users arrive pre-informed. In exchange, it avoids only model training, which for many sites is not worth fighting. The trade-off usually favors allowing, with training as the one deliberate opt-out.

How to apply it

A visibility-first default:

# Allow search + user-fetch (the value)
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /

# Optional: opt out of training only
User-agent: GPTBot
Disallow: /
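
Before deploying rules like these, it can help to check that they parse the way you intend. The sketch below uses Python's standard `urllib.robotparser` on a copy of a robots.txt that follows this policy. `ROBOTS_TXT`, `check_policy`, and the `https://example.com/` URL are placeholders for illustration. The standard-library parser is only an approximation of how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt following the visibility-first policy:
# search and user-fetch crawlers allowed, GPTBot opted out.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: GPTBot
Disallow: /
"""

AGENTS = [
    "OAI-SearchBot",
    "PerplexityBot",
    "ChatGPT-User",
    "Perplexity-User",
    "GPTBot",
    "CCBot",
]


def check_policy(robots_txt, agents, url="https://example.com/"):
    """Map each agent name to whether the rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


results = check_policy(ROBOTS_TXT, AGENTS)
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these rules only GPTBot is reported as blocked. CCBot has no group of its own and there is no `User-agent: *` group, so it falls through to the default of allowed. Add a `Disallow` group for it if you want to opt out of that crawler as well.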

If AI visibility matters to your business (as it does for most ecommerce and lead-gen sites), lean toward allowing, and make sure a default block at your CDN is not silently overriding your robots.txt (see robots-txt-vs-waf-ai-bots). If your content is your product (publishers, paywalled research), the case for blocking training crawlers is stronger.
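
One rough way to spot a silent CDN block is to request a page while sending a crawler's name as the User-Agent and look at the status code. The sketch below does that with the standard library. `BLOCK_STATUSES`, `interpret_status`, `probe`, and `check_site` are hypothetical names, and the set of "block" status codes is an assumption, not a vendor-documented list. Firewalls may also verify crawlers by IP address, so a spoofed request from your own machine can be treated differently from the real bot; read the result as a hint only.

```python
import urllib.error
import urllib.request

# Status codes that often accompany a firewall or bot-management block.
# Assumed for illustration; your CDN may use others.
BLOCK_STATUSES = {401, 403, 429, 503}


def interpret_status(status: int) -> str:
    """Turn an HTTP status code into a rough verdict."""
    if status in BLOCK_STATUSES:
        return "possibly blocked"
    if 200 <= status < 400:
        return "reachable"
    return "inconclusive"


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url with the given User-Agent and return the HTTP status.

    Network-level failures (DNS, timeouts) are not caught here and
    propagate as urllib.error.URLError.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return err.code


def check_site(url: str, agents) -> dict:
    """Probe url once per agent name and return a verdict for each."""
    return {agent: interpret_status(probe(url, agent)) for agent in agents}
```

A call such as `check_site("https://example.com/", ["OAI-SearchBot", "ChatGPT-User"])` returns one verdict per agent. If a normal browser User-Agent gets "reachable" while the crawler names get "possibly blocked", the CDN or WAF settings are worth a closer look.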

Key takeaways

  • Allow search and user-fetch crawlers — they bring citations and visitors.
  • Never block user-fetch bots; those are real people.
  • Training is the only real opt-out decision — weigh it on whether content is your product.
