Publishers Are Blocking AI Crawlers: Who, Why, and What It Means for You

By Andrej Ruckij · June 17, 2026

TL;DR: By 2026, roughly 80% of top news sites block AI training bots, with restrictions up about 300% since early 2023. They block because their content is the product and AI crawling broke the old search bargain: training crawlers fetch thousands of pages for every referral they send back. Blocking is also leverage for licensing deals reportedly worth tens of millions a year. But the publisher playbook rarely fits a business that needs AI visibility, so don’t copy it reflexively.

Part of the block-vs-allow tradeoff cluster. The publisher wave is the most visible AI-blocking story, and the most misapplied template.

The scale of it

AI-crawler blocking among major publishers went from a fringe move to the default:

~80% of top news sites now block AI training bots; restrictions are up roughly 300% since early 2023.
The Guardian, BBC, Reuters, and the Associated Press lead the blocking.
Reuters and Time now block by default, allowing only approved crawlers via allowlists — a stance also taken by The Atlantic and others.
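
In robots.txt terms, a default-deny stance like this amounts to one catch-all group that disallows everything, plus an explicit group for each approved crawler. The following is a minimal sketch, not any publisher's actual policy: the allowlist contents are assumptions for illustration, and `ExampleLicensedBot` is a hypothetical name. It builds the file and checks it with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist for illustration only; not any publisher's real policy.
# "ExampleLicensedBot" is an invented name standing in for a licensed partner.
APPROVED_CRAWLERS = ["OAI-SearchBot", "ExampleLicensedBot"]


def build_allowlist_robots(approved):
    """Return robots.txt text that denies every crawler except the approved ones."""
    lines = []
    for name in approved:
        lines += [f"User-agent: {name}", "Allow: /", ""]
    # Catch-all group: any crawler not named above is disallowed everywhere.
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url="https://example.com/news/story"):
    """Check a user agent against robots.txt text using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowlist_robots = build_allowlist_robots(APPROVED_CRAWLERS)
print(allowlist_robots)
print("OAI-SearchBot allowed:", is_allowed(allowlist_robots, "OAI-SearchBot"))
print("GPTBot allowed:", is_allowed(allowlist_robots, "GPTBot"))
```

Keep in mind that robots.txt is advisory: it only restrains crawlers that choose to honor it, so a policy like this expresses intent rather than enforcing it.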

Why publishers block

The publisher logic is specific to their business model, and it’s sound for them:

Their content is the product. For a news org, the article is the asset. Letting it train a model that then answers the user directly cannibalizes the core business.
AI crawls broke the search bargain. Traditional search sent traffic in exchange for crawling. AI training inverts it — 2026 analyses found training crawlers fetching thousands of pages for every referral they send back. The exchange stopped being mutual.
Blocking is negotiating leverage. A blocked publisher forces AI companies toward paid licensing. It worked: OpenAI signed deals with News Corp, the Associated Press, the Financial Times and others, reportedly in the tens of millions annually. Publishers with open crawl access negotiate from a weaker position.
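
The "search bargain" argument rests on a simple ratio: pages a crawler fetches per visit it sends back. Here is a rough sketch of that arithmetic for your own site, assuming you can tally crawler fetches from server logs and referrals from analytics. The counts below are invented for illustration and are not figures from the analyses mentioned above.

```python
def crawl_to_referral_ratio(pages_crawled, referrals_received):
    """Pages fetched by a crawler per visit it sent back; None if it sent none."""
    if referrals_received == 0:
        return None  # ratio is undefined when the crawler sent nothing back
    return pages_crawled / referrals_received


# Hypothetical monthly counts, e.g. tallied from server logs and analytics.
traffic = {
    "search_crawler": {"pages_crawled": 50_000, "referrals_received": 5_000},
    "training_crawler": {"pages_crawled": 60_000, "referrals_received": 20},
}

for name, counts in traffic.items():
    ratio = crawl_to_referral_ratio(**counts)
    print(f"{name}: {ratio} pages crawled per referral")
```

With these made-up numbers the search crawler takes 10 pages per referral and the training crawler 3,000, which is the kind of gap the publisher argument describes.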

Why this usually isn’t your playbook

Here’s the part most “publishers are blocking AI!” coverage misses: the publisher case rarely generalizes to a business whose website is marketing rather than product.

If your content markets a product or service, AI visibility is an asset, not a leak. Being cited when a buyer asks AI about your category is exactly what you want.
You’re not negotiating a licensing deal. The leverage rationale doesn’t apply — there’s no tens-of-millions licensing offer for a mid-market SaaS or ecommerce site.
Blocking the wrong bots costs you discovery. For most businesses, the visibility lost by blocking search bots outweighs any training concern (see what-you-lose-blocking-ai-search-bots).

Note the counter-pressure: Microsoft has publicly urged publishers not to block AI bots but to make sites AI-legible instead — a reminder that even within publishing, blocking is contested.

So what should you take from it?

The training/search distinction still holds. Even publishers that block training crawlers often allow search and citation bots. Copy that nuance, not a blanket block (see gptbot-vs-oai-searchbot).
Block training crawlers if your content is your product; allow search crawlers if your content sells something else.
If you’re big enough to license, blocking is leverage. If you’re not, it’s mostly just lost visibility.
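
The training-versus-search nuance can be sketched as the mirror image of a blanket block: name the training crawlers you want to exclude and leave everyone else allowed. The example below is an illustration under stated assumptions, not a recommended policy. `GPTBot`, `OAI-SearchBot`, and `CCBot` are the crawler names referenced in this article's related guides; treating `CCBot` as training-related is an assumption, and `example.com/pricing` is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# Crawlers treated as training-related in this sketch (an assumption);
# adjust the list to match your own policy.
TRAINING_CRAWLERS = ["GPTBot", "CCBot"]


def build_selective_robots(blocked):
    """Return robots.txt text that blocks only the named crawlers."""
    lines = []
    for name in blocked:
        lines += [f"User-agent: {name}", "Disallow: /", ""]
    # Everyone else, including search and citation crawlers, stays allowed.
    # An empty Disallow value means nothing is disallowed.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


selective_parser = RobotFileParser()
selective_parser.parse(build_selective_robots(TRAINING_CRAWLERS).splitlines())

page = "https://example.com/pricing"  # placeholder URL
for agent in ["GPTBot", "CCBot", "OAI-SearchBot"]:
    print(agent, "allowed:", selective_parser.can_fetch(agent, page))
```

Checking the generated file with a parser before deploying it is a cheap way to confirm that a search or citation bot was not blocked by accident.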

Key takeaways

~80% of top news sites block AI training bots (restrictions up ~300% since early 2023); the Guardian, BBC, Reuters, and AP lead.
They block because content is their product, AI broke the search bargain, and blocking is licensing leverage.
That logic rarely fits a business whose site is marketing — for you, AI visibility is usually an asset.
Copy the training-vs-search nuance, not the blanket block.

block-or-allow-ai-crawlers — the parent tradeoff guide
what-you-lose-blocking-ai-search-bots — the cost of over-blocking
ai-crawler-regulation-eu-uk — the legal backdrop to the licensing fights
which-ai-bots-to-block — the practical allow/block policy
glossary/ccbot — Common Crawl, a focal point of publisher disputes