Pages tagged "ai-crawlers"

8 pages tagged with ai-crawlers.

AI Crawler — Definition
An AI crawler is an automated bot that fetches web content for an AI system: to train a model, to build a citation index for AI search, or to fetch a page a user asked about. Which of these three types a bot belongs to determines your access policy.

AI Crawler Access Control — Bot Taxonomy, robots.txt vs WAF
How to decide which AI bots to allow or block: the training / retrieval / user-fetch taxonomy, why a WAF enforces where robots.txt only requests, and the current (2026) user-agent strings for OpenAI, Anthropic, Google, Perplexity, and Meta crawlers.

Bytespider — Definition
Bytespider is the web crawler of ByteDance (TikTok's parent company), widely reported to ignore robots.txt and crawl aggressively. It's the canonical example of why robots.txt alone can't stop a non-compliant AI scraper.

CCBot (Common Crawl) — Definition
CCBot is Common Crawl's web crawler. Common Crawl is a nonprofit that publishes a free, open archive of the web, and that archive is a major training-data source for many AI models. CCBot respects robots.txt.

GPTBot — Definition
GPTBot is OpenAI's web crawler that collects content to train its models. It respects robots.txt, publishes its IP ranges, and is distinct from OAI-SearchBot (search) and ChatGPT-User (user-fetch).

llms.txt — Definition
llms.txt is a proposed plain-text/markdown file that gives AI systems a curated map of your site's most important content. It's advisory: it helps comprehension and plays no role in access control.

Pay-Per-Crawl — Definition
Pay-per-crawl is Cloudflare's model that lets sites charge AI crawlers for access using the HTTP 402 'Payment Required' status code and crawler-price headers, turning bot access into a transaction instead of a free-for-all.

WAF (Web Application Firewall) — Definition
A WAF is a firewall that inspects and blocks web requests at the edge before they reach your server. For AI bots it's the enforcement layer that robots.txt isn't: a WAF acts, while robots.txt only asks.
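
The distinction these entries draw between asking (robots.txt) and enforcing (WAF) can be illustrated with a minimal Python sketch. It is not any vendor's implementation. The bot tokens and their categories come from the definitions above, except Bytespider, whose category is an assumption. The block-training-only policy, the function names, and the simplified user-agent strings in the usage note are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Categories follow the training / retrieval / user-fetch taxonomy above.
BOT_CATEGORIES = {
    "GPTBot": "training",
    "CCBot": "training",
    "Bytespider": "training",  # assumption: the listing does not state its purpose
    "OAI-SearchBot": "retrieval",
    "ChatGPT-User": "user-fetch",
}

# Hypothetical site policy: refuse training crawlers, allow everything else.
BLOCKED_CATEGORIES = {"training"}

# The same policy expressed as a robots.txt request.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def classify(user_agent: str) -> str:
    """Return the taxonomy category for a User-Agent header, or 'unknown'."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token.lower() in ua:
            return category
    return "unknown"


def robots_allows(bot_token: str, path: str = "/") -> bool:
    """What a compliant bot concludes after reading ROBOTS_TXT.

    Nothing here stops a bot that never performs this check.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(bot_token, path)


def edge_status(user_agent: str) -> int:
    """WAF-style decision made on the server side for every request."""
    return 403 if classify(user_agent) in BLOCKED_CATEGORIES else 200
```

Calling `robots_allows("GPTBot")` returns `False`, but that result only matters if the bot asks the question. `edge_status("Mozilla/5.0 (compatible; Bytespider)")` returns `403` whether or not the bot reads robots.txt. Because a User-Agent header can be spoofed, a real rule would likely also verify the source address, for example against the IP ranges GPTBot publishes.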