Bytespider — Definition

Bytespider

TL;DR: Bytespider is the web crawler operated by ByteDance (TikTok’s parent company), widely reported to crawl aggressively and disregard robots.txt. It’s the textbook case of a non-compliant AI scraper, and the reason robots.txt alone can’t keep bots out.

What it means

Bytespider is the user-agent token used by the crawler of ByteDance, the company behind TikTok, and it is understood to gather web content for AI training. It earned its reputation as the canonical “bad-actor” crawler because it has been repeatedly reported to crawl at high volume and to ignore robots.txt directives that ask it to stop. Whether or not every report is precise, Bytespider has become the standard example of a category: crawlers that ignore the voluntary rules robots.txt depends on.

Why it matters

Bytespider is the concrete answer to “why isn’t robots.txt enough?” You can add a User-agent: Bytespider rule with Disallow: / and a compliant bot would obey, but a non-compliant one simply doesn’t, or it rotates its identity so your rule never matches. That’s why hard enforcement requires a firewall, IP/ASN blocking, and behavioral rules rather than a polite directive. Once a crawler has decided to ignore robots.txt, only edge-level enforcement stops it (see glossary/waf and seo/ai-crawler-access).

How it works / examples

A robots.txt entry for Bytespider is worth adding for the record, but don’t rely on it:

User-agent: Bytespider
Disallow: /
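
To see why this entry is only advisory, it helps to look at where the check actually runs: inside the crawler, not on your server. The following is a minimal sketch using Python's standard urllib.robotparser; the helper name is_allowed, the example.com URL, and the SomeOtherBot token are illustrative assumptions, not anything Bytespider is known to use.

```python
from urllib.robotparser import RobotFileParser

# The same two-line entry shown above.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(user_agent_token: str, url: str) -> bool:
    """Return what a *compliant* crawler would conclude from ROBOTS_TXT.

    This check is run voluntarily by the crawler itself. Nothing on the
    server side forces a crawler to call it or to respect the answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent_token, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # hypothetical URL
    # A crawler that identifies honestly and checks the file is told to stay out.
    print(is_allowed("Bytespider", url))
    # The same crawler under a different (hypothetical) name matches no rule.
    print(is_allowed("SomeOtherBot", url))
```

The rule matches only when the crawler both announces itself under the listed name and chooses to consult the file, which is exactly what a non-compliant crawler skips.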

For actual enforcement against Bytespider-class crawlers, block at the WAF/CDN. Verify legitimate bots by their published IP ranges rather than by name, since the user-agent string itself can be spoofed or rotated.
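
The kind of decision an edge rule makes can be sketched in a few lines of Python. This is a hedged illustration, not a real WAF configuration: the function name decide, the ExampleBot crawler, and every network below are assumptions (the CIDR blocks are reserved documentation ranges, not actual ByteDance or search-engine ranges). Real deployments would load ranges from the operator's published list and add rate-based behavioral rules, which are omitted here.

```python
import ipaddress

# Hypothetical values for illustration only.
BLOCKED_UA_TOKENS = ("bytespider",)
BLOCKED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]  # placeholder range
VERIFIED_BOT_NETWORKS = {
    # token in user-agent -> ranges the bot's operator publishes (placeholder)
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def decide(user_agent: str, client_ip: str) -> str:
    """Return "block" or "allow" for a single request."""
    ua = user_agent.lower()
    ip = ipaddress.ip_address(client_ip)

    # 1. Block crawlers that announce themselves under a blocked name.
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return "block"

    # 2. Block by network, which still works after the name is rotated.
    if any(ip in net for net in BLOCKED_NETWORKS):
        return "block"

    # 3. A request claiming to be a known bot must come from that bot's
    #    published ranges; otherwise treat the user-agent as spoofed.
    for token, networks in VERIFIED_BOT_NETWORKS.items():
        if token in ua:
            return "allow" if any(ip in net for net in networks) else "block"

    return "allow"


if __name__ == "__main__":
    print(decide("Mozilla/5.0 (compatible; Bytespider)", "203.0.113.7"))
    print(decide("ExampleBot/1.0", "203.0.113.7"))
```

The ordering reflects the point above: the user-agent check catches only honest self-identification, while the network checks do not depend on what the crawler calls itself.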

Sources