AI Crawler Tarpits and Honeypots: Nepenthes, Anubis, and Cloudflare AI Labyrinth
When blocking isn't enough, tarpits waste a crawler's resources instead. A look at Nepenthes (infinite maze), Anubis (proof-of-work), and Cloudflare AI Labyrinth — what they do, and their real tradeoffs.
By Andrej Ruckij · June 17, 2026
TL;DR: Tarpits fight non-compliant AI crawlers by wasting their resources instead of just blocking them. Nepenthes traps crawlers in an infinite maze of generated pages; Anubis makes crawling expensive with a proof-of-work challenge; Cloudflare AI Labyrinth feeds bad bots decoy AI-generated pages. They’re an aggressive, advanced tier — effective against scrapers that ignore robots.txt, but with real tradeoffs.
This article is part of the enforcement guide (how-to-block-ai-scrapers). It covers the escalation tier: for crawlers that ignore robots.txt and evade simple blocks, some operators go from blocking to trapping.
Why tarpits exist
A plain block returns a 403 and the crawler moves on. A tarpit does something different: it accepts the request and feeds the crawler an endless or expensive experience, burning the scraper’s time, bandwidth, and compute. The goal isn’t to keep the bot out; it’s to make crawling your site economically painful, especially for the non-compliant scrapers that a polite Disallow never stops (can-robots-txt-stop-ai-scrapers). A honeypot is the detection cousin: a hidden link or path no human would follow, so anything that hits it is almost certainly a bot and can be banned.
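The honeypot idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation: the bait path `/trap/do-not-follow`, the `HoneypotTracker` class, and the in-memory ban set are all hypothetical names chosen for the example, and a real deployment would more likely push bans to a firewall or WAF.

```python
# Hypothetical bait path: linked only through a hidden anchor and
# disallowed in robots.txt, so compliant crawlers and humans never request it.
HONEYPOT_PATHS = {"/trap/do-not-follow"}


class HoneypotTracker:
    """Bans any client that requests a bait path (in-memory sketch)."""

    def __init__(self, bait_paths):
        self.bait_paths = set(bait_paths)
        self.banned = set()

    def handle(self, client_ip, path):
        """Return the HTTP status code this request should receive."""
        if client_ip in self.banned:
            return 403  # already caught earlier
        if path in self.bait_paths:
            self.banned.add(client_ip)  # touching the bait flags the client
            return 403
        return 200


tracker = HoneypotTracker(HONEYPOT_PATHS)
print(tracker.handle("203.0.113.7", "/articles/1"))          # normal page
print(tracker.handle("203.0.113.7", "/trap/do-not-follow"))  # hits the bait
print(tracker.handle("203.0.113.7", "/articles/1"))          # now banned
```

Banning by IP alone is a simplification: scrapers that rotate addresses need a broader signal, which is where the IP/ASN approach comes in.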
Nepenthes — the infinite maze
Nepenthes (named after the carnivorous pitcher plant) is an open-source tarpit that traps AI training crawlers in an endless series of randomly generated pages, none of which link back out of the maze. A crawler that wanders in “gets stuck” and thrashes, consuming resources for as long as it keeps following links. It’s commonly paired with fail2ban to then IP-ban whatever fell into the trap. Nepenthes is the aggressive end of the spectrum: its stated intent is to waste crawlers’ time and compute, not merely deny them.
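To make the maze idea concrete, here is a rough Python sketch of a page generator whose links always lead deeper. It is not Nepenthes's code; the `maze_page` function, the word list, and the URL layout are assumptions made for illustration. Seeding the random generator from the requested path means the same URL always returns the same page, so the maze looks like a stable site without storing anything.

```python
import hashlib
import random

# Hypothetical vocabulary used for both filler text and link slugs.
WORDS = ["amber", "basalt", "cedar", "delta", "ember", "fjord", "garnet", "harbor"]


def maze_page(path, links_per_page=5):
    """Return (html, links) for a maze URL; every link points one level deeper."""
    # Derive a seed from the path so repeated requests get identical content.
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)

    base = path.rstrip("/")
    links = []
    for _ in range(links_per_page):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        links.append(f"{base}/{slug}")

    body = " ".join(rng.choice(WORDS) for _ in range(40))
    anchors = "\n".join(f'<a href="{href}">{href}</a>' for href in links)
    html = f"<html><body><p>{body}</p>\n{anchors}</body></html>"
    return html, links


html, links = maze_page("/maze")
deeper_html, deeper_links = maze_page(links[0])  # following a link never exits
```

Because no page is ever stored, the cost to the operator is one hash and a little string building per request, while the crawler sees an unbounded link graph.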
Anubis — proof-of-work
Anubis takes a different tack: it’s a proof-of-work challenge, like a CAPTCHA flipped around. Instead of checking that a visitor is human, it forces every client to do a small computation before proceeding, which is trivial for one real visitor but prohibitively expensive at crawl scale. It became widely deployed in 2025, especially by open-source projects and code forges hammered by AI scrapers. Tradeoff: it adds a challenge step in front of your site, which can slow down or lock out legitimate visitors in edge cases, such as browsers with JavaScript disabled or non-browser clients that cannot run the challenge.
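The asymmetry behind proof-of-work is easier to see in code. The following is a generic hashcash-style sketch in Python, not Anubis's actual protocol: the function names, the challenge string, and the "leading zero hex digits" difficulty rule are assumptions for illustration. The client must search for a nonce, while the server checks the answer with a single hash.

```python
import hashlib
import itertools


def meets_difficulty(challenge, nonce, difficulty):
    """True if SHA-256(challenge + nonce) starts with `difficulty` zero hex digits."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge, difficulty):
    """Client side: brute-force the smallest nonce that satisfies the challenge."""
    for nonce in itertools.count():
        if meets_difficulty(challenge, nonce, difficulty):
            return nonce


def verify(challenge, nonce, difficulty):
    """Server side: one hash is enough to check the client's work."""
    return meets_difficulty(challenge, nonce, difficulty)


# Hypothetical challenge value; a real server would issue a fresh one per visitor.
nonce = solve("example-challenge", 3)
print(verify("example-challenge", nonce, 3))
```

Each extra hex digit of difficulty multiplies the expected search by 16 (about 16^difficulty hashes on average), while verification stays at one hash. One visitor pays that cost once; a crawler fetching millions of pages without reusing the resulting session pays it over and over.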
Cloudflare AI Labyrinth — managed decoys
For those who don’t want to self-host a tarpit, Cloudflare AI Labyrinth (announced March 2025) is the managed version. When it detects inappropriate bot activity, it lures the crawler into a maze of realistic-looking but irrelevant AI-generated pages, wasting its resources rather than blocking outright. Once enabled, it triggers automatically on bad-bot signals, so there’s little to configure. It’s part of Cloudflare’s broader AI-bot controls (does-cloudflare-block-ai-crawlers).
The honest tradeoffs
Tarpits are not a default recommendation — they’re a deliberate escalation with costs:
- They can waste your resources too. Serving an infinite maze means serving requests; a self-hosted tarpit consumes your bandwidth and compute as well.
- Collateral damage. If misconfigured, aggressive challenges or mazes can ensnare legitimate crawlers or degrade the experience for real users.
- Arms race. Crawlers adapt; tarpits are a moving target, not a permanent fix.
- Most sites don’t need them. For the majority, the right stack is robots.txt + a WAF + IP/ASN blocking (block-ai-crawler-ip-asn). Tarpits make sense mainly for sites under genuine, persistent scraper pressure.
Key takeaways
- Tarpits waste a crawler’s resources instead of just blocking it; honeypots detect bots via hidden bait.
- Nepenthes = infinite maze (often + fail2ban); Anubis = proof-of-work challenge; Cloudflare AI Labyrinth = managed decoy maze.
- They work against non-compliant scrapers that ignore robots.txt — but cost your resources, risk collateral damage, and are an arms race.
- Most sites should start with robots.txt + WAF + IP blocking; reserve tarpits for persistent scraper pressure.
Related articles
- how-to-block-ai-scrapers — the parent enforcement guide
- block-ai-crawler-ip-asn — the more common IP/ASN approach
- can-robots-txt-stop-ai-scrapers — why these exist at all
- does-cloudflare-block-ai-crawlers — the managed AI Labyrinth option
- glossary/bytespider — the kind of crawler tarpits target