Google's AI Crawlers: Google-Extended, Google-CloudVertexBot, and Gemini

Google's AI crawling is confusing because Google-Extended isn't a crawler — it's an opt-out token. Here's how Google-Extended, Google-CloudVertexBot, and Googlebot relate to Gemini training and AI features.

By Andrej Ruckij · June 17, 2026

TL;DR: Google’s AI access is confusing because the main control, Google-Extended, is not a crawler — it’s an opt-out token that tells Google not to use already-crawled content for training Gemini/Vertex. It makes no requests and has no IP ranges. Google-CloudVertexBot is a real crawler. And blocking Google-Extended does not affect your Google Search ranking (that’s Googlebot).

This page is part of the AI crawler directory. Google is the most misunderstood vendor here, because its primary AI control behaves unlike every other bot on the list.

Google-Extended is a token, not a crawler

This is the key thing to understand. Google-Extended does not crawl anything. It has no user-agent string of its own and makes no HTTP requests. It’s a robots.txt opt-out signal: disallowing it tells Google not to use content (that Googlebot already fetched for Search) to train Gemini and Vertex AI.

User-agent: Google-Extended
Disallow: /

What this means in practice:

  • It changes downstream data use, not crawling. Googlebot still crawls and ranks you normally.
  • Blocking Google-Extended has zero effect on your Google Search ranking — a common fear, and an unfounded one.
  • There are no IP ranges to verify, because nothing is making requests.
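To make the point above concrete, here is a minimal sketch that feeds the two-line rule to Python's standard-library robots.txt parser and asks what each token may fetch. It assumes the stdlib matcher is a close enough stand-in for how Google reads the file, which is an approximation rather than Google's own logic; the `is_allowed` helper and the example.com URL are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rule from this article, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

# Hypothetical URL, used only for illustration.
URL = "https://example.com/products/widget"


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `token` fetch `url`.

    Uses Python's stdlib parser, which approximates (but is not) Google's
    own robots.txt evaluation.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


extended_allowed = is_allowed(ROBOTS_TXT, "Google-Extended", URL)
googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot", URL)

print(f"Google-Extended allowed: {extended_allowed}")
print(f"Googlebot allowed:       {googlebot_allowed}")
```

Run as-is, the check should report Google-Extended as disallowed and Googlebot as allowed: the rule names only the opt-out token, so the Search crawler's group is untouched.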

llms.txt-adjacent confusion aside (see the glossary), treating this opt-out token as a crawler is the single most common Google AI mistake.

Google-CloudVertexBot — the real crawler

Unlike Google-Extended, Google-CloudVertexBot is an actual crawler. It fetches content for Google Cloud Vertex AI Search (typically when a Cloud customer builds a search app over sites they own or are entitled to index). It’s controllable via robots.txt with its own token. Most general site owners won’t need to think about it; it matters mainly in Cloud/enterprise contexts.
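Because Google-CloudVertexBot makes real requests, it can show up in access logs, while Google-Extended never will. The sketch below looks for crawler tokens in a user-agent string. The `google_crawler_token` helper is hypothetical, and the Vertex sample string is illustrative only (the exact string Google sends may differ). User-agent strings can also be spoofed, so a match is a hint, not verification.

```python
from typing import Optional

# Tokens that can appear in request logs. Google-Extended is deliberately
# absent: it is an opt-out token and makes no requests.
KNOWN_TOKENS = ("Google-CloudVertexBot", "Googlebot")

# Sample user-agent strings. The Vertex one is an illustrative assumption,
# not a documented value.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
VERTEX_UA = "Mozilla/5.0 (compatible; Google-CloudVertexBot; +https://example.com/bot-info)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"


def google_crawler_token(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the UA string, else None.

    This is a case-insensitive substring match only; it does not prove the
    request really came from Google.
    """
    ua = user_agent.lower()
    for token in KNOWN_TOKENS:
        if token.lower() in ua:
            return token
    return None


for sample in (GOOGLEBOT_UA, VERTEX_UA, BROWSER_UA):
    print(google_crawler_token(sample), "<-", sample)
```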

Where Gemini’s data comes from

Gemini’s training/grounding draws on Google’s broader crawling (governed by Googlebot + the Google-Extended opt-out). Separately — and relevant for ecommerce — Gemini’s product recommendations read your Google Merchant Center feed directly, which is why product-feed quality matters for Gemini (see chatgpt-shopping-is-google-shopping for the cross-engine feed picture).

  • To opt out of Gemini/Vertex training: add a User-agent: Google-Extended group with Disallow: / to your robots.txt. This won’t hurt Search.
  • Leave Googlebot allowed: blocking it would remove you from Google Search entirely — almost never what you want.
  • Google-CloudVertexBot: decide only if it’s relevant to your Cloud setup.
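The three choices above can be sketched as a small robots.txt generator. This is one possible way to encode the decisions, not a recommended tool: `build_robots_txt` and its two flags are hypothetical names. It never writes a Googlebot group, on the assumption that your existing rules already leave Googlebot allowed.

```python
def build_robots_txt(opt_out_of_training: bool, block_vertex_bot: bool) -> str:
    """Build robots.txt text from the two Google AI decisions in this article.

    Googlebot is intentionally never listed: with no rule naming it, this
    output adds no restriction for the Search crawler.
    """
    groups = []
    if opt_out_of_training:
        # Opt-out token: changes data use, not crawling.
        groups.append("User-agent: Google-Extended\nDisallow: /")
    if block_vertex_bot:
        # Real crawler: only relevant if Vertex AI Search touches your site.
        groups.append("User-agent: Google-CloudVertexBot\nDisallow: /")
    if not groups:
        return ""
    return "\n\n".join(groups) + "\n"


# Example: opt out of training, leave the Vertex crawler decision alone.
training_only = build_robots_txt(opt_out_of_training=True, block_vertex_bot=False)
print(training_only)
```

Merge the output into your existing robots.txt rather than replacing the file, since any other groups you already have (for example a `User-agent: *` group) still apply to other bots.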

Key takeaways

  • Google-Extended is an opt-out token, not a crawler — no UA, no requests, no IP ranges.
  • Blocking Google-Extended opts you out of Gemini/Vertex training and does not affect Google Search ranking.
  • Google-CloudVertexBot is a real crawler, mostly relevant in Cloud/enterprise contexts.
  • Gemini reads your Merchant Center feed directly for product recommendations.
