Opt-Out Tokens: Decline AI Training, Keep Search
Opt-out tokens decline AI training per operator without leaving search: Google-Extended, Applebot-Extended and noai opt out of training, not search.
Opt-out tokens decline AI training per operator
Opt-out tokens let you decline AI training on a per-operator basis without leaving search. Google-Extended is a robots.txt user-agent token that, according to Google, opts a site out of training Gemini and Vertex AI generative models while keeping normal Google Search indexing in place. Applebot-Extended does the equivalent for Apple Intelligence, leaving Apple's standard Applebot search crawl unaffected. Both are operator-controlled: you set each one in your robots.txt, but it is honored at the operator's discretion. Verify the exact token behavior and current scope against each operator's primary documentation at build time (Google Search Central for Google-Extended, Apple's Applebot documentation for Applebot-Extended), never against an internal note.
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
The training-versus-search distinction is load-bearing
The whole value of an opt-out token is that a training opt-out is not a search block. A Disallow rule for Google-Extended or Applebot-Extended declines generative-model training but leaves the search/retrieval crawler free to index and cite you. Blocking the search crawler instead (for example, disallowing the main Googlebot) removes you from that engine's results and, increasingly, from the AI answers built on them, forfeiting citations and referral traffic. The defensible move for most publishers is to refuse training while keeping the retrieval crawler welcome, which is exactly what these tokens are designed to express.
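The training-versus-search distinction can be sanity-checked locally. The sketch below is a minimal illustration using Python's standard `urllib.robotparser`; the helper name `allowed` and the `example.com` URL are hypothetical, and Python's matching rules are only an approximation of how each operator actually interprets robots.txt, so treat it as a pre-deploy check and not as proof of operator behavior.

```python
from urllib.robotparser import RobotFileParser

# The two training opt-out groups from this section, and nothing else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
"""


def allowed(robots_txt: str, agent: str,
            url: str = "https://example.com/article") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses the text in memory instead of fetching
    a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Google-Extended", "Applebot-Extended", "Googlebot", "Applebot"):
        verdict = "allowed" if allowed(ROBOTS_TXT, agent) else "disallowed"
        print(f"{agent}: {verdict}")
```

Under these rules the two training tokens should come back disallowed while Googlebot and Applebot stay allowed, which is the outcome the opt-out is meant to express.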
noai and noimageai signal content-level refusal
The noai and noimageai meta directives signal at the page level that a page's text and images should not be used for AI training. They are content-level refusals rather than per-crawler robots.txt rules, and they are adoption-dependent: each operator honors them at its own discretion, and nothing enforces them at the edge. Treat them as a clear statement of intent rather than a hard control.
<meta name="robots" content="noai, noimageai">
robots.txt remains the per-crawler opt-out of record
For an explicit, per-crawler refusal, a User-agent + Disallow rule in robots.txt remains the opt-out of record: name a specific AI crawler (for example GPTBot or ClaudeBot) and disallow it. This is still a compliance-based mechanism (it depends on the crawler obeying robots.txt), but it is the long-standing, widely honored convention. Per-bot opt-out records and the block-vs-allow rationale for each crawler live in the registry.
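Per-crawler rules are easy to generate from a list of tokens. The following is a minimal sketch, not a definitive implementation: `build_robots_txt` and `is_allowed` are hypothetical helper names, the crawler list is just the two examples named above, and the check again relies on Python's `urllib.robotparser`, whose matching may differ from a given crawler's own parser.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Emit one User-agent + Disallow group per crawler token (hypothetical helper)."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, agent: str,
               url: str = "https://example.com/") -> bool:
    """Check the generated rules in memory before publishing them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Assumed token list; confirm each token against the operator's own docs.
    robots = build_robots_txt(["GPTBot", "ClaudeBot"])
    print(robots)
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(agent, "allowed" if is_allowed(robots, agent) else "disallowed")
```

Keeping the token list as data makes it straightforward to regenerate the file when an operator adds or renames a crawler, while crawlers that are not listed remain allowed by default.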
