Content Signals

A robots.txt extension that lets a site declare how its content may be used after access — search, ai-input, and ai-train — as machine-readable preferences.

name: Content Signals
full_name: Cloudflare Content Signals Policy
layer: licensing
creator: Cloudflare
status: live (2025)
year: 2025
one_liner: A robots.txt extension that lets a site declare how its content may be used after access — search, ai-input, and ai-train — as machine-readable preferences.
spec_url: https://blog.cloudflare.com/content-signals-policy/
snippet:
  # robots.txt
  Content-Signal: search=yes, ai-input=yes, ai-train=no
abbreviation: Content Signals
also_known_as: Content Signals Policy; Content-Signal
canonical_spec_url: https://contentsignals.org
entity_uri: https://blog.cloudflare.com/content-signals-policy/
taxonomy_layer: licensing
sub_layer: content-use-preferences
protocol_type: declaration
central_problem: Lets a site express, in robots.txt, how crawlers may use its content after access — for search, AI input (RAG/grounding), or AI training — as clear machine-readable preferences.
maintainer: Cloudflare (Content Signals Policy; published openly under CC0)
governance_body: vendor (Cloudflare); spec released under CC0
license: CC0 (the policy text is released under a CC0 license for open adoption)
maturity_tag: emerging
current_spec_version: — verify-against-primary-at-build ↗ https://contentsignals.org
spec_date: 2025-09-24
launch_date: 2025-09-24
last_verified: 2026-06-15
transport: robots.txt extension (Content-Signal directive: search / ai-input / ai-train)
core_mechanism: The policy adds a short human-readable block plus a machine-readable Content-Signal line to robots.txt declaring per-use preferences with yes/no values: search (use in a search index), ai-input (use to ground/answer queries, e.g. RAG), and ai-train (use to train models). Cloudflare's managed robots.txt defaults to search=yes, ai-train=no. Signals express preferences, not technical enforcement.
discovery_endpoint: robots.txt Content-Signal directive
settlement_type: —
adoption_metric: Auto-applied to Cloudflare's managed robots.txt (Cloudflare states 3.8M+ domains) with default Content-Signal: search=yes, ai-train=no
notable_adopters: {"value":"Cloudflare (creator; applied across its managed robots.txt fleet)","source":"https://blog.cloudflare.com/content-signals-policy/"}
relationships: {"predicate":"complements","target":"rsl","note":"Content Signals expresses non-binding use preferences in robots.txt; RSL declares enforceable machine-readable licensing terms — paired Layer-6 declarations."} {"predicate":"extends","target":"agents-json","note":"Content Signals is a robots.txt extension in the same Layer-1/6 declaration family as the other root-file declarations; it adds an after-access use dimension robots.txt's allow/disallow lacks."}
ideal_use_case: A site that wants to state, in robots.txt, that search use is welcome but AI training is not — without standing up enforcement.
when_to_use: When you want to communicate after-access content-use preferences (search vs AI-input vs AI-train) to crawlers in a standard, machine-readable line.
when_not_to_use: When you need binding, enforceable licensing or payment (use RSL or Pay Per Crawl) — Content Signals are preferences, not controls.
code_example:
  # robots.txt
  # Content usage preferences (Cloudflare Content Signals Policy)
  User-agent: *
  Content-Signal: search=yes, ai-input=yes, ai-train=no
  Allow: /
source: Launch 2025-09-24, three signals (search / ai-input / ai-train), robots.txt extension, CC0, Cloudflare managed-robots default search=yes/ai-train=no, contentsignals.org: https://blog.cloudflare.com/content-signals-policy/ . Listed for addition in research §2.
agent_readiness_link: access-economics
layer_legacy: content
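
The core_mechanism field describes a Content-Signal line carrying yes/no preferences for search, ai-input, and ai-train. A minimal sketch of how a crawler might read that line is given here, under stated assumptions: the function name `parse_content_signal` is hypothetical, the parser takes only the first Content-Signal line it finds, it ignores User-agent grouping, and it treats any signal that is not mentioned as "no preference stated" (`None`). It is an illustration of the format shown in this entry, not a reference implementation of the policy.

```python
from typing import Dict, Optional

# The three signals named in this entry.
KNOWN_SIGNALS = ("search", "ai-input", "ai-train")


def parse_content_signal(robots_txt: str) -> Dict[str, Optional[bool]]:
    """Return the preferences from the first Content-Signal line.

    Hypothetical helper: signals that are absent map to None
    (no preference stated); unknown names and values are skipped.
    """
    prefs: Dict[str, Optional[bool]] = {name: None for name in KNOWN_SIGNALS}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop robots.txt comments
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            name, eq, answer = pair.partition("=")
            name, answer = name.strip().lower(), answer.strip().lower()
            if eq and name in prefs and answer in ("yes", "no"):
                prefs[name] = answer == "yes"
        break  # assumption: only the first Content-Signal line is read
    return prefs


example = """\
# robots.txt
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
"""
print(parse_content_signal(example))
# {'search': True, 'ai-input': True, 'ai-train': False}
```

Keeping absent signals as `None` rather than `False` reflects that these are stated preferences: a missing signal is not the same as a "no".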
