robots.txt

The root-level file that tells crawlers — including AI crawlers — what they may and may not fetch.

term
robots.txt
category
protocols
short_def
The root-level file that tells crawlers — including AI crawlers — what they may and may not fetch.
long_def
The web's oldest crawler contract, originally defined by Martijn Koster in 1994 and standardized as RFC 9309 in 2022. In the agentic era it is where sites name AI crawlers and AI-related user-agent tokens explicitly (GPTBot, ClaudeBot, Google-Extended), and where RSL (Really Simple Licensing) terms are referenced via a License directive.
see_also
ai-crawler rsl llms-txt
etymology_origin
Martijn Koster defined it in 1994 as the Robots Exclusion Protocol. It was a de facto standard by mid-1994 and was formally published by the IETF as RFC 9309 (with Koster as an author) in September 2022.
related_to
ai-crawler llms-txt content-negotiation
contrast_with
Unlike llms.txt, which curates what models should read first (inclusion), robots.txt declares what crawlers may not fetch (exclusion): an exclusion contract versus an ingestion index.
example
RFC 9309 (September 2022) formalized the Robots Exclusion Protocol that Martijn Koster first defined in 1994; sites now name AI crawlers such as GPTBot and Google-Extended in it explicitly.
source
https://www.rfc-editor.org/rfc/rfc9309.html
status
active
why_it_matters
robots.txt is still the front door for crawler access control; in the agentic era it is where a site first decides which AI crawlers it admits, blocks or licenses.
sameAs
https://en.wikipedia.org/wiki/Robots.txt https://www.rfc-editor.org/rfc/rfc9309.html
bridge_entity
crawlers
last_verified
2026-06-15
md_twin
/glossary/robots-txt.md
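
To make the definition above concrete, here is a minimal sketch of how per-crawler rules in a robots.txt file can be evaluated, using Python's standard-library `urllib.robotparser`. The sample file, its paths, the `example.com` URLs, and the helper names `allowed` and `license_urls` are all hypothetical and chosen only for illustration. The placement of the `License` line is likewise an assumption; the standard-library parser simply ignores directives it does not recognize, so the sketch reads that line separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the policy, paths, and license URL are assumptions.
SAMPLE_ROBOTS_TXT = """\
License: https://example.com/license.xml

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /glossary/
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def license_urls(robots_txt: str) -> list[str]:
    """Collect the values of any License lines (hypothetical helper)."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "license":
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    page = "https://example.com/glossary/robots-txt"
    for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
        print(agent, allowed(agent, page))
    print(license_urls(SAMPLE_ROBOTS_TXT))
```

One design note: `urllib.robotparser` applies the first matching rule within a group, whereas RFC 9309 specifies that the most specific (longest) match wins. The sample lists `Allow: /glossary/` before `Disallow: /` so that both interpretations give the same answer. Compliance with any of these rules remains voluntary on the crawler's side.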
