robots.txt

The root-level file that tells crawlers — including AI crawlers — what they may and may not fetch.
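As a minimal sketch of the may/may-not semantics, Python's standard-library `urllib.robotparser` can evaluate a robots.txt that is already in memory. The file below, its groups and the `example.com` URLs are hypothetical, as is the helper name `build_parser`. `urllib.robotparser` applies the first matching rule in a group rather than RFC 9309's longest match. The rules here never overlap within a group, so both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: the groups and paths are
# illustrative, not a recommended policy.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /articles/

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("GPTBot", "https://example.com/articles/1"),
        ("ClaudeBot", "https://example.com/"),
        ("Google-Extended", "https://example.com/articles/1"),
        ("Google-Extended", "https://example.com/about"),
        ("SomeOtherBot", "https://example.com/articles/1"),
    ]:
        # can_fetch picks the group whose user-agent matches, else the '*' group
        print(f"{agent:16} {url:35} {rp.can_fetch(agent, url)}")
```

GPTBot and ClaudeBot share one group and are shut out entirely, Google-Extended is kept out of `/articles/` only, and any unnamed crawler falls through to the permissive `*` group.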

robots.txt is still the front door for crawler access policy; in the agentic era it is where a site first declares which AI crawlers it admits, blocks or licenses, although compliance is voluntary and the file enforces nothing by itself.
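The admit/block/license decision can be read back from a single file. The sketch below pairs a hypothetical robots.txt with two illustrative helpers, `license_urls` and `policy` (made-up names, not an existing API). It uses `urllib.robotparser` for the allow/deny part and a plain line scan for the License directive. It assumes the RSL reference is written as `License: <absolute URL>` on its own line; check the RSL specification for the normative syntax. `urllib.robotparser` silently ignores the unknown License line, which is why the scan is separate.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot, admits ClaudeBot, and references
# licensing terms. The "License: <URL>" line format is an assumption here.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

License: https://example.com/license.xml
"""


def license_urls(robots_txt: str) -> list[str]:
    """Collect the value of every License line (key matched case-insensitively)."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "license":
            value = value.strip()
            # keep only absolute http(s) URLs; anything else is ignored in this sketch
            if urlparse(value).scheme in ("http", "https"):
                urls.append(value)
    return urls


def policy(robots_txt: str, agents: list[str], url: str) -> dict:
    """Report which agents may fetch `url`, plus any referenced license URLs."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "admitted": [a for a in agents if parser.can_fetch(a, url)],
        "blocked": [a for a in agents if not parser.can_fetch(a, url)],
        "licenses": license_urls(robots_txt),
    }


if __name__ == "__main__":
    print(policy(ROBOTS_TXT, ["GPTBot", "ClaudeBot"], "https://example.com/post/1"))
```

Keeping the license lookup separate from the allow/deny check reflects the fact that a standard RFC 9309 parser has no notion of licensing terms.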

term: robots.txt
category: protocols
short_def: The root-level file that tells crawlers — including AI crawlers — what they may and may not fetch.
long_def: The web's oldest crawler contract, originally defined by Martijn Koster in 1994 and standardized as RFC 9309 in 2022. In the agentic era it is where sites name AI crawler user-agent tokens explicitly (GPTBot, ClaudeBot, and Google-Extended, a Google control token rather than a separate crawler), and where RSL licensing terms are referenced via a License directive.
see_also: ai-crawler rsl llms-txt
etymology_origin: Originally defined by Martijn Koster in 1994 as the Robots Exclusion Protocol; a de facto standard by mid-1994, formally published by the IETF as RFC 9309 (with Koster as an author) in September 2022.
related_to: ai-crawler llms-txt content-negotiation
contrast_with: Unlike llms.txt, which curates what models should read first (inclusion), robots.txt declares what crawlers may NOT fetch (exclusion) — exclusion contract versus ingestion index.
example: RFC 9309 (September 2022) formalized the Robots Exclusion Protocol that Martijn Koster first defined in 1994; sites now name AI crawler tokens such as GPTBot and Google-Extended explicitly in it.
source: https://www.rfc-editor.org/rfc/rfc9309.html
status: active
why_it_matters: robots.txt is still the front door for crawler access policy; in the agentic era it is where a site first declares which AI crawlers it admits, blocks or licenses, although compliance is voluntary and the file enforces nothing by itself.
sameAs: https://en.wikipedia.org/wiki/Robots.txt https://www.rfc-editor.org/rfc/rfc9309.html
bridge_entity: crawlers
last_verified: 2026-06-15
md_twin: /glossary/robots-txt.md
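RFC 9309 resolves conflicts differently from first-match parsers. Within the selected group, the most specific (longest) matching rule wins, and an allow rule beats an equivalent disallow. What follows is a minimal, simplified sketch of that evaluation, and the function names are hypothetical. It supports only the `*` wildcard and a trailing `$`, and it skips percent-encoding normalization and the implicit allow of `/robots.txt`. It also measures specificity as the byte length of the rule pattern, which is an assumption about how to read "most octets".

```python
import re

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /private/press/
Disallow: /*.pdf$

User-agent: GPTBot
Disallow: /
Allow: /public/
"""


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def parse_groups(text: str) -> list[tuple[list[str], list[tuple[bool, str]]]]:
    """Split robots.txt into (user-agents, rules) groups; a rule is (allow, pattern)."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # '#' starts a comment
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if rules:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents and value:
            rules.append((key == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def is_allowed(text: str, agent: str, path: str) -> bool:
    """Select the agent's group(s), falling back to '*', then apply longest match."""
    token = agent.lower()
    groups = parse_groups(text)
    matched = [rs for agents, rs in groups if token in agents]
    if not matched:  # no group names this agent: obey the '*' group(s), if any
        matched = [rs for agents, rs in groups if "*" in agents]
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for allow, pattern in (r for rs in matched for r in rs):
        if _pattern_to_regex(pattern).match(path):
            length = len(pattern.encode())
            # a longer pattern wins; on equal length, allow beats disallow
            if length > best_len or (length == best_len and allow):
                best_len, allowed = length, allow
    return allowed


if __name__ == "__main__":
    for agent, path in [("GPTBot", "/public/page"), ("GPTBot", "/private/press/x"),
                        ("OtherBot", "/private/press/release"), ("OtherBot", "/docs/report.pdf")]:
        print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

For `OtherBot` and `/private/press/release` this returns True because `Allow: /private/press/` is longer than `Disallow: /private/`. Python's `urllib.robotparser` stops at the first matching rule and would return False for the same file.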

last verified 15 Jun 2026 · by Özden Erdinc
