# robots.txt

> The root-level file that tells crawlers — including AI crawlers — what they may and may not fetch.

_The Agentic Web Lexicon · /glossary/robots-txt · [JSON](/api/glossary/robots-txt) · [all entries in The Agentic Web Lexicon](/glossary)_

- **term:** robots.txt
- **category:** protocols
- **short_def:** The root-level file that tells crawlers — including AI crawlers — what they may and may not fetch.
- **long_def:** The web's oldest crawler contract, originally defined by Martijn Koster in 1994 and standardized as RFC 9309 in 2022. In the agentic era it is where sites address AI crawlers and AI-usage tokens by name (GPTBot, ClaudeBot, Google-Extended), and where RSL licensing terms are referenced via a License directive.
- **see_also:** ai-crawler, rsl, llms-txt
- **etymology_origin:** Originally defined by Martijn Koster in 1994 as the Robots Exclusion Protocol; a de facto standard by mid-1994, formally published by the IETF as RFC 9309 (with Koster as an author) in September 2022.
- **related_to:** ai-crawler, llms-txt, content-negotiation
- **contrast_with:** Unlike llms.txt, which curates what models should read first (an inclusion-oriented ingestion index), robots.txt declares which paths a given crawler may not fetch (an exclusion contract, with Allow rules carving out exceptions).
- **example:** RFC 9309 (September 2022) formalized the Robots Exclusion Protocol that Martijn Koster first defined in 1994; sites now name AI user-agent tokens such as GPTBot and Google-Extended in it explicitly.
- **source:** https://www.rfc-editor.org/rfc/rfc9309.html
- **status:** active
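- **why_it_matters:** robots.txt is still the first file a well-behaved crawler consults for a site's crawl rules; in the agentic era it is where a site first decides which AI crawlers it admits, blocks or licenses. Compliance is voluntary: RFC 9309 is explicit that these rules are a request to crawlers, not a form of access authorization.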
- **sameAs:** https://en.wikipedia.org/wiki/Robots.txt, https://www.rfc-editor.org/rfc/rfc9309.html
- **bridge_entity:** crawlers
- **last_verified:** 2026-06-15
- **md_twin:** /glossary/robots-txt.md
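
As a minimal sketch of the exclusion contract in practice, the Python below evaluates a robots.txt along the lines RFC 9309 describes. A crawler obeys only the group(s) whose user-agent line names its product token (matched case-insensitively) and falls back to the `*` group otherwise. Within the selected rules the longest matching pattern wins, and Allow wins ties. The file contents, the license URL, the `SomeOtherBot` token and the helper names (`parse_groups`, `pattern_matches`, `is_allowed`) are hypothetical. The `License:` line only mirrors the RSL directive mentioned above and is ignored by this parser. Python's `urllib.robotparser` is not used because it applies the first matching rule rather than the longest and does not support `*`/`$` wildcards inside paths. The sketch also skips percent-encoding normalization, fetching and caching.

```python
import re

# Hypothetical robots.txt naming AI user agents; the License URL is a placeholder.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
Allow: /blog/

User-agent: Google-Extended
Disallow: /premium/

User-agent: *
Disallow: /admin/
Disallow: /*.pdf$

License: https://example.com/license.xml
"""


def parse_groups(text):
    """Split robots.txt into groups of (user-agent tokens, [(allow, pattern), ...])."""
    groups = []
    agents, rules, seen_rule = [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a user-agent line after rules opens a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty pattern matches nothing
                rules.append((key == "allow", value))
        # Other records (Sitemap, License, ...) are ignored by this sketch.
    if agents:
        groups.append((agents, rules))
    return groups


def pattern_matches(pattern, path):
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(text, product_token, path):
    """Decide whether the crawler identified by product_token may fetch path."""
    if path == "/robots.txt":  # the file itself is implicitly allowed
        return True
    groups = parse_groups(text)
    token = product_token.lower()  # user-agent matching is case-insensitive
    matching = [rules for agents, rules in groups if token in agents]
    if not matching:  # no group names this crawler: fall back to '*'
        matching = [rules for agents, rules in groups if "*" in agents]
    best = None  # (pattern length in octets, allow)
    for rules in matching:  # rules of all matching groups are combined
        for allow, pattern in rules:
            if pattern_matches(pattern, path):
                candidate = (len(pattern.encode("utf-8")), allow)
                # Longest match wins; on a tie, True > False so allow wins.
                if best is None or candidate > best:
                    best = candidate
    return True if best is None else best[1]


print(is_allowed(ROBOTS_TXT, "GPTBot", "/blog/post"))          # False
print(is_allowed(ROBOTS_TXT, "ClaudeBot", "/blog/post"))       # True
print(is_allowed(ROBOTS_TXT, "Google-Extended", "/admin/"))    # True
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "/files/a.pdf"))  # False
```

The third call shows why naming AI crawlers explicitly matters. Because Google-Extended has its own group, the `*` group's `/admin/` rule does not apply to it. A site that names a crawler therefore has to repeat, in that crawler's group, any site-wide rules it still wants enforced.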
