# AI Crawler

> An automated bot that fetches web content for an AI system — to train a model, build a search index, or answer a user's question in real time.

_The Agentic Web Lexicon · /glossary/ai-crawler · [JSON](/api/glossary/ai-crawler) · [all terms in The Agentic Web Lexicon](/glossary)_

- **term:** AI Crawler
- **category:** core
- **short_def:** An automated bot that fetches web content for an AI system — to train a model, build a search index, or answer a user's question in real time.
- **long_def:** AI crawlers are classified by purpose (training, search, or inference) and by behavior (whether they honor robots.txt). Because user-agent strings are easily spoofed, a genuine crawler is confirmed against its operator's published IP ranges or by reverse DNS lookup, and increasingly by Web Bot Auth signatures.
- **see_also:** robots-txt, web-bot-auth, agent-identity
- **etymology_origin:** — (verify-against-primary-at-build)
- **related_to:** robots-txt, web-bot-auth, agent-identity, agentic-web
- **contrast_with:** A traditional search crawler such as classic Googlebot fetches pages on a schedule to build an index. An AI crawler fetches content to train models or to ground a live answer, and a growing share (inference fetchers) act per user query rather than on a scheduled index crawl.
- **example:** Per Cloudflare Radar (May 2026), AI crawlers by crawl share included GPTBot (11.48%), Bytespider (10.25%), Applebot (7.01%) and the new Claude-SearchBot (2.22%).
- **source:** https://radar.cloudflare.com/
- **status:** active
- **why_it_matters:** AI crawlers are the agents most sites encounter first; knowing which one is which — and verifying it — is the entry point to every access, licensing and citation decision.
- **sameAs:** https://en.wikipedia.org/wiki/Web_crawler
- **bridge_entity:** crawlers
- **last_verified:** 2026-06-15
- **md_twin:** /glossary/ai-crawler.md
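
The verification step named in the long definition (published IP ranges or reverse DNS) can be sketched roughly as below. This is a minimal illustration, not any operator's official procedure: the CIDR list, the `crawler.example` hostname suffix, and both function names are hypothetical placeholders, and real values would have to come from the crawler operator's own documentation. The reverse DNS check here also confirms the hostname resolves back to the same IP, a common hardening step that is an assumption beyond what the entry states.

```python
import ipaddress
import socket

# Hypothetical documentation-only ranges; substitute the operator's published list.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_published_ranges(ip, cidrs):
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # Membership is False (not an error) when IP versions differ.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def forward_confirmed_rdns(ip, allowed_suffixes,
                           reverse_lookup=socket.gethostbyaddr,
                           forward_lookup=socket.getaddrinfo):
    """Reverse-resolve `ip`, check the hostname suffix, then resolve forward again.

    The lookup functions are injectable so the logic can be tested offline.
    """
    try:
        host, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False

    host = host.rstrip(".").lower()
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False  # PTR record points at a domain the operator does not own

    try:
        infos = forward_lookup(host, None)
    except OSError:
        return False

    # Compare parsed addresses so textual variants of the same IP still match.
    target = ipaddress.ip_address(ip)
    resolved = {ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos}
    return target in resolved
```

A request would typically be treated as a genuine crawler only if its claimed user agent matches and at least one of these checks passes for the connecting IP.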

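The split by purpose (training, search, inference) can be illustrated with a small lookup on the claimed user-agent token. This is a sketch under stated assumptions: the token-to-purpose mapping reflects how the operators are generally understood to describe these bots and should be checked against their current documentation, and the `classify_claimed_purpose` helper is hypothetical. Because user-agent strings are spoofable, the result describes what a request claims to be, not what it is.

```python
# Assumed mapping of user-agent tokens to purpose; verify against operator docs.
CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "inference",
    "ClaudeBot": "training",
    "Claude-SearchBot": "search",
    "Claude-User": "inference",
}


def classify_claimed_purpose(user_agent):
    """Return the claimed purpose for a user-agent string, or "unknown"."""
    ua = user_agent.lower()
    # Check longer tokens first so a short token never shadows a longer one.
    for token in sorted(CRAWLER_PURPOSE, key=len, reverse=True):
        if token.lower() in ua:
            return CRAWLER_PURPOSE[token]
    return "unknown"


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    print(classify_claimed_purpose(sample))
```

Pairing this label with an IP or signature check keeps access decisions from resting on the user-agent string alone.
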