# CCBot

> Common Crawl · training

_The AI Crawler Registry · /crawlers/ccbot · [JSON](/api/crawlers/ccbot) · [all crawlers](/crawlers)_

- **name:** CCBot
- **operator:** Common Crawl
- **purpose:** training
- **ua_substring:** CCBot
- **robots_token:** CCBot
- **respects_robots:** yes
- **verify:** Common Crawl publishes its crawler IP ranges
- **notes:** Builds the open Common Crawl corpus that many model trainers ingest downstream. Blocking CCBot therefore withholds your content from an upstream training-data source shared across the ecosystem, not from a single model vendor.
- **canonical_name:** CCBot
- **user_agent_token:** CCBot
- **ua_full:** CCBot/2.0 (https://commoncrawl.org/faq/) (source: https://commoncrawl.org/ccbot)
- **bot_type:** training
- **bot_type_extension:** —
- **opt_out_mechanism:** robots.txt disallow (User-agent: CCBot)
- **published_ip_range_url:** https://index.commoncrawl.org/ccbot.json
- **asn:** — (verify-against-primary-at-build)
- **reverse_dns_suffix:** .crawl.commoncrawl.org (source: https://commoncrawl.org/ccbot)
- **supports_web_bot_auth:** — (verify-against-primary-at-build)
- **signature_agent_domain:** — (verify-against-primary-at-build)
- **jwks_url:** — (verify-against-primary-at-build)
- **verification_methods:** published-IP-range, reverse-DNS
- **crawl_traffic_share:** — (verify-against-primary-at-build)
- **targeted_content_type:** HTML, text
- **documentation_url:** https://commoncrawl.org/ccbot
- **first_seen_date:** — (verify-against-primary-at-build)
- **last_verified_date:** 2026-06-15
- **block_vs_allow_recommendation:** conditional: CCBot is an upstream open-corpus crawler. Allowing it feeds many downstream trainers (broad reach); blocking keeps your pages out of the Common Crawl corpus. Either way, it sends no direct referral traffic.
- **citation_referral_value:** low (open training corpus; no direct citation or referral)
- **cloudflare_verified_category:** — (verify-against-primary-at-build)
- **status:** active
- **triples:** ["CCBot","operated_by","Common Crawl"], ["CCBot","has_bot_type","training"], ["CCBot","verified_via","published-IP-range"], ["CCBot","verified_via","reverse-DNS"]
- **attribute_sources:** {"claims":["ua_full","user_agent_token","robots_token","published_ip_range_url","reverse_dns_suffix","documentation_url"],"source":"https://commoncrawl.org/ccbot","last_verified":"2026-06-15"}
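
The opt-out and verification fields above can be combined into a small check. The following is a minimal Python sketch, not Common Crawl's own tooling. All function names (`claims_ccbot`, `ip_in_ranges`, `hostname_matches`, `reverse_dns_verified`, `robots_blocks_ccbot`) are hypothetical. It assumes the caller has already extracted a list of CIDR strings from the published IP range file, since the JSON layout of that file is not described in this record. The `Disallow: /` rule is likewise an assumption illustrating a site-wide opt-out.

```python
import ipaddress
import socket
from urllib import robotparser

UA_SUBSTRING = "CCBot"                  # ua_substring field
RDNS_SUFFIX = ".crawl.commoncrawl.org"  # reverse_dns_suffix field

# Assumed site-wide opt-out built from the opt_out_mechanism field.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def claims_ccbot(user_agent: str) -> bool:
    """True if the request's User-Agent header claims to be CCBot."""
    return UA_SUBSTRING in user_agent


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """True if ip falls inside any CIDR block taken from the published list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)


def hostname_matches(hostname: str) -> bool:
    """True if a PTR hostname ends with the documented reverse-DNS suffix."""
    return hostname.rstrip(".").lower().endswith(RDNS_SUFFIX)


def reverse_dns_verified(ip: str) -> bool:
    """Network call: PTR lookup, suffix check, then forward confirmation."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        if not hostname_matches(hostname):
            return False
        forward = {
            ipaddress.ip_address(info[4][0])
            for info in socket.getaddrinfo(hostname, None)
        }
        return ipaddress.ip_address(ip) in forward
    except (OSError, ValueError):
        # Lookup failed or an address could not be parsed: treat as unverified.
        return False


def robots_blocks_ccbot(robots_txt: str, url: str) -> bool:
    """True if the given robots.txt text disallows CCBot from fetching url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("CCBot", url)
```

A request would be treated as genuine CCBot traffic only when `claims_ccbot` is true and at least one of `ip_in_ranges` or `reverse_dns_verified` also passes. The User-Agent string alone is trivially spoofed, which is why the record lists two independent verification methods.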
