# Diffbot

> Diffbot · data-aggregation

_The AI Crawler Registry · /crawlers/diffbot · [JSON](/api/crawlers/diffbot) · [All crawlers](/crawlers)_

- **name:** Diffbot
- **operator:** Diffbot
- **purpose:** data-aggregation
- **ua_substring:** Diffbot
- **robots_token:** Diffbot
- **respects_robots:** yes
- **verify:** No authoritative, operator-published IP-range file has been confirmed, so verify by user-agent match combined with edge controls. A user-agent match alone shows only that a client claims to be Diffbot.
- **notes:** Diffbot's Crawlbot extracts and structures web content into a knowledge graph that Diffbot sells to customers for uses such as market intelligence, e-commerce, and AI training. Registered here as a 'data-provider' bot type (an Agents Welcome taxonomy extension). Diffbot documents that its crawls honor robots.txt disallow and crawl-delay directives by default.
- **canonical_name:** Diffbot
- **user_agent_token:** Diffbot
- **ua_full:** — (verify-against-primary-at-build)
- **bot_type:** data-provider
- **bot_type_extension:** data-provider (Agents Welcome registry extension beyond the cited 6-type set)
- **opt_out_mechanism:** robots.txt disallow (User-agent: Diffbot)
- **published_ip_range_url:** — (verify-against-primary-at-build)
- **asn:** — (verify-against-primary-at-build)
- **reverse_dns_suffix:** — (verify-against-primary-at-build)
- **supports_web_bot_auth:** — (verify-against-primary-at-build)
- **signature_agent_domain:** — (verify-against-primary-at-build)
- **jwks_url:** — (verify-against-primary-at-build)
- **verification_methods:** user-agent-match
- **crawl_traffic_share:** — (verify-against-primary-at-build)
- **targeted_content_type:** HTML, text, structured data
- **documentation_url:** https://docs.diffbot.com/docs/does-crawl-respect-robotstxt
- **first_seen_date:** — (verify-against-primary-at-build)
- **last_verified_date:** 2026-06-15
- **block_vs_allow_recommendation:** conditional. This data-provider crawler structures content for resale, including for downstream AI training. Allow it if you want your content represented in Diffbot's knowledge graph; block it via robots.txt to opt out. It sends no direct referral traffic.
- **citation_referral_value:** low (data aggregation for resale; no direct citation or referral)
- **cloudflare_verified_category:** — (verify-against-primary-at-build)
- **status:** active
- **triples:** ["Diffbot","operated_by","Diffbot"], ["Diffbot","has_bot_type","data-provider"], ["Diffbot","verified_via","user-agent-match"]
- **attribute_sources:** {"claims":["user_agent_token","robots_token","respects_robots","documentation_url","opt_out_mechanism"],"source":"https://docs.diffbot.com/docs/does-crawl-respect-robotstxt","last_verified":"2026-06-15"}, {"claims":["operator","bot_type"],"source":"https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json","last_verified":"2026-06-15"}
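The opt-out mechanism is a robots.txt group for the `Diffbot` token, and Diffbot documents that it honors both disallow and crawl-delay by default. The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would read two hypothetical robots.txt files: a full opt-out and a partial opt-out with a crawl delay. It illustrates generic robots.txt semantics, not Diffbot's own parser; the site URLs, paths, and the 10-second delay are illustrative assumptions.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical example 1: full opt-out for the Diffbot token only;
# all other crawlers remain unrestricted.
FULL_BLOCK = """\
User-agent: Diffbot
Disallow: /

User-agent: *
Disallow:
"""

# Hypothetical example 2: partial opt-out plus a crawl delay (seconds).
PARTIAL_BLOCK = """\
User-agent: Diffbot
Crawl-delay: 10
Disallow: /private/
"""


def load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def diffbot_policy(robots_txt: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay_seconds) for the Diffbot robots token."""
    parser = load(robots_txt)
    return parser.can_fetch("Diffbot", url), parser.crawl_delay("Diffbot")


if __name__ == "__main__":
    print(diffbot_policy(FULL_BLOCK, "https://example.com/products/1"))
    print(diffbot_policy(PARTIAL_BLOCK, "https://example.com/private/report"))
    print(diffbot_policy(PARTIAL_BLOCK, "https://example.com/blog/post"))
```

Pass the bare token (`Diffbot`) rather than a full user-agent string, because `RobotFileParser` matches groups against the product name before the first `/`.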
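The only listed verification method is `user-agent-match`, and no IP range, ASN, or reverse-DNS suffix is confirmed. A log or edge classifier can therefore only label a request as claiming to be Diffbot. The helper below is a hypothetical sketch of that check. It assumes a case-insensitive substring match on the registry's `ua_substring` value. The sample user-agent strings in the usage note are made up, because the full Diffbot user-agent string is not yet verified here.

```python
from typing import Optional

# Registry field ua_substring for this entry.
UA_SUBSTRING = "Diffbot"


def classify_request(user_agent: Optional[str]) -> str:
    """Label a request using the user-agent-match method only (hypothetical helper).

    With no confirmed operator-published IP range, a match means the client
    claims to be Diffbot; it does not prove the request came from Diffbot.
    """
    if user_agent and UA_SUBSTRING.lower() in user_agent.lower():
        return "claims-diffbot-unverified"
    return "not-diffbot"
```

For example, a made-up header such as `Mozilla/5.0 (compatible; Diffbot/0.1)` is labeled `claims-diffbot-unverified`, and a missing header is labeled `not-diffbot`. Because any client can send this header, pair the label with edge controls before trusting or blocking the request.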
