ICC-Crawler

NICT (National Institute of Information and Communications Technology) · training. Crawls data to train and support AI technologies; NICT (Japan) uses the collected data for AI and may provide it to third parties, including commercial companies. Token and operator recorded in the ai.robots.txt machine-readable registry.

name: ICC-Crawler
operator: NICT (National Institute of Information and Communications Technology)
purpose: training
ua_substring: ICC-Crawler
robots_token: ICC-Crawler
respects_robots: yes
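verify: match on the user-agent substring and enforce with edge controls (e.g. WAF or CDN rules); the ai.robots.txt registry records respects-robots = Yes. No operator-published IP-range file has been confirmed, so a user-agent match cannot be cross-checked against an official IP list.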
notes: Crawls data to train and support AI technologies; NICT (Japan) uses the collected data for AI and may provide it to third parties, including commercial companies. Token and operator recorded in the ai.robots.txt machine-readable registry.
canonical_name: ICC-Crawler
user_agent_token: ICC-Crawler
ua_full: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
bot_type: training
bot_type_extension: —
opt_out_mechanism: robots.txt disallow (User-agent: ICC-Crawler)
published_ip_range_url: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
asn: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
reverse_dns_suffix: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
supports_web_bot_auth: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
signature_agent_domain: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
jwks_url: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
verification_methods: user-agent-match
crawl_traffic_share: — verify-against-primary-at-build ↗ https://radar.cloudflare.com/bots
targeted_content_type: HTML, text
documentation_url: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
first_seen_date: — verify-against-primary-at-build ↗ https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json
last_verified_date: 2026-06-15
block_vs_allow_recommendation: conditional. Research/training crawler that may share collected data with third parties, including commercial companies. Allow it if you want your content represented in the collected data; block it via robots.txt to opt out. It sends no direct referral traffic.
citation_referral_value: low (training/data collection; no direct citation or referral)
cloudflare_verified_category: — verify-against-primary-at-build ↗ https://radar.cloudflare.com/bots/directory/icc-crawler
status: active
triples: ["ICC-Crawler","operated_by","NICT"] ["ICC-Crawler","has_bot_type","training"] ["ICC-Crawler","verified_via","user-agent-match"]
attribute_sources: {"claims":["user_agent_token","robots_token","operator","respects_robots","purpose"],"source":"https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json","last_verified":"2026-06-15"}
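
The opt-out and verification fields above (opt_out_mechanism, ua_substring, verification_methods) can be sketched in Python as follows. This is a minimal illustration, not an operator-published procedure. The robots.txt body, the example.com URLs, the "OtherBot" name, and the helper names `is_icc_crawler` and `robots_allows` are assumptions made for the example; only the token `ICC-Crawler` comes from the record. Because no IP-range file is confirmed, the user-agent match is only a hint and can be spoofed.

```python
from urllib.robotparser import RobotFileParser

# Token taken from the record above (robots_token / ua_substring).
ICC_TOKEN = "ICC-Crawler"

# Hypothetical robots.txt body: opts the whole site out for this token
# while leaving every other crawler unaffected.
ROBOTS_TXT = """\
User-agent: ICC-Crawler
Disallow: /

User-agent: *
Allow: /
"""


def is_icc_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match on a User-Agent header value.

    A match is only a hint: the header can be spoofed, and the record
    lists no IP range or reverse-DNS suffix to cross-check against.
    """
    return ICC_TOKEN.lower() in user_agent.lower()


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt body for a given agent and URL (stdlib parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_icc_crawler(ICC_TOKEN))
    print(robots_allows(ROBOTS_TXT, ICC_TOKEN, "https://example.com/page"))
    print(robots_allows(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
```

Checking the rule with the standard-library parser before deploying it helps catch typos in the token. Since the record states that the crawler respects robots.txt, the disallow rule is the primary opt-out; the substring match is useful mainly for log analysis or as an additional edge rule.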

last verified 15 Jun 2026 · by Özden Erdinc
