# ICC-Crawler

> NICT (National Institute of Information and Communications Technology) · training

- **name:** ICC-Crawler
- **operator:** NICT (National Institute of Information and Communications Technology)
- **purpose:** training
- **ua_substring:** ICC-Crawler
- **robots_token:** ICC-Crawler
- **respects_robots:** yes
- **verify:** Match on the user-agent string and enforce with edge controls; the ai.robots.txt registry records respects-robots = Yes. No operator-published IP-range file has been confirmed, so IP-based verification is not available.
- **notes:** Crawls web data to train and support AI technologies. NICT (Japan) may also provide the collected data to third parties, including commercial companies. The token and operator are recorded in the ai.robots.txt machine-readable registry.
- **canonical_name:** ICC-Crawler
- **user_agent_token:** ICC-Crawler
- **ua_full:** — (verify-against-primary-at-build)
- **bot_type:** training
- **bot_type_extension:** —
- **opt_out_mechanism:** robots.txt disallow (User-agent: ICC-Crawler)
- **published_ip_range_url:** — (verify-against-primary-at-build)
- **asn:** — (verify-against-primary-at-build)
- **reverse_dns_suffix:** — (verify-against-primary-at-build)
- **supports_web_bot_auth:** — (verify-against-primary-at-build)
- **signature_agent_domain:** — (verify-against-primary-at-build)
- **jwks_url:** — (verify-against-primary-at-build)
- **verification_methods:** user-agent-match
- **crawl_traffic_share:** — (verify-against-primary-at-build)
- **targeted_content_type:** HTML, text
- **documentation_url:** — (verify-against-primary-at-build)
- **first_seen_date:** — (verify-against-primary-at-build)
- **last_verified_date:** 2026-06-15
- **block_vs_allow_recommendation:** conditional (research/training crawler that may share collected data with third parties, including commercial companies; allow it if you want your content represented in the collected data, or disallow it in robots.txt to opt out; it sends no direct referral traffic)
- **citation_referral_value:** low (training/data collection; no direct citation or referral)
- **cloudflare_verified_category:** — (verify-against-primary-at-build)
- **status:** active
- **triples:** ["ICC-Crawler","operated_by","NICT"], ["ICC-Crawler","has_bot_type","training"], ["ICC-Crawler","verified_via","user-agent-match"]
- **attribute_sources:** {"claims":["user_agent_token","robots_token","operator","respects_robots","purpose"],"source":"https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json","last_verified":"2026-06-15"}
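
The `opt_out_mechanism` and `verification_methods` fields above can be sketched in Python as follows. This is a minimal illustration, not an operator-endorsed implementation: the `ROBOTS_TXT` body, the helper names `is_icc_crawler` and `robots_allows`, the `example.com` URL, and the case-insensitive matching are all assumptions made for the example. Only the `ICC-Crawler` token comes from this entry. Because user-agent strings are self-reported and no IP-range file is confirmed, a match here identifies a claimed ICC-Crawler request, not a verified one.

```python
from urllib.robotparser import RobotFileParser

# Token taken from the robots_token / ua_substring fields of this entry.
ICC_TOKEN = "ICC-Crawler"

# Hypothetical robots.txt body that opts a whole site out of ICC-Crawler
# while leaving every other agent unrestricted.
ROBOTS_TXT = """\
User-agent: ICC-Crawler
Disallow: /

User-agent: *
Allow: /
"""


def is_icc_crawler(user_agent: str) -> bool:
    """Substring match on the User-Agent header.

    Case-insensitive matching is an assumption of this sketch; the entry
    only records the substring itself.
    """
    return ICC_TOKEN.lower() in user_agent.lower()


def robots_allows(robots_txt: str, agent: str, url: str) -> bool:
    """Parse a robots.txt body and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The full user-agent string is not recorded in this entry, so this
    # header value is made up for illustration.
    sample_ua = "Mozilla/5.0 (compatible; ICC-Crawler; hypothetical)"
    print(is_icc_crawler(sample_ua))
    print(robots_allows(ROBOTS_TXT, ICC_TOKEN, "https://example.com/page"))
```

Passing the bare token to `robots_allows`, not a full header value, keeps the check aligned with how `urllib.robotparser` compares agent names against `User-agent` lines.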
