Developer Adwait Bokade published a detailed investigation this month revealing a blind spot in how website operators measure traffic: AI crawlers from OpenAI and Anthropic are silently indexing the web while remaining invisible to conventional analytics platforms. Bokade noticed unexplained server-side activity on a personal site receiving only 10 to 20 human visitors per week, traced it to GPTBot and ClaudeBot, and then attached a fresh domain to a custom server-side detection tool to quantify the problem. In just 72 hours, the tool recorded 1,011 bot and crawler requests — a volume that shows how broadly the crawler ecosystem targets even obscure web properties. Because these bots download raw HTML without executing JavaScript, the tracking scripts that power Google Analytics, PostHog, and similar tools never fire, and site owners can't see most of their traffic.
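The measurement gap described above can be modeled in a few lines: every request shows up in the server log, but an analytics pageview only exists if the client actually executes JavaScript. This is a toy sketch, not Bokade's tool; all names and User-Agent strings here are illustrative.

```python
# Toy model of the analytics blind spot: the server sees every hit,
# but a tracking script only fires when JavaScript runs client-side.
server_log: list[str] = []        # what server-side detection sees
analytics_events: list[str] = []  # what Google Analytics / PostHog would see

def handle_request(user_agent: str, executes_js: bool) -> None:
    server_log.append(user_agent)      # the server always records the request
    if executes_js:                    # the beacon fires only in a real browser
        analytics_events.append(user_agent)

handle_request("Mozilla/5.0 (Macintosh) Safari/605.1.15", executes_js=True)
handle_request("GPTBot/1.2 (+https://openai.com/gptbot)", executes_js=False)
handle_request("ClaudeBot/1.0 (+claudebot@anthropic.com)", executes_js=False)

print(len(server_log), len(analytics_events))  # 3 1
```

Three requests reach the server, but only the one from a JS-executing browser ever registers in the analytics dashboard, which is the discrepancy Bokade noticed on his low-traffic site.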

Bokade's writeup outlines three detection tiers: client-side JavaScript-dependent tracking, which misses all non-JS crawlers entirely; server-side inspection of User-Agent headers and IP addresses at request time; and network-layer detection via CDN or hosting provider firewall rules. Server-side inspection catches roughly 85 to 90 percent of bots, Bokade writes. GPTBot and ClaudeBot both self-identify through standardized User-Agent strings and originate from documented IP ranges, so they remain detectable server-side even though they never execute JavaScript. Bokade characterized their crawl behavior as comparatively conservative: unlike generic scrapers, neither bot hammered the same pages repeatedly. The tool he built is deployed on Vercel and logs crawler activity through a lightweight installation script, with a dedicated crawler tab surfacing bot traffic that standard dashboards miss entirely.
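The second tier, server-side User-Agent inspection, can be sketched as a simple pattern match against the tokens these crawlers publish. This is a minimal illustration of the technique, not Bokade's actual code; the pattern table and example User-Agent strings are assumptions modeled on the documented formats.

```python
# Sketch of server-side detection via User-Agent inspection.
# A production version would also verify source IPs against the
# vendors' published ranges; this only shows the UA-matching step.
import re
from typing import Optional

# Self-identifying tokens for the crawlers named in the article.
AI_CRAWLER_PATTERNS = {
    "GPTBot": re.compile(r"GPTBot", re.IGNORECASE),
    "ClaudeBot": re.compile(r"ClaudeBot", re.IGNORECASE),
}

def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the crawler name if the UA self-identifies, else None."""
    for name, pattern in AI_CRAWLER_PATTERNS.items():
        if pattern.search(user_agent):
            return name
    return None

# Example request headers (illustrative strings).
for ua in [
    "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
]:
    print(classify_user_agent(ua))
```

Because the check runs on the raw request headers before any response is sent, it works regardless of whether the client ever executes JavaScript, which is exactly why this tier catches what client-side tracking cannot.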

The more troubling finding involved Grok, xAI's crawler, which Bokade observed spoofing an iPhone browser User-Agent string rather than disclosing its identity as a bot. He identified the true origin by cross-referencing the source IP against ASN data via ip-api.com, a workaround that requires technical effort most site operators will not undertake. That distinction matters beyond analytics: the emerging framework for AI training data opt-outs — robots.txt enforcement, EU Copyright Directive rights reservations under Article 4(3), and GDPR compliance — depends entirely on crawlers correctly disclosing who they are. Without accurate self-identification, <a href="/news/2026-03-15-privai-privacy-claims-scrutinized">site operators have no practical way to exercise those rights</a> against a crawler pretending to be an iPhone. The EU AI Act moves toward full applicability in August 2026, with transparency requirements for general-purpose AI model providers — requirements whose enforcement depends on the same self-disclosure that Grok appears to be avoiding.
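The cross-check Bokade describes amounts to asking whether a mobile-looking User-Agent arrived from a datacenter network. The sketch below mimics the shape of an ip-api.com JSON lookup result, but the field names (`as`, `org`), the datacenter hint list, and the heuristic itself are assumptions for illustration, not his method verbatim.

```python
# Heuristic sketch: a request claiming to be an iPhone but originating
# from a cloud/datacenter ASN is a likely spoofed crawler. The asn_record
# dict imitates an ip-api.com response; treat field names as assumptions.
def looks_like_spoofed_bot(user_agent: str, asn_record: dict) -> bool:
    claims_mobile = "iPhone" in user_agent or "Android" in user_agent
    org = (asn_record.get("as", "") + " " + asn_record.get("org", "")).lower()
    # Real phones sit behind residential or mobile ISPs, not cloud ASNs.
    datacenter_hints = ("amazon", "google cloud", "microsoft",
                        "oracle", "hetzner", "digitalocean", "xai")
    from_datacenter = any(hint in org for hint in datacenter_hints)
    return claims_mobile and from_datacenter

# Hypothetical lookup results, not real Grok infrastructure data.
print(looks_like_spoofed_bot(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
    {"as": "AS31898 Oracle Corporation", "org": "Oracle Cloud"},
))  # True

print(looks_like_spoofed_bot(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
    {"as": "AS7922 Comcast Cable", "org": "Comcast"},
))  # False
```

The point of the sketch is the asymmetry Bokade highlights: a self-identifying crawler needs only a string match, while a spoofing one forces the operator into per-IP network forensics that most will never attempt.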