To clarify: crawlers such as GoogleBot, GPTBot, ClaudeBot, AppleBot, and PerplexityBot together account for 39% of all internet traffic. In other words, for every three human visitors there are roughly two bots. GoogleBot and BingBot still account for the majority of crawl requests, but as the number of AI crawlers grows, so does the risk of distorted data in tracking tools. The landscape is changing so quickly, and new crawlers are being released at such a pace, that tool providers struggle to keep up. Google Analytics filters bots using its own research data and a list from the IAB. By default, Matomo excludes user agents that do not have JavaScript enabled. Admins can also exclude individual user agents.
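As a rough illustration of how excluding individual user agents can work, here is a minimal sketch in Python. It is not the implementation used by Google Analytics or Matomo. The names `DEFAULT_BOT_TOKENS`, `build_bot_pattern`, and `is_bot` are hypothetical, the token list simply reuses the crawler names mentioned in the text, and the User-Agent strings are illustrative samples.

```python
import re

# Hypothetical starter list: the crawler names mentioned in the text.
# A real exclusion list would be much longer and regularly updated.
DEFAULT_BOT_TOKENS = [
    "Googlebot", "bingbot", "GPTBot", "ClaudeBot", "Applebot", "PerplexityBot",
]

def build_bot_pattern(tokens):
    """Compile one case-insensitive pattern that matches any of the tokens."""
    return re.compile("|".join(re.escape(t) for t in tokens), re.IGNORECASE)

def is_bot(user_agent, pattern):
    """Return True if the User-Agent string contains a listed bot token."""
    return bool(user_agent) and pattern.search(user_agent) is not None

# An admin extends the default list with one additional user agent token.
pattern = build_bot_pattern(DEFAULT_BOT_TOKENS + ["SISTRIX"])

# Illustrative User-Agent strings (assumed, not taken from real logs).
print(is_bot("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)", pattern))
print(is_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36", pattern))
```

Substring matching on the User-Agent header is simple, but it only catches crawlers that identify themselves honestly, which is one reason such lists need constant maintenance.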
LUX also maintains a list of bots that should be excluded from tracking. Users can extend or customize this list for their own projects as they see fit. This is a major advantage of open source tracking tools: the source code can be modified transparently, and a strong community behind each project reports new bots. PostHog likewise maintains a list of excluded bots, to which the bots reported by its hard-working community are added. Crawlers can still distort the data, of course, but responding quickly to such errors is what distinguishes good software. The image below shows the distortion caused by a weekly crawler from Sistrix, which heavily inflates the number of pages visited. It is just as important, however, to record whether crawlers can reach the website at all, or whether GoogleBot and others encounter obstacles when exploring the content that prevent visibility on Google, ChatGPT, and Bing.
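One way to record whether crawlers reach the site or run into obstacles is to tally their requests by HTTP status. The following is a minimal sketch under stated assumptions, not a feature of LUX or PostHog: the input is assumed to be `(user_agent, http_status)` pairs already extracted from a server log, and `BOT_TOKENS` and `crawler_report` are hypothetical names.

```python
from collections import Counter

# Hypothetical selection of crawlers whose access should be monitored.
BOT_TOKENS = ("Googlebot", "bingbot", "GPTBot")

def crawler_report(requests):
    """Count successful and failed requests per crawler.

    requests: iterable of (user_agent, http_status) pairs.
    Returns a dict mapping bot token to Counter({"ok": n, "error": m}).
    """
    report = {}
    for user_agent, status in requests:
        ua = user_agent.lower()
        for token in BOT_TOKENS:
            if token.lower() in ua:
                # Assumption: any 4xx/5xx response counts as an obstacle.
                outcome = "ok" if status < 400 else "error"
                report.setdefault(token, Counter())[outcome] += 1
                break
    return report

# Illustrative sample data (assumed, not real log entries).
sample = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", 200),
    ("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)", 403),
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36", 200),
]
print(crawler_report(sample))
```

A crawler with a high share of errors in such a report is a hint that something, for example a firewall rule or a server fault, is keeping it from the content.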