Blocking AI scrapers made my analytics worse

Published January 06, 2026 · 4 min read

When I started my blog in early 2024, I read posts from other bloggers about AI scraping traffic. I decided back then that I wanted indexing from search engines, but no traffic from AI scrapers.

Why I blocked them

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.

I had several reasons to block AI traffic. First off: since I write my blog myself, I wanted it to be read by humans. This was in the early days of AI becoming relatively useful, but not yet accepted everywhere. The internet was changing: the openness of blogs like this was being swallowed up by LLMs, with the risk of that knowledge ending up closed off behind big corporations with money as their only moral compass.

The second reason was that AI traffic was taking over on other blogs. The extra traffic and spikes made hosting more expensive and uptime harder to maintain. This was not a big concern for me, as I use GitLab Pages to host my blog, but I still want to self-host it at some point.

So in February 2024 I started blocking AI scraping companies. For a long time I used the list from https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt.
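
As a rough illustration (not my exact file), such a blocklist in robots.txt looks something like this, with a few of the AI crawlers from that list denied and everyone else left alone:

    # Deny some of the AI crawlers from the ai.robots.txt list (shortened here)
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Everyone else, including search engines, keeps full access
    User-agent: *
    Allow: /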

What changed

Before February 2024 I had very few blog visits and just two posts. On those posts, the visits looked regular, with a normal OS/browser variety. Just before my third post I started denying AI user agents in the robots file, and that third post got more visits. At some point I looked at my metrics (anonymous data) and saw a lot of traffic from Windows 7 devices going straight to pages that are only linked from the sitemap (no links from the menu or any HTML file). That got me suspicious.

Most of the traffic nowadays doesn't follow a normal session; it's a single page visit and then nothing. It mostly targets the tag overview pages, and 45% of it reports a Windows 7 user agent, an operating system Microsoft ended support for in 2020.

I suspect AI companies are using fake user agents to ignore the robots file and scrape my blog anyway. This made my analytics essentially useless: I couldn't tell who was actually reading my blog, real people with real browsers or scrapers pretending to be Windows 7 users.

Now what

After almost two years of blocking AI scrapers, I'm removing those restrictions from my robots.txt file today. I could keep standing my ground and blocking these requests, but they will find a way to do what they do. AI is now part of how people get to knowledge (even though it's still just a prediction engine), so I might as well accept the existence of web scrapers and let them use their own user agents.
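
For reference, the permissive file now shrinks back to something like this (the sitemap URL is a placeholder, not my real one):

    # Every crawler may visit every page
    User-agent: *
    Allow: /

    # Placeholder URL; the real file points at this blog's own sitemap
    Sitemap: https://example.com/sitemap.xml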

This way I hope any knowledge, thoughts and ideas shared in this blog might reach more people.

After allowing them in, I hope to see them in my statistics as who they really are, making my analytics more accurate and useful.

Final thoughts

The robots.txt file is a standard, but it is not enforced. Scrapers could have kept using their original user agents and simply ignored my rules; the file only hints that they are not welcome. Maybe at some point regulation from the EU will enforce it for everyone, but for now, as an individual blogger, this battle isn't worth fighting alone.

I am accepting that AI is now part of people's toolbox. I hope the companies become transparent again about who they are, and that my analytics become meaningful again.

