
Statcounter's "stats" are garbage, and should not be reported on. They're computed from untrustworthy information generated from a skewed and undisclosed sample, and processed with an unknown methodology. All that's published is aggregate data that's so coarse that it's impossible to actually reason about what's happening and what's driving the changes to the number.

But fairly regularly their stats are either so volatile or so absurd that it's obvious they have no relationship with reality. Like when they reported Windows 8.1 climbing from 0.1% to 6% market share in the US in late 2023.

One could easily come up with half a dozen other explanations for this Linux desktop market share number that are as plausible as the hypothesis of significant growth in desktop Linux usage.



Though I agree that Statcounter's stats are garbage, their methodology is somewhat known. They have connections to supposedly 3 million websites that run their script, that script records each hit, and the end stats say 4.45% of desktop hits come from Linux.

It's unclear which sites they track, but I doubt it's a representative sample. Even if it is, one person figuring out which site is tracked and setting up a refresh script could be enough to meaningfully damage the data.
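To put rough numbers on the refresh-script point, here's a back-of-the-envelope sketch in Python. Only the 4.45% figure comes from the published stat; the sample size (1,000,000 desktop hits in one reporting segment) and the reload rate (once a minute for 30 days) are made-up assumptions, since per-segment hit counts aren't published. It also assumes every hit is counted as-is, which may not be how Statcounter works.

```python
def share(linux_hits: int, total_hits: int) -> float:
    """Linux share of desktop hits, as a percentage."""
    return 100.0 * linux_hits / total_hits


def share_after_injection(linux_hits: int, total_hits: int, injected: int) -> float:
    """Share after one Linux machine adds `injected` extra hits to the sample."""
    return share(linux_hits + injected, total_hits + injected)


# Hypothetical segment: 1,000,000 desktop hits, 4.45% of them from Linux.
BASE_TOTAL = 1_000_000
BASE_LINUX = 44_500

# Hypothetical refresh script: one reload per minute for 30 days.
INJECTED = 30 * 24 * 60

before = share(BASE_LINUX, BASE_TOTAL)
after = share_after_injection(BASE_LINUX, BASE_TOTAL, INJECTED)
print(f"{before:.2f}% -> {after:.2f}%")
```

Under those assumptions a single machine moves the number from 4.45% to about 8.4%. The smaller the segment, the less it takes.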

https://en.wikipedia.org/wiki/StatCounter


We know the methodology by which they collect the data, which is why we can tell it's a skewed sample. We don't know how they process it.

It's not possible to reliably determine the operating system from just the user-agent. You could try to enrich the UA data with other signals, but all of those avenues are either being closed off by the browsers as fingerprinting vectors or are going to have trouble distinguishing Linux and Android.
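A minimal Python sketch of the Linux/Android problem. The `naive_os` function is a hypothetical substring classifier, not anything Statcounter is known to use, and the UA strings are illustrative examples of the usual Chrome formats. The desktop-mode string rests on my understanding that Chrome on Android sends a generic "X11; Linux x86_64" UA when "Desktop site" is enabled; treat that as an assumption.

```python
def naive_os(ua: str) -> str:
    """Guess the OS from substrings of the user-agent (hypothetical classifier)."""
    if "Android" in ua:
        return "Android"
    if "Windows" in ua:
        return "Windows"
    if "Mac OS X" in ua:
        return "macOS"
    if "Linux" in ua or "X11" in ua:
        return "Linux"
    return "Other"


# Illustrative UA strings; version numbers are arbitrary.
DESKTOP_LINUX = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
ANDROID_MOBILE = (
    "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"
)
# Assumed: Android Chrome with "Desktop site" on sends the same UA as desktop Linux.
ANDROID_DESKTOP_MODE = DESKTOP_LINUX

for label, ua in [
    ("desktop Linux", DESKTOP_LINUX),
    ("Android, mobile mode", ANDROID_MOBILE),
    ("Android, desktop mode", ANDROID_DESKTOP_MODE),
]:
    print(f"{label}: {naive_os(ua)}")
```

If that assumption holds, the third case is byte-for-byte the same string as the first, so no amount of UA parsing can separate them.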

Likewise we know they do some filtering of bot traffic, but not the details of how or even the proportion of traffic they're filtering out (which would at least allow us to reason about the quality of that filtering).


>We don't know how they process it.

I don't think they do. 4.45% of hits from UAs set to desktop report Linux. There's no more processing done. That'd explain why some random countries end up reporting a huge number of Linux users, or the seemingly random spikes and drops the data shows.

>Likewise we know they do some filtering of bot traffic,

Do we? Could it not just be ignoring certain UAs that report as bots, like cURL or Google's crawler?

You seem to think they must be doing something to justify their data, but as we don't know what they're doing the results are trash.

I think they don't do anything to justify their trash data. I've seen no reason to give them any credit.


How many people actually change their operating system user agent though, especially to Linux? It's gotta be less than 1%.


It's not that normal users are doing it in significant numbers. It's that other stakeholders are.

For example browser makers do that. Safari on iPads claims in the UA to be OS X on Intel, not iOS on ARM.

Another common use is for bots that are trying to hide they are bots. That might be malicious scrapers, non-malicious scrapers probing for whether changing the UA also changes the page content, or benign bots like software running phishing checks against links sent in chats or emails. It's not enough for these bots to just hide who they are, they want to look like a real browser.


I'm one of those who have. Microsoft Office web apps love to misbehave with Linux user agents! I know Linux friends who spoof user agents for privacy reasons as well!



