
>>> I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.

Ah, the "not invented here" syndrome!

There are tons of things that you could do "in a couple of weeks" that more or less work. However, it doesn't mean you have to or even that it would be a good idea.

If all developers adopted the attitude you have expressed, there would be thousands of sad, sad developers who have to maintain shitty in-house analytics systems because someone once said "I could do it in a week". There are already tons of awful CMSes because someone once said "I could do better than WordPress" / "I could create a better framework" / etc.

In a lot of cases, GA is just good enough. Sure, you might need to spend some time exploring its features (custom dimensions, etc.); there's more to GA than the number of pageviews on a given day. There are cases when GA is not enough. Fair enough. But they're definitely not the majority.

Sure, it makes sense for SpiderOak given its target audience. However, there's no need to make such a generic statement about 'anyone working in the tech sector'.




The answer is open source. Where is the WordPress equivalent of Google's analytics product?


Piwik, Snowplow, etc. They do exist.

Then the question is: do you really want to maintain the infrastructure required to run the analytics smoothly? Especially if your company has tens of millions of pageviews a month and depends on real-time data (which needs extra infrastructure to support).

Are you familiar enough with the stack to be highly confident that you can fix the production issues that will inevitably come up? Quite often, the honest answer here is 'no'. Then can you afford to lose a few hours/days/weeks of data (whatever it takes to fix the issue)? Again, the answer is often 'no'.

Of course, you have hosted solutions. But they are no better than GA in terms of privacy.

Paid support exists too, but the cost can skyrocket pretty quickly, on top of paying for the infrastructure and maintaining it.


Processing logs is a lot cheaper than the JavaScript download and the additional HTTP requests needed for Google Analytics, not to mention the privacy costs. Cheaper for the website, the user, and the web in general.

Not to mention you get perfectly accurate analytics, with no loss due to request blockers or disabled javascript.

The code for this is generic. An open source solution costs nothing beyond some CPU to process the logs and a database to store the analytics.
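To give a sense of how generic the log-processing part is, here is a minimal sketch that counts pageviews per path from an access log. It assumes the Apache/nginx "combined" log format (your server may be configured differently), and `count_pageviews` and the sample lines are made up for illustration; a real tool would also need bot filtering, sessions, time bucketing, etc.

```python
import re
from collections import Counter

# Regex for the Apache/nginx "combined" log format (an assumption: adjust it
# to whatever log format your server is actually configured to write).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
    r' "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_pageviews(lines):
    """Count successful (2xx) GET requests per path, skipping unparseable lines."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not in the expected format
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue  # ignore POSTs, errors, redirects, etc.
        path = m["path"].split("?", 1)[0]  # drop the query string
        views[path] += 1
    return views

# Hypothetical sample lines in combined log format.
sample = [
    '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET /pricing?ref=hn HTTP/1.1" 200 5120 "https://news.ycombinator.com/" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Oct/2015:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Oct/2015:13:56:05 +0000] "GET /missing HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
]
print(count_pageviews(sample))  # Counter({'/pricing': 2})
```

The resulting counts could then be written to whatever database you use for storing the analytics.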


It's been a while since I've used GA, but segmenting by age, gender, and interests (1) is something you can't do without either paying a marketing aggregator hundreds of thousands of dollars a month or using GA. You can do some geolocation classification and measure things like campaign effectiveness, bounce rate, etc. yourself. But because Google has so much aggregate data on hand, being able to classify user-x as "Male, 40s, Interests-similar-to-demographic-we-sell-to" (2) is invaluable, whether you're selling seats of enterprise software, high-fashion luxury items, or cheapo stoner knick-knacks. You can't really do that kind of market segmentation with your own software.

(1) https://support.google.com/analytics/answer/3125360?hl=en&re... (2) https://support.google.com/analytics/answer/2819948


Sure. So now the question is, why would Google offer all of this for "free"? Is it really free? Who pays, and in what ways?


Obviously, they're using the same information that helps you calibrate your campaigns to add to the hive-mind, so they can data-mine it further. You're sacrificing your end-users' anonymity in doing so. Obviously they're offering it so they can refine their profile of you, sell ads more accurately, and direct more relevant traffic to you. I'm not an industrial engineer, but I've been reading about it for the last few weeks. I turned off Adblock for a while and, even with my opt-out plugins (a, b), I started getting ads for $4,500 Fluke multimeters. The combination of one's search history plus a fairly comprehensive history of the sites one visits (c) can profile a person to a terrifying degree. At the same time, the average business with only a few million dollars a year going towards both sales and marketing can't really approach Quantcast and ask for access to their API.

a: https://tools.google.com/dlpage/gaoptout
b: https://chrome.google.com/webstore/detail/do-not-track/ckdcp...
c: I don't have the study on hand, but IIRC someone who had just finished a master's at Stanford wanted to assess how much information Google had about an average user's browsing history. The findings, based on Common Crawl data for the top 100k sites plus the presence of GA.js, yielded something like >75% of the web being tracked (not to be confused with how much of an end-user's traffic is tracked; that number will be far higher), based on sites with GA.js and factoring in Referer headers. Those were unweighted numbers, i.e., I bet more than one out of every two 45-year-old women's traffic could be analyzed to 95% completeness based entirely on Pinterest, Facebook, search history, and the outbound links from her e-mail.


Interesting points. I think there are many ways to use Google Analytics that go beyond what many people want from "visitor data". Some of the questions GA can answer can only be answered if one is willing to collude in destroying (meaningful) privacy.

I've had "simple FOSS analytics" on my to-do list for quite some time. I'm hoping one can build on what Piwik has collected regarding bot user-agent strings, IPs, etc., and combine it with a simpler collector (adding PHP to the stack just for analytics isn't very appealing, never mind a PHP codebase of somewhat questionable quality).
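As an illustration of the bot-filtering step such a collector would need, here is a minimal user-agent heuristic. The marker list is a hypothetical, deliberately tiny stand-in for a maintained list like the one Piwik has collected, and it ignores IP-based filtering entirely.

```python
# Hypothetical, deliberately tiny list of bot markers; a real collector would
# load a maintained list (user agents and IPs) instead of hard-coding these.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "curl/", "python-requests")

def is_probable_bot(user_agent: str) -> bool:
    """Heuristic: an empty UA or one containing a known marker is 'probably a bot'."""
    ua = user_agent.strip().lower()
    if not ua or ua == "-":
        return True
    return any(marker in ua for marker in BOT_MARKERS)

def human_hits(records):
    """Keep only (ip, user_agent, path) records that don't look like bots."""
    return [r for r in records if not is_probable_bot(r[1])]

records = [
    ("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)", "/"),
    ("66.249.66.1", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "/"),
    ("198.51.100.9", "curl/7.43.0", "/feed"),
]
print(human_hits(records))  # [('203.0.113.7', 'Mozilla/5.0 (X11; Linux x86_64)', '/')]
```

Substring matching is crude (it will miss bots that pretend to be browsers), which is exactly why reusing a curated list is more attractive than maintaining one yourself.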

Snowplow looks good, but I'm not sure if they have a supported "self-host" stack yet (they started out very AWS/S3-centric).

I actually think there's room for a new product that puts a little more thought into what questions it makes sense to ask, and how best to answer them (e.g., does collecting metrics on every visitor even make sense if you can answer the same questions just as well with random sampling? You might want to quantify where your bandwidth goes, but simple log analysis might do that easily enough, and it might have very little to do with your human visitors, etc.).
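The random-sampling idea could look roughly like this: keep data only for a fixed fraction of visitors and scale counts back up. This is a sketch under assumptions, not any existing product's method: it assumes each hit carries some visitor identifier, and it samples by hashing that identifier (rather than per request) so a sampled visitor stays sampled across all their requests. The function names are hypothetical.

```python
import hashlib
from collections import Counter

def in_sample(visitor_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of visitors by hashing their id."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate

def estimate_pageviews(hits, rate):
    """Count pageviews from sampled visitors only, then scale up by 1/rate."""
    counts = Counter(path for visitor, path in hits if in_sample(visitor, rate))
    return {path: n / rate for path, n in counts.items()}

# Hypothetical traffic: 10,000 visitors, a quarter of whom view /pricing.
hits = [(f"visitor-{i}", "/" if i % 4 else "/pricing") for i in range(10_000)]
exact = Counter(path for _, path in hits)
approx = estimate_pageviews(hits, rate=0.1)  # store data for ~10% of visitors
print(exact["/pricing"], round(approx.get("/pricing", 0)))
```

The estimate's relative error shrinks roughly with the square root of the number of sampled visitors, so for high-traffic pages a small sample is usually enough, while rare pages need a larger rate.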


If you make decisions with money riding on the answers, it costs a lot more than CPU and DB.

Perhaps systems administration is somehow very cheap for you, but I'm willing to bet it is still not "nothing" - even if the cost is you personally not watching a TV show you like because you're patching the web server on your analytics box for your personal vanity domain, that's still a cost.

For most operations, sysadmins are somewhat expensive, and because of that, busy. This is why Urchin was such a good idea, and why Google bought them: the proposition is to trade your users' privacy for the admin time it takes to support another internal app. That's an absolute no-brainer, assuming you don't care about your users' privacy (IIRC, they were going to sell the service before Google ate them, but that's ancient and trivial history).


>because you're patching the web server on your analytics box

If your business is so small that an additional low-volume web server just to display your analytics (you don't need one for the actual tracking) is a big deal, then the same web server that serves your product can serve your analytics. Not a big deal.


I'm glad we both agree that analytics for a vanity domain is not a big deal. It was also a bounding example for my argument, not the argument itself.



