Ask HN: Where to Store Logs?
9 points by HigherConscious on Oct 21, 2023 | 11 comments
Say you have a web application and want to store events like page views, clicks, etc. for analytics.

Where should this data be stored? Is it considered acceptable for the web server to just INSERT every event directly into a SQL database table? If so, then at what volume of throughput does that break, and how should one handle higher scale?

Let's say that this is for a website where users can generate content (e.g. YouTube) and view detailed analytics on that content.




The big problem with stuffing logs in SQL is that a log search can bring down your app. You'll be tempted to implement log search via something like SELECT * FROM logs WHERE message LIKE '%query%' and your DB will fall over when the log table gets big enough.
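To make the concern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and index names are assumptions made up for illustration, and other engines plan queries differently, but the general pattern holds: a LIKE with a leading wildcard cannot seek into an ordinary B-tree index, so the engine reads every row, while an equality filter on an indexed column can use the index.

```python
import sqlite3

# Hypothetical schema for illustration only; not from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT, message TEXT)")
conn.execute("CREATE INDEX idx_logs_level ON logs (level)")
conn.execute("CREATE INDEX idx_logs_message ON logs (message)")


def query_plan(sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Leading wildcard: the index on message does not help, the table is scanned.
like_plan = query_plan("SELECT * FROM logs WHERE message LIKE ?", ("%timeout%",))

# Equality on an indexed column: the planner can seek via the index.
level_plan = query_plan("SELECT * FROM logs WHERE level = ?", ("error",))

print(like_plan)
print(level_plan)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed strings as diagnostic output rather than a stable format.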

It's common to ingest logs into something like Elasticsearch, for performance and reliability reasons.

This is a common enough problem that MongoDB Atlas has a feature that exposes searchable data through some Lucene-based backend.[0] I've never used it, but I found the concept interesting because it fits the convenient working pattern of "shove it all in the DB and figure it out later."

0: https://www.mongodb.com/atlas/search


As other commenters have already noted, this is more a matter of isolation than of choosing a technology. Even so, the technology should still be chosen wisely.

A typical SQL engine is fine with the described traffic. To have more control over resource usage, it's good to keep such logs and any other analytics in a separate database from your app's transactions. That said, I have seen quite big deployments where everything was in the same database, and with the right indexes, transaction isolation and some hygiene in writing the queries it was just fine. Plus, the code was simplified a lot.

Elasticsearch has a lot of marketing behind it and certainly has a nice, popular query language. Still, it is a memory hog, using order(s) of magnitude more memory than your average favourite RDBMS. Eating more memory hurts performance and is expensive (memory is the single most expensive factor in the price on any cloud provider). So it is worth asking, "do I really need it?"

Splunk is cool and fast. It's not free though, and last time I checked its pricing was a bit complicated.

I don't have a clear opinion on MongoDB, as its performance characteristics have changed too often over the years.

Aerospike is one of my favourite NoSQL engines. It offers great speed and scalability. Using it for analytics would be unorthodox, but valid. I recommend giving it a shot.


> The big problem with stuffing logs in SQL is that a log search can bring down your app.

Logs can be stored in a different db than the one the app is using.
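A minimal sketch of that separation, using two in-memory SQLite databases as stand-ins for two separate servers. The table names, the EventBuffer class and the batch size are assumptions for illustration. The batching is an extra that nobody in the thread specified: it is one common way to avoid paying for a transaction per event when volume grows.

```python
import sqlite3
import time

# Two independent databases: one for the app, one for logs/events.
# In production these would be separate servers; in-memory SQLite is a stand-in.
app_db = sqlite3.connect(":memory:")
log_db = sqlite3.connect(":memory:")

app_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
log_db.execute("CREATE TABLE events (ts REAL, user_id INTEGER, kind TEXT)")


class EventBuffer:
    """Hypothetical helper: collects events in memory and writes them in batches."""

    def __init__(self, db, batch_size=100):
        self.db = db
        self.batch_size = batch_size
        self.pending = []

    def record(self, user_id, kind):
        self.pending.append((time.time(), user_id, kind))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with self.db:  # one transaction per batch, committed on exit
            self.db.executemany("INSERT INTO events VALUES (?, ?, ?)", self.pending)
        self.pending.clear()


# App writes go to the app database only.
app_db.execute("INSERT INTO users (name) VALUES ('alice')")
app_db.commit()

# Event writes go to the log database only.
events = EventBuffer(log_db, batch_size=100)
for _ in range(250):
    events.record(user_id=1, kind="page_view")
events.flush()  # write the final partial batch

event_count = log_db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(event_count)
```

A heavy query against log_db now competes only with other log traffic, not with the app's transactions. The trade-off of buffering is that events still in memory are lost if the process dies before a flush.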


my current preferences are as follows

postgresql for transactional logs

clickhouse for analytics data

elasticsearch or quickwit for terabytes of data, disk persisted, if i need thorough search on structured jsons

---

others i use for different use cases

typesense for searching mbs to gbs of data, memory persisted

redis for caching kbs of data, memory persisted


That's a good list.


Don't insert the logs/events/analytics into your application DB. Usually, you send those to specialist datastores (OLAP etc.) that are built to process such high volumes of data. This way, you keep the load and storage on your app DB low, AND if the analytics store is not working, it doesn't impact your core application.

You can use something like ClickHouse [0], for example, or use third-party SaaS solutions like PostHog [1] that are built on top of ClickHouse.

[0] https://clickhouse.com

[1] https://posthog.com
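The isolation point (an analytics outage should not hurt the core application) can be sketched as a best-effort sink. Everything here is hypothetical: the class name, the `send_batch` callable and the queue size are assumptions, and a real setup would hand `send_batch` a client for whatever store is used. The idea is that the request path only enqueues, and a background worker does the sending and absorbs failures.

```python
import queue


class BestEffortEventSink:
    """Hypothetical sketch: queue events without ever failing the caller."""

    def __init__(self, send_batch, max_pending=1000):
        self.send_batch = send_batch  # callable that ships a list of events
        self.pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def track(self, event):
        """Called on the request path: never blocks, never raises."""
        try:
            self.pending.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # shed load instead of slowing the app

    def drain(self):
        """Called by a background worker: send whatever is queued."""
        batch = []
        while True:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0
        try:
            self.send_batch(batch)
        except Exception:
            # The analytics store is unavailable; count the loss and move on.
            self.dropped += len(batch)
            return 0
        return len(batch)


def broken_store(batch):
    """Stand-in for an analytics store that is currently down."""
    raise ConnectionError("analytics store is down")


sink = BestEffortEventSink(broken_store, max_pending=2)
for i in range(3):
    sink.track({"kind": "page_view", "n": i})  # third event exceeds the queue

delivered = sink.drain()
print(delivered, sink.dropped)
```

Dropping events is a deliberate choice in this sketch. If analytics data must not be lost, the failed batch would need to be retried or spilled to disk instead.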


Would 100% recommend PostHog, it's open-source and a great product. "Classic" events (like pageviews) are tracked, and out-of-the-box dashboards with cohorts and DAU/MAU charts are ready to be used & shared.

Used it in my former company and have been using it ever since (not affiliated in any way with them btw)


I'll have to look into ClickHouse, thanks.

What are the main alternatives to ClickHouse, and how do they fare?



I have been contemplating various ways to achieve this at my job for the last few days. Here's something worth considering.

[0]https://clickhouse.com/blog/analyzing-aws-fow-logs-using-cli...


Highly recommend a managed service like Datadog or New Relic. Or, if you are in a cloud like AWS, you can use CloudWatch. Don't use your application DB to store operational data; you should separate them out.



