
My project as well: sharded across 4 nodes, ingesting about a TB every two weeks. Automatic deletion of old data was too slow, so we had to work with a scheme that let us drop daily collections, but besides that it ran great. This was six to three years ago, though. Maybe there's a better log stash out there now? I haven't seen one yet.
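For what it's worth, a rough sketch of that daily-collection scheme with pymongo looks something like the following. The "events_YYYYMMDD" naming, the database name, and the 14-day retention window are assumptions for illustration, not how our system was actually configured:

  from datetime import datetime, timedelta, timezone
  from pymongo import MongoClient

  client = MongoClient("mongodb://localhost:27017")  # assumed connection string
  db = client["logs"]                                # assumed database name
  RETENTION_DAYS = 14                                # assumed retention window

  def collection_for(ts: datetime) -> str:
      # One collection per day; dropping a whole collection is near-instant,
      # unlike deleting millions of expired documents one by one.
      return "events_" + ts.strftime("%Y%m%d")

  def purge_old_collections() -> None:
      cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
      for name in db.list_collection_names():
          if not name.startswith("events_"):
              continue
          day = datetime.strptime(name[len("events_"):], "%Y%m%d")
          if day.replace(tzinfo=timezone.utc) < cutoff:
              # Dropping the collection avoids the slow bulk-delete path.
              db.drop_collection(name)

Writers just insert into collection_for(now), and a nightly job calls purge_old_collections().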



ElasticSearch does most of what you're describing out of the box; you can even have each day written to a different index and query them all through a shared alias. At least IIRC, I'm not an expert. When I first tried using it and Mongo, I had issues with ES geo indexing, but since then I've used both with little issue. It just depends on the scenario.
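Roughly, the daily-index-plus-alias pattern looks like this with the Python client; the index names, the "logs" alias, the local endpoint, and the 8.x client API are assumptions on my part:

  from datetime import datetime, timezone
  from elasticsearch import Elasticsearch

  es = Elasticsearch("http://localhost:9200")  # assumed endpoint

  today = datetime.now(timezone.utc).strftime("%Y.%m.%d")
  index_name = f"logs-{today}"

  # Create today's index (if missing) and attach it to the shared alias,
  # so queries against the alias fan out across every daily index.
  if not es.indices.exists(index=index_name):
      es.indices.create(index=index_name)
  es.indices.put_alias(index=index_name, name="logs")

  # Query all days at once through the alias.
  resp = es.search(index="logs", query={"match": {"message": "error"}})
  print(resp["hits"]["total"])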


The latest ES supports time-based expiry of data (index lifecycle management) I believe, but before that you were creating daily/hourly indexes and wrangling Elastic Curator to delete them. There is also no meaningful way to do it by logical size.
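If I'm remembering right, the newer approach is an ILM policy with a delete phase instead of Curator. A sketch with the 8.x Python client; the policy name, template name, and 14-day age are made up:

  from elasticsearch import Elasticsearch

  es = Elasticsearch("http://localhost:9200")  # assumed endpoint

  # Delete any index managed by this policy 14 days after it was created/rolled over.
  es.ilm.put_lifecycle(
      name="logs-retention",
      policy={
          "phases": {
              "delete": {"min_age": "14d", "actions": {"delete": {}}}
          }
      },
  )

  # Attach the policy to new log indices via an index template.
  es.indices.put_index_template(
      name="logs-template",
      index_patterns=["logs-*"],
      template={"settings": {"index.lifecycle.name": "logs-retention"}},
  )

Note the expiry keys off index age rather than a per-document time field, which is why the daily/hourly index layout still matters.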

Elastic is great in its own right, but it's not as flexible as Mongo when it comes to schemas.


Agreed... I didn't know it supported time-based expiry now; it's been a couple of years. I just find that at least half the time I would consider Mongo today, ES seems to be a better fit. If PG were nearly as easy to set up with replication and hot/automatic failover, I'd probably favor it 95% of the time. I really wish RethinkDB had been as successful at marketing as Mongo, though.



