
They did build a self-hosted alternative based on Grafana, Loki, and Prometheus.

Had a whole team of 10+ engineers working on it for 2 quarters, then scrapped it because it performed terribly

The only thing that came of it was negotiation leverage with Datadog ("give us X% off or we go self-hosted").



What kind of scale of logs are we talking about here? The company I work for runs a self-hosted Grafana LGTM stack ingesting about 1TB of logs per day. It's pretty snappy, works well enough, and the entire observability stack only costs a few thousand dollars per month in GKE costs.
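For a sense of how simple the write path is, Loki's push API is a plain HTTP endpoint. A minimal sketch in Python (the loki-gateway hostname and the labels are hypothetical stand-ins for whatever your deployment exposes):

  import json
  import time
  import urllib.request

  # Hypothetical in-cluster endpoint; the Loki Helm charts expose a
  # similar gateway service on port 3100.
  LOKI_URL = "http://loki-gateway:3100/loki/api/v1/push"

  def push_log(line, labels):
      # Loki's push API takes streams keyed by a label set, each with
      # (nanosecond-timestamp, log line) pairs as values.
      payload = {
          "streams": [{
              "stream": labels,
              "values": [[str(time.time_ns()), line]],
          }]
      }
      req = urllib.request.Request(
          LOKI_URL,
          data=json.dumps(payload).encode("utf-8"),
          headers={"Content-Type": "application/json"},
      )
      urllib.request.urlopen(req)

  push_log("request handled in 42ms", {"job": "api", "env": "prod"})

Keeping the label set small is what keeps Loki cheap: labels are indexed, log lines are not.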


GitHub has over 21TB of source code. Applications constantly churn through this data and emit logs and events. 1TB of data by breakfast, maybe? In reality, we're not pushing logs to Datadog, just metrics and event tags. Our level of cardinality, however, requires a lot of horsepower on the backend. Our attempted Prometheus transition just wasn't cost-effective when querying large sets of data over a large-ish period of time. Combined with the heavy lift of integration (we depended heavily on dogstatsd), it didn't seem efficient to move to Prometheus and support the infrastructure it requires, all while migrating to Microsoft's in-house product.
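To make the cardinality point concrete: with dogstatsd, every distinct combination of tag values becomes its own time series on the backend. A rough sketch with the Python datadog client (metric and tag names are made up):

  from datadog import statsd  # pip install datadog; assumes a local dogstatsd agent

  # Each unique combination of env x region x repo_shard is a separate
  # time series, so a tag with thousands of values multiplies storage
  # and query cost on whatever backend receives it.
  statsd.increment(
      "git.pack.requests",  # hypothetical metric name
      tags=["env:prod", "region:us-east-1", "repo_shard:1234"],
  )

Prometheus has the same property with labels, which is why high-cardinality metrics are expensive to query in both systems.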



