
I don't know much about it other than what I've learned through the struggle of trying to set up a cluster.

But why do so many projects depend on Zookeeper? What does it provide that couldn't be done through an embedded library? Seems like a lot of databases don't really need it. Is it worth the extra network dependency and operational complexity?



Your question could be rephrased: why do so many projects depend on an external store for distributed consensus?

One answer is that coupling the consensus part of the system with parts that do active work results in harmful resource conflicts. Those resource conflicts can cause consensus algorithms to fail or take much longer to return answers. Example: Java VM clocks misbehave if the process can't get enough CPU. This can cause systems like ZK to lose quorum.
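
To make the failure mode concrete, here is a toy simulation, not ZooKeeper's actual logic. The function names, the heartbeat interval, and the timeout values are all made up for illustration. It assumes a member is considered dead if the gap between two of its heartbeats exceeds a session timeout, and it models CPU starvation (or a long GC pause) as a window during which the process cannot run at all.

```python
def simulate_heartbeats(interval, stalls, duration):
    """Return the times at which heartbeats actually get sent.

    interval: seconds the process wants between heartbeats.
    stalls:   list of (start, length) windows where the process gets no CPU.
    duration: stop scheduling new heartbeats after this many seconds.
    """
    times = []
    t = 0.0
    while t <= duration:
        for start, length in stalls:
            if start <= t < start + length:
                # The process could not run until the stall ended.
                t = start + length
        times.append(t)
        t += interval
    return times


def missed_deadlines(heartbeat_times, session_timeout):
    """Return (previous, current) pairs whose gap exceeds the timeout."""
    return [
        (prev, cur)
        for prev, cur in zip(heartbeat_times, heartbeat_times[1:])
        if cur - prev > session_timeout
    ]


# Hypothetical numbers: heartbeat every 2s, 6s timeout, one 10s stall at t=5.
healthy = simulate_heartbeats(interval=2, stalls=[], duration=20)
starved = simulate_heartbeats(interval=2, stalls=[(5, 10)], duration=20)

print(missed_deadlines(healthy, session_timeout=6))
print(missed_deadlines(starved, session_timeout=6))
```

The healthy run never misses a deadline. The starved run has one gap longer than the timeout, which is the point at which the rest of the ensemble would give up on that member. Running the consensus service on its own hosts is one way to keep the "active work" from causing stalls like that.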


> But why do so many projects depend on Zookeeper?

Hadoop ecosystem legacy. Most companies adopting tech like Druid were already running Hadoop and had Zookeeper as a result. Probably made sense to take advantage of a reliable, or at least well-known, system.


Have an upvote.


A lot of tools (especially in Hadoop) were doing the same things, so the idea was to share all that logic in its own dedicated service. Like so many other things in software, the tradeoffs had unforeseen consequences. What was an optimization for that community became baggage for newcomers. I think HashiCorp is one of the worst offenders in this line of thinking (even though I love them and use several of their tools).



