Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

TFA states:

> we knew we were not going to use MongoDB sharding because it is complicated to use and not known for stability

But then goes on to describe using Cassandra and overcoming sharding and stability issues. I.e., changing the key, changing TTL knobs, adding anti-entropy sweepers, and considering switching to a different cassandra impl entirely.

Are these issues significantly harder to solve in MongoDB than Cassandra?



At the time MongoDB's sharding story wasn't great. They've gotten better since, but still have a primary-replica set model that has a single point of failure/failover. Cassandra (and Scylla) are leaderless, peer-to-peer clustering. Any node can go offline and the cluster keeps humming. Cassandra shards per node. Scylla goes beyond that and shards per core.

Cassandra and Scylla also use hinted handoffs so if a node is unavailable temporarily (up to a few hours) you can store "hints" for it when it comes back online. Handy for short admin windows.


MongoDB has the equivalent of hinted handoffs. Changes are streamed to secondary nodes via the oplog, and the secondary just resumes where it was once it is back online. There is a limit to how long it can be offline (based on the size of the oplog), but that is the same limitation as hinted handoffs.


Thanks! Good to know.


A MongoDB shard isn't necessarily a single-point-of-failure since a shard is usually deployed as a replica set. If a shard's primary node goes down, a secondary node in the replica set is elected as a primary and takes reads + writes. Similar to what you mentioned for Scylla - a node can go offline on a shard in a MongoDB cluster and it keeps humming.


Its hard to say because they're not explicit about this but, despite being a decade-long Mongo apologist myself, I'd totally believe that they liked the linear scale story for Cassandra more from an infrastructure/config perspective.

Increasing top-end write throughput or replication in Cassandra is just adding more nodes, where in Mongo its not just adding nodes, its adding replica sets (which consist of 3 or more nodes). So there's a few more layers of complexity to that story. You need more replica sets to increase write throughput and need more nodes in replica sets to increase replication.

Im hand waving some details here, but I've worked with both platforms can definitely understand the choice at least from a pure infra lens.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: