TFA states: > we knew we were not going to use MongoDB sharding because it is co...

PeterCorless · on Aug 24, 2021

At the time MongoDB's sharding story wasn't great. They've gotten better since, but still have a primary-replica set model that has a single point of failure/failover. Cassandra (and Scylla) are leaderless, peer-to-peer clustering. Any node can go offline and the cluster keeps humming. Cassandra shards per node. Scylla goes beyond that and shards per core.

Cassandra and Scylla also use hinted handoffs so if a node is unavailable temporarily (up to a few hours) you can store "hints" for it when it comes back online. Handy for short admin windows.

trashcan · on Aug 24, 2021

MongoDB has the equivalent of hinted handoffs. Changes are streamed to secondary nodes via the oplog, and the secondary just resumes where it was once it is back online. There is a limit to how long it can be offline (based on the size of the oplog), but that is the same limitation as hinted handoffs.

PeterCorless · on Aug 24, 2021

Thanks! Good to know.

calmoo · on Aug 24, 2021

A MongoDB shard isn't necessarily a single-point-of-failure since a shard is usually deployed as a replica set. If a shard's primary node goes down, a secondary node in the replica set is elected as a primary and takes reads + writes. Similar to what you mentioned for Scylla - a node can go offline on a shard in a MongoDB cluster and it keeps humming.

spmurrayzzz · on Aug 24, 2021

Its hard to say because they're not explicit about this but, despite being a decade-long Mongo apologist myself, I'd totally believe that they liked the linear scale story for Cassandra more from an infrastructure/config perspective.

Increasing top-end write throughput or replication in Cassandra is just adding more nodes, where in Mongo its not just adding nodes, its adding replica sets (which consist of 3 or more nodes). So there's a few more layers of complexity to that story. You need more replica sets to increase write throughput and need more nodes in replica sets to increase replication.

Im hand waving some details here, but I've worked with both platforms can definitely understand the choice at least from a pure infra lens.