Scylla also uses a shared-nothing sharding approach, they have an open-source "asynchronous I/O engine" toolkit that is used inside of ScyllaDB called Seastar
They have some good reading on this approach in the Seastar docs if you're curious, it's a great general introduction to Shared Nothing:
This is usually generalized to some variation of "Physical-OS-thread-per-CPU-core" architecture, where each CPU core gets it's own dedicated chunk of data. You use hashing or some other mechanism to route requests for data to the right core.
Each CPU core can have user-land concurrency or "virtual threads" (fibers/coroutines/etc) as well, so you're not limited to sequential code necessarily.
A combination of thread-per-core x userland-coroutines-per-thread is how modern query processing approaches like "Morsel-Driven Parallelism" work.
I use colon separated syntax to indicate the multiplexing of different hierarchy levels of thread/coroutine/lightweight threads.
1:M:N means 1 scheduler thread, M kernel threads and N lightweight threads. So 8 lightweight threads multiplexed over 2 kernel threads multiplexed by one scheduler thread looks as this:
1:0:0
1:0:1
1:0:2
1:0:3
1:1:0
1:1:1
1:1:2
1:1:3
Can use ringbuffers to route data between threads, there's a multiproducer multiconsumer ringbuffers available. I also wrote what I call token ring parallelism where there is no locks but threads take turn writing but read as fast as they can.
Essentially there is a state machine that each thread is in reading mode or writing mode. Each thread enables the writing mode of the next thread THREAD_ID + 1 % threads.size(). When the last thread is reached, writing mode is enabled and then when that finishes, reading mode for all threads is reenabled.
Need some approach to representing data flow between threads that is efficient and is pull/push driven.
Scylla is a C++ rewrite of cassandra, the threading has been better as well with thread-per-core. Cassandra 4.0 has some thread-per-core rearchitecture so it remains to be seen if Scylla has performance advantages it used to tout.
I have not checked if Scylla ever caught up to Cassandra 3.0's feature set, last I checked it had not. It may be "going its own way" on features as well. I believe (they can correct me).
My personal recommendation would be to start with Cassandra, and when operations allow it, move to scylla if performance testing shows it is better. There is also exotic stuff like Rocksandra. Cassandra and Scylla are fundamentally similar in operations (add/remove nodes, scaling, etc). The scylla folks will contend the operations are easier since they get more perf per node so you need less nodes.
It's a good idea with cassandra / scylla to bootstrap with a performance testing, data loading, provisioning loop. It gets you used to operations and the tooling for backups, loads, and load testing and metrics.
They have some good reading on this approach in the Seastar docs if you're curious, it's a great general introduction to Shared Nothing:
https://seastar.io/shared-nothing/#:~:text=The%20Seastar%20M....
https://www.linuxfoundation.org/webinars/under-the-hood-of-a...
This is usually generalized to some variation of "Physical-OS-thread-per-CPU-core" architecture, where each CPU core gets it's own dedicated chunk of data. You use hashing or some other mechanism to route requests for data to the right core.
Each CPU core can have user-land concurrency or "virtual threads" (fibers/coroutines/etc) as well, so you're not limited to sequential code necessarily.
A combination of thread-per-core x userland-coroutines-per-thread is how modern query processing approaches like "Morsel-Driven Parallelism" work.
https://db.in.tum.de/~leis/papers/morsels.pdf