Scylla also uses a shared-nothing sharding approach, they have an open-source "a...

samsquire · on Jan 3, 2023

Thanks for this comment Gavin.

I use colon separated syntax to indicate the multiplexing of different hierarchy levels of thread/coroutine/lightweight threads.

1:M:N means 1 scheduler thread, M kernel threads and N lightweight threads. So 8 lightweight threads multiplexed over 2 kernel threads multiplexed by one scheduler thread looks as this:

  1:0:0
  1:0:1
  1:0:2
  1:0:3
  1:1:0
  1:1:1
  1:1:2
  1:1:3

Can use ringbuffers to route data between threads, there's a multiproducer multiconsumer ringbuffers available. I also wrote what I call token ring parallelism where there is no locks but threads take turn writing but read as fast as they can.

My token ring parallelism can be found in this sourcefile https://github.com/samsquire/multiversion-concurrency-contro...

Essentially there is a state machine that each thread is in reading mode or writing mode. Each thread enables the writing mode of the next thread THREAD_ID + 1 % threads.size(). When the last thread is reached, writing mode is enabled and then when that finishes, reading mode for all threads is reenabled.

Need some approach to representing data flow between threads that is efficient and is pull/push driven.

AtlasBarfed · on Jan 3, 2023

Scylla is a C++ rewrite of cassandra, the threading has been better as well with thread-per-core. Cassandra 4.0 has some thread-per-core rearchitecture so it remains to be seen if Scylla has performance advantages it used to tout.

I have not checked if Scylla ever caught up to Cassandra 3.0's feature set, last I checked it had not. It may be "going its own way" on features as well. I believe (they can correct me).

My personal recommendation would be to start with Cassandra, and when operations allow it, move to scylla if performance testing shows it is better. There is also exotic stuff like Rocksandra. Cassandra and Scylla are fundamentally similar in operations (add/remove nodes, scaling, etc). The scylla folks will contend the operations are easier since they get more perf per node so you need less nodes.

It's a good idea with cassandra / scylla to bootstrap with a performance testing, data loading, provisioning loop. It gets you used to operations and the tooling for backups, loads, and load testing and metrics.