FWIW, the benchmarked numbers are for writing very small rows. During the messages migration, with no read traffic and the cluster/compaction settings tuned for writes, we only managed approximately 3M inserts/sec while fully saturating the Scylla cluster.
Interesting. We've reached 5M+ reads/sec in realistic simulated benchmarks and ~2M reads/sec of real-world throughput on our clusters of fewer than 10 nodes (though very high-density ones). I don't think I've pushed writes beyond 1M QPS in real-world or simulated loads yet, though. Thankfully, our partitioning schemes are very well distributed and our rows are very small (generally 1-5 KB), so I don't think we'd have a problem hitting some big numbers.
What about per-node memory pressure: did it change in Scylla's favor? I ask because I would genuinely expect a GC-based system to put more pressure on the memory subsystem.
Scylla just eats all the RAM it can for its cache, so it's hard to say, really. On Cassandra, we allocated half the RAM to the JVM, which it gladly used up, and left the other half to the OS for disk cache. On Scylla, since it uses direct I/O, there is no need for the OS disk cache.
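A rough sketch of how a node's RAM ends up divided under the two setups described here. The 256 GB node size and the `os_reserve_gb` knob are hypothetical illustrations, not actual Cassandra or Scylla settings; the Scylla side is simplified to "its own cache takes everything except an optional reserve, and no OS page cache is relied on for data reads."

```python
# Hypothetical sketch of per-node memory division; numbers are illustrative only.

def cassandra_split(total_gb: float, heap_fraction: float = 0.5) -> dict:
    """Half the RAM goes to the JVM heap; the rest is left to the OS page cache."""
    heap = total_gb * heap_fraction
    return {"jvm_heap_gb": heap, "os_page_cache_gb": total_gb - heap}


def scylla_split(total_gb: float, os_reserve_gb: float = 0.0) -> dict:
    """Scylla reads with direct I/O, so its own cache replaces the OS page cache.

    os_reserve_gb is a hypothetical amount of memory left to the OS (assumption).
    """
    return {"scylla_cache_gb": total_gb - os_reserve_gb, "os_page_cache_gb": 0.0}


if __name__ == "__main__":
    print(cassandra_split(256))  # {'jvm_heap_gb': 128.0, 'os_page_cache_gb': 128.0}
    print(scylla_split(256))     # {'scylla_cache_gb': 256.0, 'os_page_cache_gb': 0.0}
```

This is why a direct per-node memory comparison is hard: on Cassandra, data caching is split between the heap and the kernel, while on Scylla nearly all of it shows up as Scylla's own usage.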