Indeed. I think you hit different saturation points the wider the use cases you cover. One example with a single core (which, btw, I agree with wholeheartedly for IO) is checksumming + decoding.
For Kafka, we have multiple indexes - a time index and an offset index - which are simple metadata. The trouble becomes how you handle decompression+checksumming+compression to support compacted topics. ( https://github.com/vectorizedio/redpanda/blob/dev/src/v/stor... )
So a single core starts to get saturated while doing both foreground and background requests.
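To make the "simple metadata" vs. heavy-path contrast concrete, here is roughly what the two index entries look like (field layout per Kafka's on-disk index format; the struct names are just for illustration, not our actual code):

    #include <cstdint>

    // Offset index: maps a relative offset within the segment to a byte
    // position in the log file. Kafka packs these as two 4-byte fields,
    // so each entry is 8 bytes.
    struct offset_index_entry {
        uint32_t relative_offset; // message offset minus the segment base offset
        uint32_t file_position;   // byte position of the batch in the segment
    };

    // Time index: maps a timestamp to the relative offset of the first
    // message at-or-after it. 8-byte timestamp + 4-byte offset = 12 bytes.
    struct time_index_entry {
        int64_t  timestamp_ms;
        uint32_t relative_offset;
    };

    // Lookups are a binary search over fixed-width entries - cheap metadata.
    // The expensive part is the decompress -> checksum -> recompress cycle
    // that compaction forces on the record batches themselves.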
.....
Now assume that you handle that with correct priorities for IO and CPU scheduling... the next bottleneck will be keeping up with background tasks.
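In Seastar terms, the CPU half of that prioritization looks roughly like this - scheduling groups with different shares for foreground and background work. A minimal sketch; the group names and share numbers are made up for illustration:

    #include <seastar/core/future-util.hh>
    #include <seastar/core/scheduling.hh>

    seastar::future<> prioritized_work() {
        // 4:1 CPU shares between client-facing work and compaction; the
        // ratio only bites under contention, so idle CPU stays available
        // to whichever group has runnable tasks.
        return seastar::create_scheduling_group("foreground", 800).then(
            [](seastar::scheduling_group fg) {
            return seastar::create_scheduling_group("background", 200).then(
                [fg](seastar::scheduling_group bg) {
                auto fg_fut = seastar::with_scheduling_group(fg, [] {
                    return seastar::make_ready_future<>(); // serve requests here
                });
                auto bg_fut = seastar::with_scheduling_group(bg, [] {
                    return seastar::make_ready_future<>(); // run compaction here
                });
                return seastar::when_all(std::move(fg_fut),
                                         std::move(bg_fut)).discard_result();
            });
        });
    }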
So then you start to add more threads. But as you mentioned, and as I tried to highlight in that article, the cost of implicit or simple synchronization is very high (as your intuition noted).
Thread-per-core buffer management with deferred destructors is really handy at doing 3 things explicitly (see the sketch after the list):
1. Your cross-core communication is explicit - that is, you give it shares as part of a quota so that you understand how your system priorities work across the whole system for any kind of workload. This is helpful for prioritizing foreground and background work.
2. There is effectively a const memory address once you parse it - so you treat it as largely immutable, and you can add hooks (say, crash if it is modified on a remote core).
3. It makes memory accounting fast, i.e.: instead of pushing a global barrier for the allocator, you simply send a message back to the originating core for allocator accounting. This becomes hugely important as you start to increase the number of cores.
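A hand-rolled sketch of what points 2 and 3 boil down to in Seastar (its foreign_ptr packages roughly this discipline; the batch type here is illustrative, not our actual code):

    #include <seastar/core/future.hh>
    #include <seastar/core/smp.hh>
    #include <vector>

    struct batch {
        std::vector<char> data; // parsed once, then treated as immutable
        unsigned owner_shard;   // the shard whose allocator owns this memory
    };

    seastar::future<> process_on(unsigned target_shard, batch* b) {
        return seastar::smp::submit_to(target_shard, [b] {
            // Read-only work on the remote core: the buffer is effectively
            // const here (point 2), so no locking is needed.
            (void)b->data.size();
        }).then([b] {
            // Deferred destruction (point 3): ship the free back to the
            // originating core so allocator accounting stays shard-local
            // instead of requiring a global barrier.
            return seastar::smp::submit_to(b->owner_shard, [b] { delete b; });
        });
    }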
>>> the trouble becomes how you handle decompression+checksumming+compression
gzip will cap at about 1 MB/s with the strongest compression setting and 50 MB/s with the fastest setting, which is really slow.
The first step to improving Kafka is for it to adopt zstd compression.
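For a sense of the swap, this is stock libzstd one-shot usage (not Kafka code); level 3 is zstd's default and already sits far above gzip's throughput at a comparable ratio:

    #include <zstd.h>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // One-shot zstd compression of a record batch. Level 3 is the default;
    // higher levels trade throughput for ratio, negative levels do the reverse.
    std::vector<char> compress_batch(const std::string& batch, int level = 3) {
        std::vector<char> out(ZSTD_compressBound(batch.size()));
        size_t n = ZSTD_compress(out.data(), out.size(),
                                 batch.data(), batch.size(), level);
        if (ZSTD_isError(n)) {
            throw std::runtime_error(ZSTD_getErrorName(n));
        }
        out.resize(n);
        return out;
    }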
Another thing that really hurts is SSL. A desktop CPU with AES instructions can push 1 GB/s, so it's not too bad, but that may not be the CPU you have or the default algorithm used by the software.
lz4 is a good option for really high-performance compression as well. (Zstd is my general recommendation, and both beat the pants off of gzip, but for very high-throughput applications lz4 still beats zstd. Both are designs from Yann Collet.)
Indeed. Though the recent zstd changes with different (negative) compression levels sort of close the perf gap that lz4 had over zstd. (If you're interested in this kind of detail for a new streaming storage engine, I gave a talk last week at the Facebook Performance Summit - https://twitter.com/perfsummit1/status/1337603028677902336)
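If anyone wants to poke at that gap themselves, these are the two calls to compare - zstd's negative "fast" levels (added in v1.3.4) are the change I mean. Toy buffer below; benchmark with real log data:

    #include <lz4.h>
    #include <zstd.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    int main() {
        std::string src(1 << 20, 'x'); // 1 MiB of filler, just to show the calls

        // lz4: one fast mode, tuned purely for speed.
        std::vector<char> lz4_out(LZ4_compressBound((int)src.size()));
        int lz4_n = LZ4_compress_default(src.data(), lz4_out.data(),
                                         (int)src.size(), (int)lz4_out.size());

        // zstd: negative levels trade ratio for lz4-class speed.
        std::vector<char> zstd_out(ZSTD_compressBound(src.size()));
        size_t zstd_n = ZSTD_compress(zstd_out.data(), zstd_out.size(),
                                      src.data(), src.size(), /*level=*/-5);

        std::printf("lz4: %d bytes, zstd(-5): %zu bytes\n", lz4_n, zstd_n);
    }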