Context: Zstandard is a compression algorithm invented by Yann Collet, with the goal of exceeding zlib's compression performance along every dimension: compression ratio, compression speed, and decompression speed. Although that bar was cleared long ago, we've continued to work to improve Zstd, and with this release, we've made Zstd just that little bit more hardened/better/faster/stronger.
As a sorta-kinda testimonial, Concourse just released a version[0] that switches its volume streaming compression from gzip to zstd. The switch consumed so much less CPU that they were also able to introduce parallel streaming.
On a test workload that slings around many large volumes, these changes made it approximately 20x faster[1].
Thanks a lot - zstd is reaaaally fast for the compression ratios that it achieves.
I personally use it all the time in ClickHouse tables (the "Yandex ClickHouse" database). I admit that I still reach for xz when I'm after hardcore max compression (the best ratio within an "acceptable" timeframe), i.e. for specific tests where the only thing that matters is the final compressed size.
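For context, here is a minimal sketch of what requesting zstd's highest compression level looks like through the C API; the wrapper function and its error convention are hypothetical, but ZSTD_compress, ZSTD_compressBound, ZSTD_maxCLevel and ZSTD_isError are public zstd functions (the CLI equivalent is roughly "zstd --ultra -22"):

    /* Hypothetical helper: compress a buffer at zstd's maximum level.
     * The function name and error handling are illustrative; the zstd
     * calls themselves are the library's public API. */
    #include <zstd.h>   /* ZSTD_compress, ZSTD_compressBound, ZSTD_maxCLevel */

    static size_t compress_at_max_level(void *dst, size_t dstCapacity,
                                        const void *src, size_t srcSize)
    {
        /* dstCapacity should be at least ZSTD_compressBound(srcSize). */
        size_t const cSize = ZSTD_compress(dst, dstCapacity, src, srcSize,
                                           ZSTD_maxCLevel());
        return ZSTD_isError(cSize) ? 0 : cSize;   /* 0 signals failure in this sketch */
    }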
Yes. The 1.0.0 release froze the wire format (almost, see below). All Zstandard releases since then are interoperable.
It’s funny that you should ask that question now, though. Just yesterday I submitted an erratum on the RFC [0], changing a tiny detail of the spec. While it is, literally, a breaking change, (1) we have not yet changed Zstd to produce outputs that take advantage of the change, and (2) it is not actually a concern, because all existing Zstd decoders ignore the existing spec in this regard.
On my system, the gzip binary is 100KB and my zstd binary is 1.0MB. There are a number of reasons for this:
- By default, the Zstd binary has a lot of additional functionality that gzip doesn't. For example: Zstd includes benchmarking capabilities, the ability to train compression dictionaries (see the sketch after this list), support for legacy (pre-1.0) formats, etc. You can strip all of that out with compile flags, and the binary gets a lot smaller.
- Zstandard tries to cover a much wider range of compression speeds than zlib does. To do this, the Zstd compressor actually has eight or so different LZ match-finding implementations under the hood (which get inlined in various ways, resulting in >100 actual versions in the binary), and 5+ different entropy encoding implementations.
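Since dictionary training came up above, here's a rough sketch of what that looks like through the library's zdict.h API; the helper and its buffer handling are illustrative, but ZDICT_trainFromBuffer is the public entry point:

    /* Illustrative wrapper around zstd's dictionary trainer. The sample
     * layout (all samples concatenated back to back, with a parallel array
     * of per-sample sizes) is what ZDICT_trainFromBuffer expects. */
    #include <stdio.h>
    #include <zdict.h>   /* ZDICT_trainFromBuffer, ZDICT_isError, ZDICT_getErrorName */

    static size_t train_dictionary(void *dictBuffer, size_t dictCapacity,
                                   const void *samples, const size_t *sampleSizes,
                                   unsigned nbSamples)
    {
        size_t const dictSize = ZDICT_trainFromBuffer(dictBuffer, dictCapacity,
                                                      samples, sampleSizes, nbSamples);
        if (ZDICT_isError(dictSize)) {
            fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
            return 0;   /* 0 signals failure in this sketch */
        }
        return dictSize;   /* number of bytes actually written to dictBuffer */
    }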
For really size-conscious use-cases (like decompressing Zstd content in a mobile app), you can get a minimal Zstd decompressor library down to ~30KB.
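To make the decode-only use case concrete, here's a minimal sketch of one-shot decompression with the public API; the wrapper, allocation, and error handling are illustrative, but ZSTD_getFrameContentSize and ZSTD_decompress are the stable library calls:

    /* Illustrative one-shot decode of a complete zstd frame held in memory,
     * the kind of call a stripped-down, decompress-only build would serve. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>   /* ZSTD_getFrameContentSize, ZSTD_decompress, ZSTD_isError */

    static void *decompress_frame(const void *src, size_t srcSize, size_t *outSize)
    {
        /* The frame header records the decompressed size (when known). */
        unsigned long long const contentSize = ZSTD_getFrameContentSize(src, srcSize);
        if (contentSize == ZSTD_CONTENTSIZE_ERROR ||
            contentSize == ZSTD_CONTENTSIZE_UNKNOWN)
            return NULL;   /* not a zstd frame, or size not recorded */

        void *dst = malloc((size_t)contentSize);
        if (dst == NULL) return NULL;

        size_t const dSize = ZSTD_decompress(dst, (size_t)contentSize, src, srcSize);
        if (ZSTD_isError(dSize)) {
            fprintf(stderr, "decompression failed: %s\n", ZSTD_getErrorName(dSize));
            free(dst);
            return NULL;
        }
        *outSize = dSize;
        return dst;   /* caller frees */
    }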
Zstd is awesome. I’d love to see desktop archive software include these algorithms. There’s a fork of 7-Zip called 7-Zip ZS that includes zstd and some other algorithms, and I found they performed really well for my use case, providing fairly comparable compression to the alternatives at a much, much faster speed. Handy for when you want to archive a massive directory of files or a small number of really large files.
Out of curiosity, why? Zstd performs as well as it does because of the enormous effort that has been invested in it. That necessarily brings complexity.
We do actually have a simple “educational” decoder implementation [0], but I wouldn’t recommend its use in production. And then there are two re-implementations in Java [1] and Go [2].