Context: Zstandard is a compression algorithm invented by Yann Collet, with the goal of exceeding zlib's compression performance along every dimension: compression ratio, compression speed, and decompression speed. Although that bar was cleared long ago, we've continued to work to improve Zstd, and with this release, we've made Zstd just that little bit more hardened/better/faster/stronger.
As a sorta-kinda testimonial, Concourse just released a version[0] that switches its volume streaming compression from gzip to zstd. The switch consumed so much less CPU that they were also able to introduce parallel streaming.
On a test workload that slings around many large volumes, these changes made it approximately 20x faster[1].
Thanks a lot - zstd is reaaaally fast for the compression ratios that it achieves.
I personally use it all the time in ClickHouse tables (the "Yandex ClickHouse" database). I admit that I still reach for xz when I'm after hardcore max compression (the best ratio within an "acceptable" timeframe), i.e. for specific tests where the only thing that matters is the final compressed size.
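For context, here is a minimal sketch of what requesting zstd's highest compression level looks like through the C API; the wrapper function and its error convention are hypothetical, but ZSTD_compress, ZSTD_compressBound, ZSTD_maxCLevel and ZSTD_isError are public zstd functions (the CLI equivalent is roughly "zstd --ultra -22"):

    /* Hypothetical helper: compress a buffer at zstd's maximum level.
     * The function name and error handling are illustrative; the zstd
     * calls themselves are the library's public API. */
    #include <zstd.h>   /* ZSTD_compress, ZSTD_compressBound, ZSTD_maxCLevel */

    static size_t compress_at_max_level(void *dst, size_t dstCapacity,
                                        const void *src, size_t srcSize)
    {
        /* dstCapacity should be at least ZSTD_compressBound(srcSize). */
        size_t const cSize = ZSTD_compress(dst, dstCapacity, src, srcSize,
                                           ZSTD_maxCLevel());
        return ZSTD_isError(cSize) ? 0 : cSize;   /* 0 signals failure in this sketch */
    }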
Yes. The 1.0.0 release froze the wire format (almost, see below). All Zstandard releases since then are interoperable.
It’s funny that you should ask that question now, though. Just yesterday I submitted an erratum on the RFC [0], changing a tiny detail of the spec. While it is, literally, a breaking change, (1) we have not yet changed Zstd to produce outputs that take advantage of the change, and (2) it is not actually a concern, because all existing Zstd decoders ignore the existing spec in this regard.
On my system, the gzip binary is 100KB and my zstd binary is 1.0MB. There are a number of reasons for this:
- By default, the Zstd binary has a lot of additional functionality that gzip doesn't. For example: Zstd includes benchmarking capabilities, the ability to train compression dictionaries (see the sketch after this list), support for legacy (pre-1.0) formats, etc. You can strip all of that out with compile flags, and the binary gets a lot smaller.
- Zstandard tries to cover a much wider range of compression speeds than zlib does. To do this, the Zstd compressor actually has eight or so different LZ match-finding implementations under the hood (which get inlined in various ways, resulting in >100 actual versions in the binary), and 5+ different entropy encoding implementations.
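Since dictionary training came up above, here's a rough sketch of what that looks like through the library's zdict.h API; the helper and its buffer handling are illustrative, but ZDICT_trainFromBuffer is the public entry point:

    /* Illustrative wrapper around zstd's dictionary trainer. The sample
     * layout (all samples concatenated back to back, with a parallel array
     * of per-sample sizes) is what ZDICT_trainFromBuffer expects. */
    #include <stdio.h>
    #include <zdict.h>   /* ZDICT_trainFromBuffer, ZDICT_isError, ZDICT_getErrorName */

    static size_t train_dictionary(void *dictBuffer, size_t dictCapacity,
                                   const void *samples, const size_t *sampleSizes,
                                   unsigned nbSamples)
    {
        size_t const dictSize = ZDICT_trainFromBuffer(dictBuffer, dictCapacity,
                                                      samples, sampleSizes, nbSamples);
        if (ZDICT_isError(dictSize)) {
            fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
            return 0;   /* 0 signals failure in this sketch */
        }
        return dictSize;   /* number of bytes actually written to dictBuffer */
    }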
For really size-conscious use-cases (like decompressing Zstd content in a mobile app), you can get a minimal Zstd decompressor library down to ~30KB.
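To make the decode-only use case concrete, here's a minimal sketch of one-shot decompression with the public API; the wrapper, allocation, and error handling are illustrative, but ZSTD_getFrameContentSize and ZSTD_decompress are the stable library calls:

    /* Illustrative one-shot decode of a complete zstd frame held in memory,
     * the kind of call a stripped-down, decompress-only build would serve. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>   /* ZSTD_getFrameContentSize, ZSTD_decompress, ZSTD_isError */

    static void *decompress_frame(const void *src, size_t srcSize, size_t *outSize)
    {
        /* The frame header records the decompressed size (when known). */
        unsigned long long const contentSize = ZSTD_getFrameContentSize(src, srcSize);
        if (contentSize == ZSTD_CONTENTSIZE_ERROR ||
            contentSize == ZSTD_CONTENTSIZE_UNKNOWN)
            return NULL;   /* not a zstd frame, or size not recorded */

        void *dst = malloc((size_t)contentSize);
        if (dst == NULL) return NULL;

        size_t const dSize = ZSTD_decompress(dst, (size_t)contentSize, src, srcSize);
        if (ZSTD_isError(dSize)) {
            fprintf(stderr, "decompression failed: %s\n", ZSTD_getErrorName(dSize));
            free(dst);
            return NULL;
        }
        *outSize = dSize;
        return dst;   /* caller frees */
    }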
Zstd is awesome. I’d love to see desktop archive software include these algorithms. There’s a fork of 7-Zip called 7-Zip ZS that includes zstd and some other algorithms, and I found they performed really well for my use case, providing fairly comparable compression to the alternatives at a much, much faster speed. Handy for when you want to archive a massive directory of files or a small number of really large files.
Out of curiosity, why? Zstd performs as well as it does because of the enormous effort that has been invested in it. That necessarily brings complexity.
We do actually have a simple “educational” decoder implementation [0], but I wouldn’t recommend its use in production. And then there are two re-implementations in Java [1] and Go [2].