> Two years ago, Facebook open-sourced Zstandard v1.0...
Bullshit, Zstd was open source from the very beginning; they just hired Yann and moved the project under the Facebook org. How do I know? I've written the JVM bindings [1] since v0.1, and they're now used by Spark, Kafka, etc.
EDIT: Actually, my initial bindings were against v0.0.2 [2]
Kudos to FB for hiring him and helping Zstd get production ready. But that sentence is just a false PR claim.
Given that it was Yann himself who wrote that sentence, I think that's a needlessly uncharitable interpretation. Maybe a better wording would have been "Two years ago, we released Zstandard v1.0, an open source ...". But I don't think we anticipated anyone would read that much into it.
Blog posts from companies are always reviewed by the marketing team, which frequently changes what the author initially wrote or adds random stuff to make it sound more like a company effort.
Yes, maybe I read it the wrong way; I later noticed the authors. BTW, thank you for your part in making it the best compression library for a wide variety of cases.
Previously the license was BSD (as my commit above shows). v1.0 moved it to BSD + GPLv2 + a patent clause, which is no more open source than plain BSD. After concerns from the community, the patent clause was dropped, a little after React's was.
My browser loaded that website with a header:
accept-encoding: gzip, deflate, br ("br" means Brotli by Google)
The response had a header:
content-encoding: gzip
Zstandard looks like an improvement on DEFLATE (= gzip = zlib), and its specification is only 3x longer, even though it was introduced 22 years later: https://tools.ietf.org/html/rfc8478
Since Zstandard is so simple and efficient, I thought it would get into browsers very quickly. Then it could make sense to compress even PNG or JPG images, which are usually impossible to compress further with DEFLATE.
With internet speeds significantly above the decompression speed, zstd is favorable. With internet speeds below the decompression speed, brotli is favorable because fewer bytes need to be transmitted.
Typical users have internet speeds of 10 MB/s or so, and brotli is more favorable up to around 200 MB/s (2 Gbps). It is also not just about speed: mobile users pay less with brotli because fewer bytes are transferred.
Further (I'm not an expert on this, somewhat speculative), the streaming properties of brotli are likely slightly better, i.e., fewer bytes need to sit in a buffer before output can be decoded. This may allow the browser to issue new fetches for URLs in an HTML document earlier with brotli than with zstd.
From what you say, it sounds like you think that Brotli has a better compression ratio than Zstandard.
According to this chart, Zstandard has a better compression ratio, while compressing and decompressing faster than Brotli: https://facebook.github.io/zstd/
That is a cherry-picked data point, not representative of brotli. There, Facebook shows numbers at quality 0. However, when they compress for the web, they use brotli at quality 5 for dynamic content and quality 9 (or 11?) for static content.
At higher quality settings brotli wins in compression density, particularly at quality 11 or with short files.
Quality 0 is an irrelevant corner case of maximal compression speed, and is completely impractical for most uses.
If you plot the whole curve, brotli's density is above zstd's across the compression speed/density tradeoff, when both algorithms are used with the same backward-reference window size.
I'd really like to thank Cyan for their contributions. `zstd` and `lz4` are great. I'm pretty much exclusively using `zstd` for my tarball needs these days, as it beats the pants off `gzip`, and for plain-text code (most of what I compress) it performs amazingly. (Shameless self-promotion) I wrote my own tar clone to make use of it [1].
It is nice to have disk IO be the limiting factor on decompression even when you are using NVMe drives.
The best thing about zstd is its zlibWrapper, which lets you write code as if you’re consuming zlib-compressed files while transparently working with zlib-, zstd-, or uncompressed files. I build several of my tools with zstd for this reason.
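For the curious, usage looks roughly like this. A minimal sketch assuming the wrapper's documented drop-in style (include zstd_zlibwrapper.h from zstd's contrib/zlibWrapper in place of zlib.h and keep calling the normal zlib API); the file name and buffer size are arbitrary:

```c
/* Sketch: zstd's zlibWrapper in drop-in mode. The usual gz* calls then
 * transparently handle zlib-, gzip-, zstd-compressed, or plain files.
 * Build against zstd's contrib/zlibWrapper; the include path may differ. */
#include <stdio.h>
#include "zstd_zlibwrapper.h"   /* replaces #include <zlib.h> */

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    gzFile f = gzopen(argv[1], "rb");   /* unchanged zlib API */
    if (!f) { fprintf(stderr, "cannot open %s\n", argv[1]); return 1; }

    char buf[4096];
    int n;
    while ((n = gzread(f, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    gzclose(f);
    return 0;
}
```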
> The snappy wrapper can serve as a frontend for bzip2, lz4, lz4hc, lzma, lzo, and zlib.
> The zlib wrapper can serve as a frontend for bzip2, lz4, lz4hc, lzma, lzo, and snappy.
Very handy for quickly benchmarking multiple compressors without having to write multiple implementations of the same test code. As illustrated here http://www.lmdb.tech/bench/inmem/compress/
Feel free to PR support for brotli or zstd or anything else that comes along.
zlibWrapper seems useful if you're specifically using the low level zlib APIs already and don't want to change your code very much. But if you just use zopen() or similar, or are willing to make minor changes, I don't see much benefit.
(Especially given the performance gap vs native zstd APIs).
I have seen some fopencookie(3)-based zstd/zlib/xz/etc FILE-object wrappers floating around that make it pretty easy to work with any compression library's streaming APIs.
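In case it's useful, here is roughly what such a wrapper looks like for zstd. A hedged sketch: fopencookie(3) and the ZSTD_*Stream calls are real APIs, but the zfile struct and zstd_fopen_r() are names I made up, and error handling is abbreviated:

```c
/* Expose a zstd-compressed file as a plain FILE* via fopencookie(3). */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <zstd.h>

typedef struct {
    FILE *src;              /* underlying compressed file */
    ZSTD_DStream *ds;       /* zstd streaming decompression state */
    char inbuf[1 << 16];
    ZSTD_inBuffer in;
} zfile;

static ssize_t zf_read(void *cookie, char *buf, size_t size)
{
    zfile *z = cookie;
    ZSTD_outBuffer out = { buf, size, 0 };

    while (out.pos == 0) {
        if (z->in.pos == z->in.size) {          /* refill input buffer */
            size_t n = fread(z->inbuf, 1, sizeof z->inbuf, z->src);
            if (n == 0) break;                  /* EOF on underlying file */
            z->in = (ZSTD_inBuffer){ z->inbuf, n, 0 };
        }
        size_t r = ZSTD_decompressStream(z->ds, &out, &z->in);
        if (ZSTD_isError(r)) return -1;
    }
    return (ssize_t)out.pos;
}

static int zf_close(void *cookie)
{
    zfile *z = cookie;
    ZSTD_freeDStream(z->ds);    /* safe on NULL */
    if (z->src) fclose(z->src);
    free(z);
    return 0;
}

/* Open a .zst file for reading; fread()/fgets() then work as usual. */
FILE *zstd_fopen_r(const char *path)
{
    zfile *z = calloc(1, sizeof *z);
    if (!z) return NULL;
    z->src = fopen(path, "rb");
    z->ds = ZSTD_createDStream();
    if (!z->src || !z->ds) { zf_close(z); return NULL; }

    cookie_io_functions_t io = { .read = zf_read, .close = zf_close };
    return fopencookie(z, "r", io);
}
```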
zlib is standard in my field, so being compatible is a big plus. I should do some testing with the zstd API, though. Thanks for the performance discrepancy heads-up!
I used Zstandard to compress messages in a P2P multiplayer game engine, and, trained on our real-life packets, it got us a 2x-5x improvement. Awesome library, will use it in any similar project from now on.
For what it's worth from my not-particularly-scientific tests at work, they both compressed significantly better than gzip, but zstd compressed better and faster than brotli for our typical file sizes (text files around 5-20kb). Mind you, this was all at lower/faster compression levels; by default brotli compresses at the highest/slowest level, so perhaps it's been designed primarily for a different use case.
Unfortunately, we have a strict requirement that the decompression be done in pure Java (ideally providing an InputStream interface). The official Brotli repo includes a pure Java decompression library that took about three lines of code to drop in next to our gzip support, while zstd had nothing suitable (a JNI wrapper is not an option, and the Java decompressor at [1] does not provide a streaming interface). Writing my own Java decompressor for zstd sounded like a fun project, but that would have been too much yak shaving; brotli is good enough.
zstd vs. brotli -- there are loooots of badly conducted experiments. Even the zstd folks themselves fell into the trap of comparing brotli and zstd with different window sizes.
I was never able to locate a file that compresses better with zstd than with brotli when you use the same window size in both.
The car analogy for using different window sizes is driving one car in 1st gear and the other in 5th, i.e., a meaningless comparison. One needs to use the same window size with both algorithms to understand their performance potential. When one does that, brotli always wins in density and most of the time on compression speed for a given density. Zstd wins in decompression speed, but neither of them is slow (think 600 MB/s vs. 900 MB/s).
They compare pretty closely. The squash benchmark is a nice interactive comparison tool that usually agrees with my own benchmarks. See for example [1]. (Note that you should experiment with different input texts to get a sense for the variability in relative performance.)
I am working on that very question! Zstd's support for creating and using custom dictionaries opens the door to significant efficiencies. As described in the post though, dictionaries make compression a more complicated thing to use, and so there are lots of questions about how to apply that to the public internet in a way that's cross-compatible and secure. In short: it's something we're actively exploring.
Brotli added custom dictionaries in 2014 or 2015 (before the zstd format was even frozen). We are currently working on a new, more efficient custom dictionary format that uses context and transforms to ramp up the dictionary's impact (by about 20%).
Yep, that's in the vicinity of the solutions we're thinking about. There are a few proposals out there, and prior art like SDCH [1].
The hard part is that compression is already an attack vector for the web (e.g., CRIME [2], BREACH [3], et al.). We want to make sure that we're not eroding or unduly complicating that situation [4].
Both brotli and zstd support user-provided dictionaries, which helps with compressing files which contain items from the dictionary. This is particularly helpful for small files. Zstd is both faster and more effective at compressing.
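For reference, zstd's dictionary workflow looks roughly like this. A hedged sketch: ZDICT_trainFromBuffer() and ZSTD_compress_usingDict() are the real APIs, but the sample messages and buffer sizes are placeholders, and real training needs hundreds or thousands of samples rather than the three shown:

```c
/* Train a dictionary from small samples, then compress with it. */
#include <stdio.h>
#include <string.h>
#include <zstd.h>
#include <zdict.h>

int main(void)
{
    /* Concatenated training samples plus their individual sizes. */
    const char samples[] =
        "{\"op\":\"move\",\"x\":1}{\"op\":\"move\",\"x\":2}{\"op\":\"fire\"}";
    const size_t sampleSizes[] = { 19, 19, 13 };

    char dict[4096];
    size_t dictSize = ZDICT_trainFromBuffer(dict, sizeof dict,
                                            samples, sampleSizes, 3);
    if (ZDICT_isError(dictSize)) {      /* tiny sample sets usually fail */
        fprintf(stderr, "train: %s\n", ZDICT_getErrorName(dictSize));
        return 1;
    }

    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    const char msg[] = "{\"op\":\"move\",\"x\":3}";
    char dst[256];
    size_t csize = ZSTD_compress_usingDict(cctx, dst, sizeof dst,
                                           msg, strlen(msg),
                                           dict, dictSize, 3 /* level */);
    if (!ZSTD_isError(csize))
        printf("compressed %zu -> %zu bytes\n", strlen(msg), csize);
    ZSTD_freeCCtx(cctx);
    return 0;
}
```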
Actually brotli compresses somewhat more, often around 5% better density. If you compress to the same density, brotli tends to be significantly faster. Zstd decompresses faster.
Brotli is naturally limited to 16 MB of memory use, whereas zstd makes fewer attempts to reduce memory use. If you want to use more memory with brotli, you need to use a separate flag, '--large_window' instead of '--window', to indicate that you really want it, whereas zstd uses more memory rather silently.
When you compare compression levels, you should compare at the same memory consumption -- and if you do that, brotli tends to win by 5%.
When in doubt in benchmarking, run brotli with --large_window 30
Cool, it's just that most zstd benchmarking against other algorithms that I've ever seen was done with -22, i.e., disregarding memory use and multi-processing performance.
It would be wonderful to see benchmarking with real data and with realistic window sizes (the same from one algorithm to the next).
Looking at the brotli API, I believe I spotted the source of our disagreements. Brotli separates the window size from the quality, but in zstd the level implies the window size unless you use advanced parameters. For example, level 3 has a 1 MB window size.
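To make that concrete, here is a hedged sketch of pinning zstd's window independently of the level through the advanced-parameter API (ZSTD_c_windowLog and ZSTD_compress2 are real, stable in newer zstd releases; the level and window values here are arbitrary), which is what an apples-to-apples comparison against brotli needs:

```c
#include <zstd.h>

/* Compress at level 3 but with a fixed 4 MB window, instead of the
 * smaller window the level would imply on its own. */
size_t compress_fixed_window(void *dst, size_t dstCap,
                             const void *src, size_t srcSize)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 22);  /* 2^22 = 4 MB */
    size_t r = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return r;   /* check with ZSTD_isError() */
}
```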
When you are comparing brotli and zstd, are you focusing on the highest quality/compression level, but varying the window size?
For us, the lower compression levels (1-3) are the primary focus. We care about the highest levels for benchmarks, but they aren't our focus. When we benchmark zstd we concentrate on the lower levels, since they are the most important to us. When we think about a smaller window size, we generally think about faster compression.
When we compare against brotli, the question we're asking is: how does brotli compare to level 3? When you're benchmarking zstd, I suspect you're asking how the highest zstd level compares to the highest brotli quality at the same window size, is that right?
In my thinking, window size is something that is optimized for the decoder's resources, whereas the encoding effort is optimized for how expensive the transfer is relative to the encoding cost.
For example, a mobile client might prefer to have no more than a 512 kB sliding window due to the design of its memory system or for multi-processing reasons. There, a static resource can still be encoded to the smallest size with extremely slow encoding (quality 9-11), but for one-time use we'd still prefer a faster encoding (quality 6 or so).
Thank you for explaining your focus. Did you arrive at that through economic calculations, or was it to minimize changes relative to the no-compression status quo?
A lot of compression ends up being compress once, decompress one to a few times, so the faster end of the spectrum fits well. Additionally, compression has to fit into the existing system, and existing systems already have tight constraints, so it is an easier sell to say "hey, we can compress faster AND stronger" than what you have right now. For our larger services, we will tune the compression level more carefully, and end up at different places for different services, but still normally around the faster levels.
That said, we still see users of the stronger compression levels.
Yes, plenty of use cases -- that's what makes compression so interesting!
In my back-of-the-envelope financial analysis, the highest economic impact is when I can reduce people's need to wait for data to arrive. There we often use relatively slow compression even when it is for one use only. Computers are quite a lot (~1000x) cheaper than people's time.
At the moment, Brotli doesn't accept a user-provided dictionary. I know they're working to re-introduce that functionality, but it's not currently present.
> And the zstd binary itself includes a dictionary trainer (zstd --train). Building a pipeline for handling compression dictionaries can therefore be reduced to being a matter of gluing these building blocks together in relatively straightforward ways.
What happens if your dictionary trained on user data ends up storing user data and you receive a GDPR destruction request?
It would not be covered by GDPR, just like e.g. private emails containing PII about a third party are not in scope from the perspective of the email provider.
I find the first chart so hard to understand. The axes need labels, and the color scheme is not ideal. They should use different line styles, and add a caption below summarizing the findings. There's a reason journals often require graphs to be formatted this way.
The short answer is that zstd is faster and better at compressing than zlib for most input data.
For more details, here's a comment I made a couple years ago:
There's a nice comparison of compression algorithms (including zlib, zstd, brotli, snappy, etc) here: https://quixdb.github.io/squash-benchmark/ It's nice because it uses many datasets and machine platforms.
Unfortunately the graphs leave a bit to be desired, especially when you're trying to compare two algorithms. They provide the raw data, but it needs a little munging to make it workable.
For my day job I made the following graph to compare zlib level1 and zstd. This was using the "peltast" platform which is a Xeon based system, because that was most relevant for us.
The x axis is compression speed, and the y axis is compression ratio.
Zstandard outperforms zlib in compression ratio, compression speed, and decompression speed (not shown). The only reason to stick with zlib is for compatibility with systems that expect zlib.
That sounds fantastic. What is the porting process generally like? Is there any possibility of creating an API-compatible wrapper to make it a drop-in replacement for zlib?
* If you already have the compression algorithm tagged, through a file extension, or a field, then you can use that to dispatch to the right decompression algorithm.
* Zlib, gzip, xz, zstd, ... all have headers. If you are using zlib and switching to zstd, you simply have to check the first 4 bytes for the zstd header using ZSTD_isFrame() [0] (see the sketch after this list), or attempt to decompress with zstd and, if it fails, fall back to the previous decompression algorithm.
* The zstd CLI can decompress both zstd and zlib/gzip if compiled with zlib support.
* Zstd provides a wrapper around the zlib API so you could transparently switch to zstd. [1]
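Here is the sketch mentioned above, a hedged example of the header-dispatch option. ZSTD_isFrame() is real (it lives behind ZSTD_STATIC_LINKING_ONLY); the two decompress_with_* helpers are hypothetical stand-ins for existing code paths:

```c
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_isFrame() is in this section */
#include <stdio.h>
#include <zstd.h>

extern int decompress_with_zlib(FILE *f);   /* hypothetical existing path */
extern int decompress_with_zstd(FILE *f);   /* hypothetical new path */

int decompress_auto(FILE *f)
{
    unsigned char hdr[4];
    size_t n = fread(hdr, 1, sizeof hdr, f);
    rewind(f);                               /* put the header bytes back */
    if (n == sizeof hdr && ZSTD_isFrame(hdr, n))
        return decompress_with_zstd(f);      /* zstd magic number found */
    return decompress_with_zlib(f);          /* fall back to old path */
}
```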
On the blog you mention you are in the process of porting internal code to replace zlib with zstd. Is there a reason you decided not to use the wrapper as a first pass to migrate all uses of zlib across the entire codebase?
Much faster and/or better compression/decompression of images (tunable of course, depending on what level you set in the code), and also faster startup of VMs. I haven't measured exactly. I'd guess ~30% in my case.
But all this depends on where the bottleneck is in any particular case. My VMs are on HDD, so increased decompression speed doesn't matter that much, but reduced size helps reading the necessary data faster. Linux VMs seem to be more compressible than Windows ones.
The informatics dysfunction on this graph is just off the charts.
Here's the problem. The graph is designed to make your conclusion sound right, but it doesn't actually prove that.
Let's look at what the numbers really say:
For the sample data, in the best case, gzip gets you a file that's about 31% of the original size. Zstandard can get you a file that is 25% of the original size, but it will take four times as long to get it. If you allow it the same time as gzip, you can get 27% compression instead of 31%. That's only a 13% improvement on the wire ((31 - 27) / 31 ≈ 13%).
That's nice, but it's not impressive at all. It's not a good enough reason to change your stack. The only thing that is impressive is that if you want the same compression ratio as gzip, you can get it up to 20 times faster. On some hardware that's totally worth it, but not on all (because who is streaming at 700 MB/s?).
Hi. I made the graph. It was definitely not my intent to obscure or distort any information. Here's the raw data if you like [1]. Some context: the input text is silesia.tar, which is a standard mixed corpus of data. It was benchmarked by @Cyan4973 on his i7-9700k.
I guess I'm confused about how you feel we're misrepresenting the data? Are you talking about the sentence preceding that graph, "The benefits we’ve found typically range from a 30 percent better ratio to 3x better speed"? That's more describing the benefits we've seen in the real world, where for a variety of reasons (many of which are discussed in the rest of the post), we generally can get more out of zstd than this vanilla benchmark shows.
The first rule of objective graphs is always show the origin at 0. As an informed graph consumer, if the origin is not 0 you should immediately distrust your eyes, and question the motives of the person showing you the data. The relative sizes are being distorted. Why?
In this case, there's a 15% difference in Y values in your data that the graph presents as two lines separated from each other by 33%, just by removing 0 and 1 from the chart.
On the other hand, that effect is diluted a bit by what you've done on the X axis. Log scale tells a story about trends. The relative slope of two lines on log scale says something. The distance between them doesn't mean much, if anything, although the brain can't help trying to make it mean something.
I believe you will find that a great deal of what you lose by plotting y = 0 you'll gain back by plotting x on a linear scale. Those lines deserve to be much farther apart.
The Visual Display of Quantitative Information is one of my favorite books!
0 in this case is not really a relevant value (since that would mean transforming the input into something infinitely large). The functional identity value / origin here is 1x. Here's what that looks like [1].
To me, this is a significantly less useful image. But maybe that stems from a lot of comfort with both the subject matter and the detailed log-log plots that I work with to evaluate zstd performance, e.g. [2].
Hey thanks for the new chart, and you're right about 1x. But we might have to disagree because I don't think this graph is that bad. I can see a bunch of talking points in this graph that are about strategy instead of about why I hate bad graphs.
First, that data point for lz4 out around 820 MB/s kind of throws the groove off of that graph. Despite that, I would still suggest you prune the graph off at 850 or 900 to reduce the squish of the horizontal data (pruning the graph to put the last value at 100% of width or height isn't considered a no-no).
One of the things I can see now but couldn't before is that the size/speed tradeoff is pretty linear except for the giant dog-leg around 3.6:1, and a less pronounced but still notable one at 2.9:1.
If I sat down with my team to discuss this chart I'd suggest we agree that we aren't interested in anything above 3.6:1. And then I'd suggest we look at everything above 2.9:1, but my eye is on 3.2:1 (where zlib tops out and is 1/10th the speed).
If they don't like 2.9:1, then the next interesting point is at about 2.7:1 when zlib bottoms out. We're already at such a high bandwidth rate that something else is probably going to be the bottleneck.
Having spent the last few years in the 'real' (ie. non-software) world, it boggles me how anything less than an order of magnitude improvement is handwaved as trivial. 13% of bandwidth or 20% of offline storage space is huge savings for a company like FB. Once the software development industry matures a little, more people will start realising this.
I don't find myself disagreeing with you that much here, but 13% is not quite enough to get me to push my team to make such a big change.
On the other hand, I've also had to push for getting any compression turned on at all. If all of my infrastructure already supported zstandard, getting a 3.5:1 compression ratio would be more compelling than 3.2:1, and I can imagine situations where it's easier to get buy-in.
But first every old cell phone and crappy web browser (read: 6 year old Microsoft browser) has to support it. Which is why you have people creating backward-compatible compressors that spit out files that zlib can decompress. Because otherwise you have to support and test 3 transport formats instead of 2.
"The benefits we’ve found typically range from a 30 percent better ratio to 3x better speed."
In what cases? What methodology was used to evaluate this? I certainly expected a scientific (honest) treatment of how the performance was evaluated and the corresponding trade-offs to follow on at some point.
In short, we work with individual teams and projects to evaluate their priorities, and select or build the compression scheme that best addresses their needs. Some of those cases are described in the post, but we wanted to summarize the general kinds of results we see.
I did spend a fair amount of time working on a graphic for the post to try to capture the distribution of improvements we've seen across different use cases at Facebook, but it ended up being very hard to interpret / glean anything meaningful from.
It's difficult to present rigorous conclusions: the reality of data compression is that every use case is different, with different priorities and data characteristics that cause different compressors to behave differently.
Some data (random noise) is totally incompressible, and zstd will do just as poorly as any other algorithm. On the other hand, with highly structured / repetitive data like JSON, zstd can do enormously better than zlib. It all depends on the specific context.
Another example of how benchmarking is hard: we recently spent a fair amount of time improving zstd for highly-contended memory scenarios [1]. This work basically won't show up in a standard single-threaded single-workload benchmark. But we saw meaningful improvements in the real world at Facebook.
Ultimately, if you are evaluating using zstd (or any other compressor), the best predictor for performance will be a benchmark you run yourself with your own data. And hopefully you like what you see!
I will say that we've been 'doing just fine' with zlib for longer than I'm comfortable with. I hung out on comp.compression when I was still in college and shared the dream of writing the next great compression library. Maybe the only thing I ever accomplished though was altering a Java minifier to improve compressibility of the class files.
On a very deep level I'm pretty disappointed that zlib has been 'good enough' for almost 30 years. When it was Google proposing a change, I wasn't enthused about handing more control over HTTP to Google. They have too much already. The ways I'm concerned about Facebook have nothing to do with standards bodies or protocols. So maybe this is good enough.
There are more options for files-in-motion and files-at-rest, and that could put it over the top. But you need to be filing high quality PRs on both HAProxy and Nginx if you want anybody to care.
zstd totally dominates zlib. There is nothing that zlib does that zstd does not do better.
It’s just that simple.
For a given compression speed, zstd will compress the file more than zlib would. For a given compression ratio, zstd will compress the data much faster than zlib would.
Except for compatibility. If you are compressing files for yourself it's no big deal, but when you're compressing them for someone else you have to consider if your bleeding edge compression software will be an issue, especially if they are on some corporate locked down machine and can't install new software.
To the GGP’s point, not a single machine at $dayjob (financial services) has chocolatey on the whitelist of software allowed.
We couldn’t even use GoToMeeting in our industry for ages because it required an executable download that was mostly not on the whitelist of our 2000+ customer banks & credit unions.
Windows has this thing called Software Restriction Policies; think of it as an Apple App Store curated by Windows sysadmins. You can whitelist by code signer, SHA256, or even file path and name (which is insecure).
https://docs.microsoft.com/en-us/windows-server/identity/sof...
I hope they pay greater attention to the low and high end of their compression ratio spectrum. On the low end, it'd be great if it could exceed lz4 in terms of speed and memory savings. On the high end it'd be great to exceed XZ/LZMA.
Right now it's impressive "in the middle", but I find myself in a lot of situations where I care about the extremes. I.e. for something that will be transferred a lot, or cold-stored, I want maximum compression, CPU/RAM usage be damned, within reason. So I tend to use LZMA there if files aren't too large. For realtime/network RPC scenarios I want minimum RAM/CPU usage and Pareto-optimality on multi-GbE networks. This is where I use LZ4 (and used to use Snappy/Zippy).
At their scale, though, FB is surely saving many millions of dollars thanks to deploying this, both in human/machine time savings and storage savings.
It does depend on your dataset. In my experiments, Zstandard did outperform XZ/LZMA in terms of compression ratio, but not by much. However, the massive improvement in decompression speed is what won me over.
We recently added negative compression levels that extend the fast end of the spectrum significantly. We are also working on incrementally improving the strong end of the compression ratio spectrum. We don't expect plain zstd to compress stronger than xz, but we hope to close the gap some.
With zstd you end up on average 5-6% worse than LZMA in density on a variety of large test corpora, but it decodes 8x faster. Brotli with --large_window 30 can get within 0.6% of LZMA, and decodes 5x faster than LZMA.
Charles Bloom is a data compression expert working on the compression suite at RAD Game Tools (a proprietary library for game developers that delivers better compression ratios and better decompression speed than the best available in open source, i.e. lz4, zstd, and lzma): http://www.radgametools.com/oodlecompressors.htm
But no third party not involved with RAD Game Tools has _ever_ verified these bold claims. Heck, anyone can construct benchmarks to make almost any compression algorithm outperform any other. Whether that holds with real-world data too is another story.
Thing is, compression performance characteristics depend on the data. You will have to run benchmarks on your own data to get truly representative and conclusive results.
It's Enterprise slang for "we are big, so we assume everything we do works better at big scale, but please don't check if it's true, just trust us". (in this thread it might be true, usually it isn't though)
[1] https://github.com/luben/zstd-jni
[2] https://github.com/luben/zstd-jni/commit/3dfe760cbb8cc46da32...