We took a very detailed look at Brotli performance (both speed and compression) and concluded that, for dynamic content, it was only useful when the file size was greater than 64k on slow connections; there's a very detailed description here:
Since we don't charge by byte the focus is on end-user performance and not 'saving money' and we concluded that it wasn't currently worth implementing Brotli widely but will continue to experiment with it.
Seems important to re-iterate that for static content (javascript and css bundles, static HTML sites), Brotli is a definitive win. I appreciated that the author was clear that the disadvantages applied to dynamic content, by always qualifying "content".
Exactly :)
If Cloudflare decides it’s worth several man-days to offer this feature to their clients who deliver larger files than 64k, feel free to reach out to us, we’ll be happy to share our experience :) devs@cdn77.com
Keep a rolling average of each user's request file size and, if it exceeds 64KB (images, video, PDF, downloads), notify them of a possible performance boost from switching to Brotli.
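A minimal sketch of that heuristic; the 64KB threshold comes from the comments above, while the class name and per-user bookkeeping are invented for illustration:

```python
from collections import deque

# Threshold suggested in the thread above; everything else here is hypothetical.
BROTLI_HINT_THRESHOLD = 64 * 1024

class ResponseSizeTracker:
    """Rolling average of response sizes over the last `window` requests."""

    def __init__(self, window=100):
        self.sizes = deque(maxlen=window)  # old samples fall off automatically

    def record(self, size_bytes):
        self.sizes.append(size_bytes)

    @property
    def average(self):
        return sum(self.sizes) / len(self.sizes) if self.sizes else 0

    def should_suggest_brotli(self):
        return self.average > BROTLI_HINT_THRESHOLD

tracker = ResponseSizeTracker(window=3)
for size in (10_000, 200_000, 500_000):
    tracker.record(size)
print(tracker.should_suggest_brotli())  # large responses dominate the average
```

One tracker per user (or per zone) would be enough; the deque keeps memory bounded regardless of traffic volume.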
I agree, there is a small subset of such files where small pieces (or even the whole file) can be compressed. If we just say "images" or "videos", there are also "uncompressed" variants of those, but such images or videos are almost always ones that weren't even prepared for the web, e.g. "uncompressed AVI" or "Windows BMP." Those prepared for the web (e.g. MP4 or PNG) certainly aren't further losslessly compressible.
Those that want to deliver the videos or pictures should use the proper format and not depend on Brotli or zlib.
CloudFlare is very close to end users (single digit milliseconds) and gzip is fast and gives good compression. What we can't afford to do is make a file smaller at the cost of increased latency because the compression time was longer. There's a trade-off between compression time and delivery time.
So, Brotli makes sense for things we cache (because we can compress out of band) and for large files delivered over slow connections but is not a panacea.
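That trade-off can be sketched with stdlib codecs. Since the `brotli` module may not be installed, this uses zlib levels as a stand-in for the same effect: higher effort shaves bytes but costs CPU time, and whether that wins depends on link bandwidth (the 1 Mbit/s figure is an assumption for illustration, not a CloudFlare number):

```python
import time
import zlib

# Synthetic "dynamic HTML" payload; repetitive, like real markup.
data = b"<html><body>" + b"<p>hello brotli</p>" * 5000 + b"</body></html>"
BANDWIDTH_BPS = 1_000_000 / 8  # assumed ~1 Mbit/s "slow" link, in bytes/sec

for level in (1, 6, 9):
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_ms = (time.perf_counter() - t0) * 1000
    transfer_ms = len(compressed) / BANDWIDTH_BPS * 1000
    # Total latency contribution = time to compress + time to ship the bytes.
    print(f"level {level}: {len(compressed)} bytes, "
          f"compress {compress_ms:.2f} ms, transfer {transfer_ms:.2f} ms")
```

On a fast last mile the transfer term shrinks toward zero, so extra compression time is pure loss; on a slow link the byte savings dominate, which matches the 64k/slow-connection conclusion above.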
You guys do some of the most interesting blog postings.
I work a block from your office in SF; may I make a request: can you guys host a public tech-talk so I can come to your office and hear it straight from your engineers and maybe have the opportunity to ask questions.
Your writeup about the HFT NICs from solar(something) circa ~2012 has had me hooked on CloudFlare's opinion on pretty much everything.
The only other corporate blog posts I love as much as yours are the ones from Backblaze about their pods.
I see its dictionary is optimized for web content.
Seems a bit like cheating ;)
https://en.wikipedia.org/wiki/Brotli
Unlike most general purpose compression algorithms, Brotli uses a pre-defined 120 kilobyte dictionary. The dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents.[6][7] A pre-defined dictionary can give a compression density boost for short data files.
It leads people to believe this will be a compression improvement for all their content, but after they do the work to integrate it they may discover that it's only a slight improvement because their content isn't identical to the content its static dictionary was designed for.
It's at least a very good codec, though, so it's still a win for other data. Just smaller than you might expect.
Also note that the dictionary is strongly biased towards English. Sure, there is some Russian, Chinese, Arabic (and probably some other scripts in there which I don't recognize), but there seem to be more English words in there than all those others combined. If you're compressing small documents in any language other than English, it might not be worth it to use Brotli.
> Unlike other algorithms compared here, brotli includes a static dictionary. It contains 13’504 words or syllables of English, Spanish, Chinese, Hindi, Russian and Arabic, as well as common phrases used in machine readable languages, particularly HTML and JavaScript. The total size of the static dictionary is 122’784 bytes. The static dictionary is extended by a mechanism of transforms that slightly change the words in the dictionary. A total of 1’633’984 sequences, although not all of them unique, can be constructed by using the 121 transforms. To reduce the amount of bias the static dictionary gives to the results, we used a multilingual web corpus of 93 different languages where only 122 of the 1285 documents (9.5 %) are in languages supported by our static dictionary.
It isn't a static dictionary; it just has a (large) initial value, unlike gzip.
Brotli is a great compressor, especially at levels 2-5. Unfortunately, the Google paper on Brotli runs tests at levels 1 and 11. I don't get that at all when their stated goal was to replace gzip.
I'm not sure there's anything wrong with it. But I'd be interested to know how it compares across different languages, like Norwegian and Japanese. I'm not saying it's the case, but there's a bit of a difference between "better than gzip for web content" and "better than gzip for web content in English". It'd be nice to see a test across something like the various international Project Gutenberg collections.
Huh, yeah, that does feel like cheating. Even having some sort of shared dictionary feels better than a hard-coded one. I wonder how well it performs on things that aren't text-based?
Well, unlike with fax machines, web protocols do shift over time. New tags and conventions are invented. A negotiated shared dictionary seems a better method than a hard-coded static one.
It's fairly competitive with LZHAM, at least, even if it's way slower to compress.
You will get much better results out of Brotli if you restructure your data to be more compressible, and that will also improve your lzma and gzip (especially gzip) compression ratios, to a tremendous degree. Have you done any of this? If not, ping me, and I can explain some techniques to apply.
>You will get much better results out of Brotli if you restructure your data to be more compressible, and that will also improve your lzma and gzip (especially gzip) compression ratios, to a tremendous degree.
This sounds interesting, I'd like to read some examples/links/explanations.
I imagine a lot of it is segregating your content. All strings in one file, etc.
It'd be an interesting test to take our eight-language set of localization strings and compress them in UI order and language order and see if there's much of a difference. (UI order is all languages for one dialog element, then all eight for the next, etc. Language order is all the English first, then ...)
I'd definitely like to hear Kevingadd's tips though.
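That ordering experiment is easy to sketch with stdlib zlib. The strings, languages, and counts below are invented stand-ins for a real localization set, so real data may behave differently:

```python
import zlib

# Hypothetical localization strings: same keys across two "languages".
english = [f"error_{i}: the operation failed, please retry".encode()
           for i in range(200)]
french = [f"erreur_{i}: l'operation a echoue, veuillez reessayer".encode()
          for i in range(200)]

# UI order: both languages interleaved per UI element.
ui_order = b"\n".join(s for pair in zip(english, french) for s in pair)
# Language order: all English first, then all French.
lang_order = b"\n".join(english + french)

for name, blob in (("UI order", ui_order), ("language order", lang_order)):
    print(f"{name}: {len(zlib.compress(blob, 9))} bytes compressed")
```

Grouping by language keeps back-references short, which tends to help deflate's distance coding; whether it helps enough to matter is exactly what the experiment would tell you.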
Like Facebook's Dragon, Brotli is an algorithm that is optimized for typical usage patterns.
Similarly, an entire iOS device could be fabricated as a single ASIC, and (for example) UIKit could be fabricated as part of that ASIC.
There is always a tradeoff between generic optimization and usage-specific optimization which comes at the expense of flexibility.
Google can do a statistical analysis of all the data it serves compressed with gzip, and determine exactly the characteristics of a compression algorithm that would save the most money.
These are small, evolutionary optimizations that save tons of money by incrementally increasing efficiency in a large system.
In late September, Google released a compression algorithm called Brotli and gave files it makes the extension “.bro”.
But last week the extension was changed to “.br”.
The reason for the change is threads like this one, in which posters suggest that “'bro' has a gender problem” and “comes of[f] misogynistic and unprofessional due to the world it lives in.”
The -li ending is basically just the (Swiss German) diminutive and isn't necessary to convey the meaning: "br" is "cold" (as in "brr"), "brot" is "bread", and "brotli" is also "bread" (a little bread). I kind of like the whole archive.bread thing, but with the German sound to it.
Filename autocompletion is a thing, though, and especially for .brotli files, which would likely be generated by something like `brotli uncompressed.thing`, the extension's length matters very little.
So, instead of using a word which is mostly used as a friendly term of familiarity/endearment, they decided to go with a word which has connotations of racism.
Because no one has used Symbian in 10 years and the vast majority of us have never seen or heard of a .sis file? Not really parity if .sis is extraordinarily obscure and .bro becomes pervasive.
Don't speak for the entire world ("no one"); I personally know two guys who still use Symbian-powered smartphones and don't plan to change them for anything.
Well, someone picked .bro as the initial extension by simply shortening .brotli, and someone else pointed out that .brt might be less potentially controversial. You can find the exchange on (I think) the chrome bugtracker somewhere (or maybe it was the mozilla one?). No-one actually took offense at .bro though.
Offensive is far too strong. Controversial would be much better, and it clearly (in my opinion) was, and would have continued to be. Why choose a name which is going to cause such issues, when there wasn't (yet) any reason not to change it?
Unfortunately today's social/culture/politics environment is very sensitive and it's pretty much guaranteed that there will be a problem with this being offensive, insensitive or otherwise unprofessional.
It's the world we live in today and much better to just avoid altogether now rather than try to defend/fix it later.
The common definition of the term "bro" is (quoting wikipedia) 'a type of "fratty masculinity", predominantly "if not exclusively" white'.
Now, you may not like that definition of bro, but for many people that is the common definition. The first thing that comes to mind when I hear 'bro' is the phrase 'bros before hoes', which is hard to interpret as not being offensive to women.
Does this really connect to a compression format? Of course not, but if that is the first thing that comes to people's minds, why not switch to something with less baggage?
If you can't see how any of this could be offensive, then I'm afraid we just have different points of view.
Everything can be offensive to someone if they try hard enough. Just pick something that doesn't intend to be rude and ignore the few haters who have to make everything about them.
To be fair, you don't have to try very hard to find 'bro' offensive. The definition of bro I'm familiar with (quoting wikipedia here) is 'a type of "fratty masculinity", predominantly "if not exclusively" white'. See also the phrase 'bros before hoes'.
This isn't the most serious thing in the world, but there are definitely people who find it excluding, and that seems reasonable to me.
The reason you wouldn't feel it was only for female use is that (I imagine, I'm sorry if I misinterpret your life) you have not been made to feel that a field you love (computing) is mainly "not for you", but for the other gender (or another group).
While it's not nice to accept, most women I know well in computing have had bad experiences with "bro"-type people, saying they shouldn't / can't use a computer because they are a woman. On an at least monthly basis. For years. It grinds slowly over time.
If you think CS is professional, I've got some bad news for you -- there are quite a lot of toxic badly behaved people around unfortunately.
It could be. I could mean it that way, or I could mean it like "Sis", or I could just like the sound. Without intent to be rude, it's not.
But I disagree with how hard you have to try. The people on here are trying fairly hard to make sure everyone knows how offensive they might find it. It didn't offend you, but you're offended that it might offend someone, or worse - that it might not offend someone. Nobody said "It's the compression method for white men", that whole racist angle is yours.
I find your shaming word-police game to be exclusionary. Please stop it.
I do not find it offensive. But, I do know people who found it offensive, and wanted it changed. If that is the case, I can see no particular reason not to change the name.
Who exactly is being excluded? No-one (it seems to me) was attached to .bro; it only existed for a couple of days.
Who wanted it is beside the point. It was crushed under political correctness (i.e. the assumption of unspeakable harm) rather than any actual harm, in only a few days.
If anyone did like it I doubt they'd have felt free to speak up.
> I do know people who found it offensive
I find Bros offensive - in my house. But the word? No. "Nazi" is just a word, Nazis are offensive.
> I can see no particular reason not to change the name.
Ditto there's no reason to. Someone went out of their way to take offense which was bounced around an echo chamber and now everyone is offended by something they didn't know existed before today.
http://caniuse.com/#search=brotli tells us FF already supports this, and IE has it marked as being under implementation consideration (which one would expect to turn into active development once they finish their current WOFF2 support work; WOFF2 has a hard dependency on a working Brotli implementation).
I did look around for known brotli+Firefox+nginx issues, but the only one that comes up is https://bugzilla.mozilla.org/show_bug.cgi?id=1215724 which was fixed in shipping Firefox months ago, so I'm assuming that's not the one you're talking about.
Sure, there is no official bug for this. The only way to see it is to check the dates when the new API was applied: it landed far too late, after versions already supporting Brotli had shipped, and this commit was not backported anywhere, so in real life it's broken in FF. Just get nginx, the nginx Brotli module from Google, and use FF 45. Good luck :)
Ah, looks like the part of bug 1242904 that was backported is just the security fix, not the entire library version update and the version update is in 46. So what you could be seeing is something like https://bugzilla.mozilla.org/show_bug.cgi?id=1254411
Of course the version update wasn't supposed to change the on-the-wire behavior... or so the library authors claimed. :(
Brotli actually has compression levels that are denser than gzip and compress/decompress faster. There is also a spec; xz has a reference implementation but no spec.
I think decompression speed is more important. I'm willing to burn compression time precompressing resources. Most resources, especially JS and images, can be precompressed. Or, if you compress on the fly, you can cache the result.
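A minimal precompression sketch using stdlib gzip (brotli would be a drop-in swap if the `brotli` module is available; the directory layout and `.gz` sibling-file convention here are just illustrative):

```python
import gzip
from pathlib import Path

def precompress(asset_dir, suffixes=(".js", ".css", ".html")):
    """Write a .gz sibling for each matching asset so the server can
    send the cached compressed copy instead of compressing per request."""
    for path in Path(asset_dir).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            data = path.read_bytes()
            compressed = gzip.compress(data, compresslevel=9)
            # Only keep the .gz if compression actually saves bytes.
            if len(compressed) < len(data):
                (path.parent / (path.name + ".gz")).write_bytes(compressed)
```

Servers can then hand out the cached file directly; nginx, for example, has a `gzip_static on;` directive that serves `app.js.gz` when the client sends `Accept-Encoding: gzip`, so the expensive max-level compression happens once at deploy time instead of per request.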
So, since this is a dictionary-based algorithm, what about specializing the dictionary per content type (content-type=javascript would have a different dictionary than HTML, for instance)?
Also, to embed Brotli in my custom application format, what should I expect from specializing the dictionary for my use case?
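The stock Python `brotli` bindings don't expose a custom dictionary, but zlib does, which makes it easy to see what per-content-type specialization buys you on short payloads. The sample dictionary and document below are invented for illustration:

```python
import zlib

# A hypothetical dictionary for one content type: substrings we expect
# to recur in small JSON API responses.
zdict = b'{"status": "ok", "error": null, "data": {"id": , "name": "'

doc = b'{"status": "ok", "error": null, "data": {"id": 7, "name": "widget"}}'

def compress_with(dictionary=None):
    if dictionary is None:
        c = zlib.compressobj(9)
    else:
        # zdict primes the window so the first bytes of doc can be
        # encoded as back-references instead of literals.
        c = zlib.compressobj(9, zlib.DEFLATED, 15, 9,
                             zlib.Z_DEFAULT_STRATEGY, zdict=dictionary)
    return c.compress(doc) + c.flush()

plain = compress_with()
primed = compress_with(zdict)
print(len(plain), len(primed))  # the primed stream is noticeably smaller
```

The receiver must hold the identical dictionary (`zlib.decompressobj(zdict=zdict)`), which is exactly the coordination cost a custom application format can afford but a general web standard mostly can't, hence Brotli baking one dictionary into the spec.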
Title should be changed to "...25% improvement over Gzip"
re: "Brotli should bring 25% reduction in data size compared to Gzip for the most common assets like Javascript and CSS files. For HTML, Brotli promises up to 40% difference (with median around 25%)."
https://blog.cloudflare.com/results-experimenting-brotli/
If you want to experiment with it, it's enabled on CloudFlare's test server https://http2.cloudflare.com/.