CDN77 Now Supports Brotli (cdn77.com)
227 points by clarinois on March 22, 2016 | 108 comments



We took a very detailed look at Brotli performance (both speed and compression) and concluded that for dynamic content it was only useful when the file size was greater than 64KB on slow connections. There's a very detailed description here:

https://blog.cloudflare.com/results-experimenting-brotli/

If you want to experiment with it, it's enabled on CloudFlare's test server https://http2.cloudflare.com/.

Since we don't charge by the byte, the focus is on end-user performance and not 'saving money'. We concluded that it isn't currently worth implementing Brotli widely, but we will continue to experiment with it.


Seems important to reiterate that for static content (JavaScript and CSS bundles, static HTML sites), Brotli is a definite win. I appreciated that the author was clear that the disadvantages applied to dynamic content, by always qualifying "content".


There's a lot of stuff greater than 64k nowadays ;)


Exactly :) If Cloudflare decides it's worth several man-days to offer this feature to their clients who deliver files larger than 64k, feel free to reach out to us, we'll be happy to share our experience :) devs@cdn77.com


Thanks. We actually wrote and open sourced our nginx modifications for choosing between Brotli and gzip.

https://github.com/cloudflare/ngx_brotli_module


From Cloudflare's tests:

"Most files are smaller than 64KB, and if we look only at those files then Brotli 4 is actually 1.48X slower than zlib level 8!"

And the faster zlib levels (1-7) are even faster! See the table. I especially like zlib 1.


Keep a rolling average of each user's request file size and if it exceeds 64KB (image, video, pdf, downloads) then notify them of a possible performance boost from switching to Brotli.
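
Something like this minimal sketch is what I mean (the names and the notification hook are made up; a real CDN would track this per customer from its edge logs):

    from collections import deque

    class RollingAverage:
        def __init__(self, window=100):
            self.sizes = deque(maxlen=window)

        def add(self, size_bytes):
            self.sizes.append(size_bytes)

        def average(self):
            return sum(self.sizes) / len(self.sizes) if self.sizes else 0

    THRESHOLD = 64 * 1024  # 64 KB

    def maybe_suggest_brotli(tracker, notify):
        # `notify` is whatever channel the CDN uses to message its customers.
        if tracker.average() > THRESHOLD:
            notify("Average asset size exceeds 64 KB; Brotli may be a win here.")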


> if it exceeds 64KB (image, video, pdf, downloads) then notify them

None of these can be compressed much more than they already are; forcing any lossless compressor (like zlib or Brotli) on them is just a waste of CPU time.

Try and compare.

Compression 101.


There are actually some cases where you can gain by losslessly compressing these. For example, EXIF data in a JPEG is plain text, so gzip can help:

    $ curl -sS http://www.exiv2.org/include/img_1771.jpg | wc -c
    32764
    $ curl -sS http://www.exiv2.org/include/img_1771.jpg | gzip -9 | wc -c
    31323
In this case, though, it often makes more sense to just remove the EXIF data.

PDF files can be even more compressible, at least ones that are text and layout as opposed to embedded images:

    $ curl -sS  http://www.polyu.edu.hk/iaee/files/pdf-sample.pdf | wc -c
    7945
    $ curl -sS  http://www.polyu.edu.hk/iaee/files/pdf-sample.pdf | gzip -9 | wc -c
    4336


I agree, there is a small subset of such files where small pieces (or even the whole file) can be compressed further. If we just say "images" or "videos", there are also "uncompressed" variants of those, e.g. uncompressed AVI or Windows BMP, but such images or videos are never the ones actually prepared for the web. Those prepared for the web (e.g. MP4 or PNG) certainly aren't further losslessly compressible.

Those who want to deliver videos or pictures should use the proper format and not depend on Brotli or zlib.


Well they might compress more, but if so they should publish ;)


Gzip/brotli compression is not done for media, typically only for plain text files.


Especially those pesky XML and JSON payloads!


Couldn't you implement it for all files greater than 64K?


Yes, but that would not make sense.

CloudFlare is very close to end users (single digit milliseconds) and gzip is fast and gives good compression. What we can't afford to do is make a file smaller at the cost of increased latency because the compression time was longer. There's a trade off between compression time and delivery time.

So, Brotli makes sense for things we cache (because we can compress out of band) and for large files delivered over slow connections but is not a panacea.
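
For illustration only (this is not CloudFlare's actual code), the decision boils down to something like the sketch below, assuming the standard Python gzip module and the brotli bindings:

    # Sketch: spend CPU on Brotli only when compressing out of band (cacheable
    # content); fall back to fast gzip for responses generated on the fly.
    import gzip
    import brotli  # pip install brotli

    def compress_response(body, cacheable):
        if cacheable:
            # Compressed once, served many times: high quality is worth it.
            return brotli.compress(body, quality=11), "br"
        # Generated per request: latency matters more than ratio.
        return gzip.compress(body, compresslevel=6), "gzip"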


You could even keep cached files in RAM compressed with Brotli to save RAM. Never mind, you'd need three versions then.


You guys do some of the most interesting blog postings.

I work a block from your office in SF; may I make a request: can you guys host a public tech-talk so I can come to your office and hear it straight from your engineers and maybe have the opportunity to ask questions.

Your write-up about the HFT NICs from solar(something) circa ~2012 has had me hooked on Cloudflare's opinions on pretty much everything.

The only other corporate blog posts I love as much as yours are the ones from Backblaze re their pods.


That's definitely something we're planning to do more often now that we have a great space for it (both in San Francisco and London).


I love their blog, but it's been dead this month; I visit twice a day.


Lots of great stuff going on internally that has left little time for writing. Will get back to sharing more soon. Thanks for the gentle prod.


Nice to hear



Perfect


Why can't you only do it on cache hits?


I see its dictionary is optimized for web content. Seems a bit like cheating ;)

From https://en.wikipedia.org/wiki/Brotli:

"Unlike most general purpose compression algorithms, Brotli uses a pre-defined 120 kilobyte dictionary. The dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents. A pre-defined algorithm can give a compression density boost for short data files."
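
You can see the effect of that built-in dictionary on a tiny HTML snippet with the Python bindings; a quick, unscientific sketch (exact byte counts will vary):

    import gzip
    import brotli  # pip install brotli

    html = (b"<!DOCTYPE html><html><head><title>Hello</title></head>"
            b"<body><p>Hello, world.</p></body></html>")

    print("raw:   ", len(html))
    print("gzip:  ", len(gzip.compress(html, compresslevel=9)))
    print("brotli:", len(brotli.compress(html, quality=11)))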


How is it cheating if it was developed specifically for that?


"Cheating" might be a little strong, but "not actually comparable to general purpose tools like Gzip" seems justifiable.


If we are talking about web content then how exactly is it not comparable?


What's wrong with cheating?


It leads people to believe this will be a compression improvement for all their content, but after they do the work to integrate it they may discover that it's only a slight improvement because their content isn't identical to the content its static dictionary was designed for.

It's at least a very good codec, though, so it's still a win for other data. Just smaller than you might expect.


Also note that the dictionary is strongly biased towards English. Sure, there is some Russian, Chinese, Arabic (and probably some other scripts in there which I don't recognize), but there seem to be more English words in there than all those others combined. If you're compressing small documents in any language other than English, it might not be worth it to use Brotli.

Edit: They wrote this about it in http://www.gstatic.com/b/brotlidocs/brotli-2015-09-22.pdf :

> Unlike other algorithms compared here, brotli includes a static dictionary. It contains 13’504 words or syllables of English, Spanish, Chinese, Hindi, Russian and Arabic, as well as common phrases used in machine readable languages, particularly HTML and JavaScript. The total size of the static dictionary is 122’784 bytes. The static dictionary is extended by a mechanism of transforms that slightly change the words in the dictionary. A total of 1’633’984 sequences, although not all of them unique, can be constructed by using the 121 transforms. To reduce the amount of bias the static dictionary gives to the results, we used a multilingual web corpus of 93 different languages where only 122 of the 1285 documents (9.5 %) are in languages supported by our static dictionary.


Here's the list of words, by the way. I couldn't find it anywhere in non-hexadecimal form:

https://gist.github.com/xnyhps/677f7c1b444f346bef99

(I cleaned it up a bit to remove newlines and tabs, and dropped a couple of entries that consist entirely of unprintable characters.)


It isn't a static dictionary; it just has a (large) initial value, unlike gzip.

Brotli is a great compressor, especially at levels 2-5. Unfortunately, the Google paper on Brotli runs tests at levels 1 and 11. I don't get that at all when their stated goal was to replace gzip.


I'm not sure there's anything wrong with it. But I'd be interested to know how it compares across different languages, like Norwegian and Japanese. I'm not saying it's the case, but there's a bit of a difference between "better than gzip for web content" and "better than gzip for web content in English". It'd be nice to see a test across something like the various international Project Gutenberg collections.


What if a new keyword is added to JavaScript / HTML?


Then I imagine that keyword would be added to the static dictionary.


Then you lose 6 bytes off of optimal.


Huh, yeah, that does feel like cheating. Even having some sort of shared dictionary feels better than a hard-coded one. I wonder how well it performs on things that aren't text-based?


Since when has being designed and optimized for a specific use case ever been "cheating" in software development?


Static dictionaries are used frequently in data communications to "break the bounds" of Shannon's theorem.

See static Huffman coding in fax machines.


Well, unlike fax machines, though, there are protocol shifts over time. New tags and conventions are invented. A known shared dictionary seems a better method than a hard-coded static dictionary.


I find that Brotli is oversold for the case of compressing generic binary files.

The claims that it is comparable to xz/lzma for generic binary data are not accurate.

In my real-world tests of compressing 3D data it far underperformed xz/lzma although it was still better than gzip:

https://github.com/google/brotli/issues/165


It's fairly competitive with LZHAM, at least, even if it's way slower to compress.

You will get much better results out of Brotli if you restructure your data to be more compressible, and that will also improve your lzma and gzip (especially gzip) compression ratios, to a tremendous degree. Have you done any of this? If not, ping me, and I can explain some techniques to apply.
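
To give a flavour of what I mean (hypothetical vertex layout, not your actual format): split interleaved x/y/z floats into separate per-axis streams and delta-encode quantized values, so the compressor sees long runs of similar bytes.

    import struct

    def restructure(vertices):
        # vertices: list of (x, y, z) float tuples.
        streams = []
        for axis in range(3):
            quantized = [int(v[axis] * 1024) for v in vertices]  # crude fixed-point
            deltas = [quantized[0]] + [b - a for a, b in zip(quantized, quantized[1:])]
            streams.append(struct.pack("<%di" % len(deltas), *deltas))
        return b"".join(streams)  # feed this to brotli/gzip/lzma instead of raw floats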


>You will get much better results out of Brotli if you restructure your data to be more compressible, and that will also improve your lzma and gzip (especially gzip) compression ratios, to a tremendous degree.

This sounds interesting, I'd like to read some examples/links/explanations.


I imagine a lot of it is segregating your content. All strings in one file, etc.

It'd be an interesting test to take our eight-language set of localization strings and compress them in UI order and language order and see if there's much of a difference. (UI order is all languages for one dialog element, then all eight for the next, etc. Language order is all the English first, then ...)

I'd definitely like to hear Kevingadd's tips though.
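
That comparison is easy to script; a rough sketch, assuming the strings live in a dict keyed by (element, language):

    import gzip

    def compressed_size(texts):
        return len(gzip.compress("\n".join(texts).encode("utf-8"), compresslevel=9))

    def compare(strings, elements, languages):
        # strings: {(ui_element, language): translated_text} -- assumed data.
        ui_order = [strings[(e, l)] for e in elements for l in languages]
        lang_order = [strings[(e, l)] for l in languages for e in elements]
        return compressed_size(ui_order), compressed_size(lang_order)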


Like Facebook's Dragon, Brotli is an algorithm that is optimized for typical usage patterns.

Similarly, an entire iOS device could be fabricated as a single ASIC, and (for example) UIKit could be fabricated as part of that ASIC.

There is always a tradeoff between generic optimization and usage-specific optimization which comes at the expense of flexibility.

Google can do a statistical analysis of all the data it serves compressed with gzip, and determine exactly the characteristics of a compression algorithm that would save the most money.

These are small, evolutionary optimizations that save tons of money by incrementally increasing efficiency in a large system.


Is it true that Brotli wanted the .bro file extension and moved away from it, because it was deemed offensive?


Yes, that's confirmed.

"In late September, Google released a compression algorithm called Brotli and gave files it makes the extension “.bro”.

But last week the extension was changed to “.br”.

The reason for the change is threads like this one, in which posters suggest that “'bro' has a gender problem” and “comes of[f] misogynistic and unprofessional due to the world it lives in.”

http://www.theregister.co.uk/2015/10/11/googles_bro_file_for...


I would've preferred .brot


or just ".brotli"

modern filesystems can handle a few extra characters


The -li ending, basically just the diminutive, is not necessary to convey the meaning. -br is "cold", -brot is "bread", -brotli is also "bread". I kind of like the whole archive.bread thing, but with the German sound to it.


But we don't want to type them. When was the last time you typed chdir instead of cd?


when was the last time you were in a shell that didn't have tab completion?

.br<tab> is the same number of characters as .bro


Filename autocompletion is a thing though, and especially for .brotli files, which would likely be generated by something like `brotli uncompressed.thing`, the length matters very little.


http://www.urbandictionary.com/define.php?term=br

So, instead of using a word which is mostly used as a friendly term of familiarity/endearment, they decided to go with a word which has connotations of racism.

I... can't even.


Symbian app installation files have the ".sis" extension. Why can't we have ".bro" for parity?


Because no one has used Symbian in 10 years and the vast majority of us have never seen or heard of a .sis file? Not really parity if .sis is extraordinarily obscure and .bro becomes pervasive.


Don't speak for the entire world ("no one"); I personally know two guys who still use Symbian-powered smartphones and don't plan to switch to anything else.


I know more people than that who still use a Commodore 64 on the regular. But, it doesn't make it a viable platform.


Newspeak, doublethink and thoughtcrimes


Because context.


Well, someone picked .bro as the initial extension by simply shortening .brotli, and someone else pointed out that .brt might be less potentially controversial. You can find the exchange on (I think) the chrome bugtracker somewhere (or maybe it was the mozilla one?). No-one actually took offense at .bro though.


Offensive is far too strong. Controversial would be much better, and it clearly (in my opinion) was, and would have continued to be. Why choose a name which is going to cause such issues, when there wasn't (yet) any reason not to change it?


Controversial is a bit much; I really don't see how this could be offensive at all.


Unfortunately, today's social/cultural/political environment is very sensitive, and it's pretty much guaranteed that there will be a problem with this being offensive, insensitive or otherwise unprofessional.

It's the world we live in today, and it's much better to just avoid the issue altogether now rather than try to defend or fix it later.


It's much less "is it offensive", and more "can people find a reason for it to be offensive" these days.


The common definition of the term "bro" is (quoting wikipedia) 'a type of "fratty masculinity", predominantly "if not exclusively" white'.

Now, you may not like that definition of bro, but for many people that is the common definition. The first thing that comes to mind when I hear "bro" is the phrase "bros before hoes", which is hard to interpret as not offensive to women.

Does this really connect to a compression format? Of course not, but if that is the first thing that comes to people's minds, why not switch to something with less baggage?

If you can't see how any of this could be offensive, then I'm afraid we just have different points of view.


How about just unprofessional then?


Nope.

Everything can be offensive to someone if they try hard enough. Just pick something that doesn't intend to be rude and ignore the few haters who have to make everything about them.


To be fair, you don't have to try very hard to find "bro" offensive. The definition of bro I'm familiar with (quoting Wikipedia here) is "a type of 'fratty masculinity', predominantly 'if not exclusively' white". See also the phrase "bros before hoes".

This isn't the most serious thing in the world, but there are definitely people who find it excluding, and that seems reasonable to me.


Excluding? How? If it were a .chick or something similar I would not feel like it was meant for only female use.

This is CS, a lack of groupthink-y "professionalism" is why I like this field.


The reason you wouldn't feel it was only for female use is that (I imagine, I'm sorry if I misinterpret your life) you have not been made to feel that a field you love (computing) is mainly "not for you", but for the other gender (or another group).

While it's not nice to accept, most women I know well in computing have had bad experiences with "bro"-type people, saying they shouldn't / can't use a computer because they are a woman. On an at least monthly basis. For years. It grinds slowly over time.

If you think CS is professional, I've got some bad news for you -- there are quite a lot of toxic badly behaved people around unfortunately.


I agree, women should not be discouraged from the CS field. It is in fact bizarre to me that someone would tell women they don't belong in CS.

But my point is that this is not particularly offensive; it's purely a pun. I think being oversensitive can be as damaging as being insensitive.


.chick sounds offensive, but for different reasons.


It could be. I could mean it that way, or I could mean it like "Sis", or I could just like the sound. Without intent to be rude, it's not.

But I disagree with how hard you have to try. The people on here are trying fairly hard to make sure everyone knows how offensive they might find it. It didn't offend you, but you're offended that it might offend someone, or worse - that it might not offend someone. Nobody said "It's the compression method for white men", that whole racist angle is yours.

I find your shaming word-police game to be exclusionary. Please stop it.


I do not find it offensive. But, I do know people who found it offensive, and wanted it changed. If that is the case, I can see no particular reason not to change the name.

Who exactly is being excluded? No-one (it seems to me) was attached to .bro; it only existed for a couple of days.


Who wanted it is beside the point. It was crushed under political correctness (i.e. the assumption of unspeakable harm) rather than any actual harm, in only a few days.

If anyone did like it I doubt they'd have felt free to speak up.

> I do know people who found it offensive

I find Bros offensive - in my house. But the word? No. "Nazi" is just a word, Nazis are offensive.

> I can see no particular reason not to change the name.

Ditto: there's no reason to change it. Someone went out of their way to take offense, that got bounced around an echo chamber, and now everyone is offended by something they didn't know existed before today.

That's not something we should reward.


Interesting. Is it also middle-out?


While that is true, I have another question: is it really necessary to bring that topic up here again?


It was not necessary, only to satiate my curiosity.


"Deemed" by whom?


Nice. Hope to see it in FF soon. And in Internet Explorer 87.


http://caniuse.com/#search=brotli tells us FF already supports this, and IE has it marked as under implementation consideration (which one would expect to turn into active development once they finish their current WOFF2 support work; WOFF2 has a hard dependency on a working Brotli implementation).


Try nginx & FF45 -> it doesn't work, as FF uses a deprecated (and now obsolete & removed) API.


Reference for this, please? Because I see no bugs filed on FF regarding brotli apart from https://bugzilla.mozilla.org/show_bug.cgi?id=1222541 and https://bugzilla.mozilla.org/show_bug.cgi?id=1207234 neither of which seems to match your problem description. I'm happy to get a bug filed for you so whatever problem this is can be fixed, but I need a bit more to go on here...

I did look around for known brotli+Firefox+nginx issues, but the only one that comes up is https://bugzilla.mozilla.org/show_bug.cgi?id=1215724 which was fixed in shipping Firefox months ago, so I'm assuming that's not the one you're talking about.


Sure, there is no official bug for this. The only way to see it is to check the dates when the new API was applied: it's far too late, after versions already supporting Brotli had been released, and this commit was not backported anywhere, so in real life it's broken in FF. Just get nginx, the nginx Brotli module from Google, and use FF 45. Good luck :)

https://hg.mozilla.org/releases/mozilla-aurora/log/10e1774de...


> but this commit was not backported anywhere

Which commit are you talking about, exactly? Bug 1242904 was backported to Firefox 45, and everything else I see at http://hg.mozilla.org/mozilla-central/filelog/tip/modules/br... as of today (which is the same as <http://hg.mozilla.org/mozilla-central/filelog/ea6298e1b4f7/m...) was checked in way before 45....


Ah, looks like the part of bug 1242904 that was backported is just the security fix, not the entire library version update and the version update is in 46. So what you could be seeing is something like https://bugzilla.mozilla.org/show_bug.cgi?id=1254411

Of course the version update wasn't supposed to change the on-the-wire behavior... or so the library authors claimed. :(



In 2020 maybe :)


I would hope that content negotiation means we can capitalize on it wherever it's supported.


If anyone wants to check out a Rust implementation:

https://github.com/ende76/brotli-rs

It's currently in use in Servo


Are there any memory & CPU consumption graphs?

I also don't understand why not xz, though I know it requires significantly more resources than gzip.


Brotli actually has levels of compression that are more dense than gzip and compress/decompress faster. There is also a spec; xz has a reference implementation but no spec.


> more dense than gzip and compress/decompress faster

If you're saying brotli compresses/decompresses faster than gzip, it doesn't according to Cloudflare tests [1] (see table near bottom).

Even the fastest compression level of Brotli is slower than the highest/slowest compression level of gzip in most cases.

[1]: https://blog.cloudflare.com/results-experimenting-brotli/


> If you're saying brotli compresses/decompresses faster than gzip, it doesn't according to Cloudflare tests.

It doesn't always compress faster, but as far as I can tell they didn't measure or say anything about decompression.


I just did this:

    time bro --quality 6 --input linux-4.5.tar --output linux-4.5.tar.bro

    real    0m18.436s
    user    0m18.228s
    sys     0m0.184s

    time gzip -9 linux-4.5.tar

    real    0m30.555s
    user    0m30.424s
    sys     0m0.172s

ls -lh (I removed the metadata):

    106M linux-4.5.tar.bro
    129M linux-4.5.tar.gz

Only 82% of the size at 60% of the time. Now this is pure text so that's not a good example of everything.


If it takes a lot of CPU, it might not be worth the CPU time or cost.

https://i.imgur.com/bAc1Saq.png

(Image from https://dl.acm.org/citation.cfm?id=1084786&dl=ACM&coll=DL&CF...)

And gzip speed has improved since 2005, so over a 10 Mbps connection, you'd have to get much better compression to be worth the switch.


>XZ is a file format, LZMA is the compression algorithm.

https://bugzilla.mozilla.org/show_bug.cgi?id=366559#c18


I think decompression speed is more important. I'm willing to burn compression time precompressing resources. For most resources, especially JS and images, you can precompress. Or, if you compress on the fly, you can cache the result.
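
Precompressing at build or deploy time is straightforward; a small sketch (the output directory and glob pattern are made up):

    import gzip
    import brotli  # pip install brotli
    from pathlib import Path

    ASSET_DIR = Path("dist")  # hypothetical build output directory

    for asset in ASSET_DIR.glob("**/*.js"):
        data = asset.read_bytes()
        # Write app.js.gz and app.js.br next to the original; the server can
        # then pick one based on the Accept-Encoding header.
        (asset.parent / (asset.name + ".gz")).write_bytes(gzip.compress(data, compresslevel=9))
        (asset.parent / (asset.name + ".br")).write_bytes(brotli.compress(data, quality=11))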


Heh, Brotli + HTTP/2, it looks so fast: http://www.http2demo.io


So, since this is a dictionary-based algorithm, what about specializing the dictionary per content type (Content-Type: javascript would get a different dictionary than HTML, for instance)? Also, if I embed Brotli in my custom application format, what should I expect from specializing the dictionary for my use case?
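
For what it's worth, the preset-dictionary idea already exists in zlib (zdict), so here's a sketch of the concept using zlib; I'm not sure the current Brotli bindings expose custom dictionaries, so treat this purely as an illustration:

    import zlib

    # Hypothetical mini-dictionary of strings common in JavaScript payloads.
    JS_DICT = b"function return var const document window console.log === !== "

    def compress_js(data):
        c = zlib.compressobj(level=9, zdict=JS_DICT)
        return c.compress(data) + c.flush()

    def decompress_js(blob):
        d = zlib.decompressobj(zdict=JS_DICT)
        return d.decompress(blob) + d.flush()

Both ends have to agree on the dictionary, which is exactly the coordination problem Brotli sidesteps by baking one into the format.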


Title should be changed to "...25% improvement over Gzip"

re: "Brotli should bring 25% reduction in data size compared to Gzip for the most common assets like Javascript and CSS files. For HTML, Brotli promises up to 40% difference (with median around 25%)."



Is this a Silicon Valley TV show joke or is this real?

Not sarcasm, real question.


What made you wonder about it being a joke?


It's real


Is there any implementation of a Brotli encoder/decoder on the JVM, or is JNI/JNA the best option available right now for using Brotli in JVM apps?



