CDN77 Now Supports Brotli

jgrahamc · on March 22, 2016

We took a very detailed look at Brotli performance (both speed and compression) and concluded that for dynamic content it was only useful if the file size was greater than 64k on slow connections and there's a very detailed description here:

https://blog.cloudflare.com/results-experimenting-brotli/

If you want to experiment with it, it's enabled on CloudFlare's test server https://http2.cloudflare.com/.

Since we don't charge by byte the focus is on end-user performance and not 'saving money' and we concluded that it wasn't currently worth implementing Brotli widely but will continue to experiment with it.

mshenfield · on March 22, 2016

Seems important to re-iterate that for static content (javascript and css bundles, static HTML sites), Brotli is a definitive win. I appreciated that the author was clear that the disadvantages applied to dynamic content, by always qualifying "content".

KurvaKde1334 · on March 22, 2016

There's a lot of stuff greater then 64k nowadays ;)

clarinois · on March 22, 2016

Exactly :) If Cloudflare decides it’s worth several man-days to offer this feature to their clients who deliver larger files than 64k, feel free to reach out to us, we’ll be happy to share our experience :) devs@cdn77.com

jgrahamc · on March 22, 2016

Thanks. We actually wrote and open sourced our nginx modifications for choosing between Brotli and gzip.

https://github.com/cloudflare/ngx_brotli_module

acqq · on March 22, 2016

From the Cloduflare's tests:

"Most files are smaller than 64KB, and if we look only at those files then Brotli 4 is actually 1.48X slower than zlib level 8!"

And the faster zlib levels (1-7) are even faster! See the table. I especially like zlib 1.

jqueryin · on March 22, 2016

Keep a rolling average of each user's request filesize and if it exceeds 64KB (image, video, pdf, downloads) then notify them of a possible performance boost to switch to Brotli.

acqq · on March 22, 2016

> if it exceeds 64KB (image, video, pdf, downloads) then notify them

All these can't be compressed more than they already are, forcing any lossless (like zlib and brotli) compressor on them is just the loss of CPU time.

Try and compare.

Compression 101.

jefftk · on March 22, 2016

There are actually some cases where you can gain by losslessly compressing these. For example, EXIF data in a JPEG is plain text, so gzip can help:

    $ curl -sS http://www.exiv2.org/include/img_1771.jpg | wc -c
    32764
    $ curl -sS http://www.exiv2.org/include/img_1771.jpg | gzip -9 | wc -c
    31323

In this case, though, it often makes more sense to just remove the EXIF data.

PDF files can be even more compressible, at least ones that are text and layout as opposed to embedded images:

    $ curl -sS  http://www.polyu.edu.hk/iaee/files/pdf-sample.pdf | wc -c
    7945
    $ curl -sS  http://www.polyu.edu.hk/iaee/files/pdf-sample.pdf | gzip -9 | wc -c
    4336

acqq · on March 22, 2016

I agree, there is a small subset of such files where small pieces (or even the whole) can be compressed. If we just say "images" or "videos" there are also "uncompressed" variants of such, but such images or videos are always those which aren't even prepared to be on the web. E.g. "uncompressed AVI" or "Windows BMP." Those prepared for the web (e.g. MP4 or PNG) certainly aren't losslessly compressible.

Those that want to deliver the videos or pictures should use the proper format and not depend on Brotli or zlib.

TTPrograms · on March 22, 2016

Well they might compress more, but if so they should publish ;)

snug · on March 22, 2016

Gzip/brotli compression is not done for media, typically only for plain text files.

mwpmaybe · on March 22, 2016

Especially those pesky XML and JSON payloads!

aembleton · on March 22, 2016

Couldn't you implement it for all files greater than 64K?

jgrahamc · on March 22, 2016

Yes, but that would not make sense.

CloudFlare is very close to end users (single digit milliseconds) and gzip is fast and gives good compression. What we can't afford to do is make a file smaller at the cost of increased latency because the compression time was longer. There's a trade off between compression time and delivery time.

So, Brotli makes sense for things we cache (because we can compress out of band) and for large files delivered over slow connections but is not a panacea.

homero · on March 22, 2016

You can even keep cached files in ram with brotli to save ram. Nvm you'd need 3 versions then.

samstave · on March 22, 2016

You guys do some of the most interesting blog postings.

I work a block from your office in SF; may I make a request: can you guys host a public tech-talk so I can come to your office and hear it straight from your engineers and maybe have the opportunity to ask questions.

Your Writeup about the HFT NICs from solar(something) circa ~2012 has had me hooked on cloudflares opinion on pretty much everything.

The only other Corp blog post I love as much as yours are the ones from Backblaze re their pods.

eastdakota · on March 22, 2016

That's definitely something we're planning to do more often now that we have a great space for it (both in San Francisco and London).

homero · on March 22, 2016

I love their blog but it's dead this month, I visit twice a day

eastdakota · on March 22, 2016

Lots of great stuff going on internally that has left little time for writing. Will get back to sharing more soon. Thanks for the gentle prod.

homero · on March 22, 2016

Nice to hear

jgrahamc · on March 23, 2016

Here you go: https://blog.cloudflare.com/tls-certificate-optimization-tec...

homero · on March 24, 2016

Perfect

homero · on March 22, 2016

Why can't you only do it on cache hits?

ape4 · on March 22, 2016

I see its dictionary is optimized for web content. Seems a bit like cheating ;) https://en.wikipedia.org/wiki/Brotli Unlike most general purpose compression algorithms, Brotli uses a pre-defined 120 kilobyte dictionary. The dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents.[6][7] A pre-defined algorithm can give a compression density boost for short data files.

return0 · on March 22, 2016

How is it cheating if it was developed specifically for that?

Lagged2Death · on March 22, 2016

"Cheating" might be a little strong, but "not actually comparable to general purpose tools like Gzip" seems justifiable.

kami8845 · on March 22, 2016

If we are talking about web content then how exactly is it not comparable?

NelsonMinar · on March 22, 2016

What's wrong with cheating?

kevingadd · on March 22, 2016

It leads people to believe this will be a compression improvement for all their content, but after they do the work to integrate it they may discover that it's only a slight improvement because their content isn't identical to the content its static dictionary was designed for.

It's at least a very good codec, though, so it's still a win for other data. Just smaller than you might expect.

xnyhps · on March 22, 2016

Also note that the dictionary is strongly biased towards English. Sure, there is some Russian, Chinese, Arabic (and probably some other scripts in there which I don't recognize), but there seems to be more English words in there than all those others combined. If you're compressing small documents in any other language than English, it might not be worth it to use Brotli.

Edit: They wrote this about it in http://www.gstatic.com/b/brotlidocs/brotli-2015-09-22.pdf :

> Unlike other algorithms compared here, brotli includes a static dictionary. It contains 13’504 words or syllables of English, Spanish, Chinese, Hindi, Russian and Arabic, as well as common phrases used in machine readable languages, particularly HTML and JavaScript. The total size of the static dictionary is 122’784 bytes. The static dictionary is extended by a mechanism of transforms that slightly change the words in the dictionary. A total of 1’633’984 sequences, although not all of them unique, can be constructed by using the 121 transforms. To reduce the amount of bias the static dictionary gives to the results, we used a multilingual web corpus of 93 different languages where only 122 of the 1285 documents (9.5 %) are in languages supported by our static dictionary.

xnyhps · on March 22, 2016

Here's the list of words, by the way. I couldn't find it anywhere in non-hexadecimal form:

https://gist.github.com/xnyhps/677f7c1b444f346bef99

(I cleaned it up a bit to remove newlines and tabs, and a couple that are entirely of unprintable characters.)

prirun · on March 23, 2016

It isn't a static dictionary; it just has a (large) initial value, unlike gzip.

Brotli is a great compressor, especially at levels 2-5. Unfortunately, the Google paper on Brotli runs tests at levels 1 and 11. I don't get that at all when their stated goal was to replace gzip.

e12e · on March 22, 2016

I'm not sure there's anything wrong with it. But I'd be interested to know how it compares across different languages, like Norwegian and Japanese. I'm not saying it's the case, but there's a bit of a difference between "better than gzip for web content" and "better than gzip for web content in English". It'd be nice to see a test across something like the various international Project Gutenberg collections.

ape4 · on March 22, 2016

What if a new keyword is added to javascript / html.

jzymbaluk · on March 22, 2016

Then I imagine that keyword would be added to the static dictionary.

Dylan16807 · on March 22, 2016

Then you lose 6 bytes off of optimal.

donatj · on March 22, 2016

Huh, yeah that does feel like cheating. Even having some sort of sheared dictionary feels better than a hard-coded one. I wonder how well it performs on things that aren't text based?

efuquen · on March 22, 2016

Since when has being designed and optimized for a specific use case ever been "cheating" in software development?

IncRnd · on March 22, 2016

Static dictionaries are used frequently in data communications to "break the bounds" of Shannon's theorem.

See static Huffman coding in fax machines.

donatj · on March 22, 2016

Well unlike fax machines though are protocol shifts over time. New tags and conventions are invented. A known shared dictionary seems a better method than a hard coded static dictionary.

bhouston · on March 22, 2016

I find that Brotli is over sold for the case of compressing generic binary files.

The claims that it is comparable to xz/lzma for generic binary data are not accurate.

In my real-world tests of compressing 3D data it far underperformed xz/lzma although it was still better than gzip:

https://github.com/google/brotli/issues/165

kevingadd · on March 22, 2016

It's fairly competitive with LZHAM, at least, even if it's way slower to compress.

You will get much better results out of Brotli if you restructure your data to be more compressible, and that will also improve your lzma and gzip (especially gzip) compression ratios, to a tremendous degree. Have you done any of this? If not, ping me, and I can explain some techniques to apply.

detaro · on March 22, 2016

>You will get much better results out of Brotli if you restructure your data to be more compressible, and that will also improve your lzma and gzip (especially gzip) compression ratios, to a tremendous degree.

This sounds interesting, I'd like to read some examples/links/explanations.

EdHominem · on March 22, 2016

I imagine a lot of it is segregating your content. All strings in one file, etc.

It'd be an interesting test to take our eight-language set of localization strings and compress them in UI order and language order and see if there's much of a difference. (UI order is all languages for one dialog element, then all eight for the next, etc. Language order is all the English first, then ...)

I'd definitely like to hear Kevingadd's tips though.

grandalf · on March 22, 2016

Like Facebook's Dragon, Brotli is an algorithm that is optimized for typical usage patterns.

Similarly, an entire IOS device could be fabricated as a single ASIC, and (for example) uikit could be fabricated as part of that ASIC.

There is always a tradeoff between generic optimization and usage-specific optimization which comes at the expense of flexibility.

Google can do a statistical analysis of all the data it serves compressed with gzip, and determine exactly the characteristics of a compression algorithm that would save the most money.

These are small, evolutionary optimizations that save tons of money by incrementally increasing efficiency in a large system.

sremani · on March 22, 2016

Is it true that Brotli wanted the .bro file extension and moved away from it, because it was deemed offensive?

Bud · on March 22, 2016

Yes, that's confirmed.

"In late September, Google released a compression algorithm called Brotli and gave files it makes the extension “.bro”.

But last week the extension was changed to “.br”.

The reason for the change is threads like this one, in which posters suggest that “'bro' has a gender problem” and “comes of[f] misogynistic and unprofessional due to the world it lives in.”

http://www.theregister.co.uk/2015/10/11/googles_bro_file_for...

ant6n · on March 22, 2016

I would've preferred .brot

oftenwrong · on March 22, 2016

or just ".brotli"

modern filesystems can handle a few extra characters

ant6n · on March 22, 2016

The -li Ending, basically just the diminutive, is not necessary to convey the meaning. -br is "cold", -brot is "bread", -brotli is also "bread". I kind of like the whole archive.bread thing, but with the German sound to it.

chocolatebunny · on March 22, 2016

but we don't want to type them. when was the last time you typed chdir instead of cd?

ctrl-j · on March 22, 2016

when was the last time you were in a shell that didn't have tab completion?

.br<tab> is the same number of characters as .bro

akx · on March 22, 2016

Filename autocompletion is a thing though, and especially for .brotli files, which would likely be generated by something like `brotli uncompressed.thing`, the length matters very little.

butbroski · on March 22, 2016

http://www.urbandictionary.com/define.php?term=br

So, instead of using a word which is mostly used as a friendly term of familiarity/endearment, they decided to go with a word which has connotations of racism.

I... can't even.

plugnburn · on March 22, 2016

Symbian app installation files have ".sis" extension. Why can't we have ".bro" for parity?

SwellJoe · on March 22, 2016

Because no one has used Symbian in 10 years and the vast majority of us have never seen or heard of a .sis file? Not really parity if .sis is extraordinarily obscure and .bro becomes pervasive.

plugnburn · on March 22, 2016

Don't speak for the entire world ("no one"), I personally know two guys that still use Symbian-powered smartphones and don't plan to change them to anything.

SwellJoe · on March 24, 2016

I know more people than that who still use a Commodore 64 on the regular. But, it doesn't make it a viable platform.

emsy · on March 22, 2016

Newspeak, doublethink and thoughtcrimes

ant6n · on March 22, 2016

Because context.

pja · on March 22, 2016

Well, someone picked .bro as the initial extension by simply shortening .brotli, and someone else pointed out that .brt might be less potentially controversial. You can find the exchange on (I think) the chrome bugtracker somewhere (or maybe it was the mozilla one?). No-one actually took offense at .bro though.

CJefferson · on March 22, 2016

Offensive is far too strong. Controversial would be much better, and it clearly (in my opinion) was, and would have continued to be. Why choose a name which is going to cause such issues, when there wasn't (yet) any reason not to change it?

d0lph · on March 22, 2016

Controversial is a bit much, I really don't see how this could be offensive at all.

manigandham · on March 22, 2016

Unfortunately today's social/culture/politics environment is very sensitive and it's pretty much guaranteed that there will be a problem with this being offensive, insensitive or otherwise unprofessional.

It's the world we live in today and much better to just avoid altogether now rather than try to defend/fix it later.

brink · on March 22, 2016

It's much less "is it offensive", and more "can people find a reason for it to be offensive" these days.

CJefferson · on March 22, 2016

The common definition of the term "bro" is (quoting wikipedia) 'a type of "fratty masculinity", predominantly "if not exclusively" white'.

Now, you can not like that definition of bro, but that is for many people to common definition. The first thing that comes to mind when I hear 'bro' is the phrase 'bros before hoes', which is hard to interpret as not offensive to women.

Does this really connect to a compression format? Of course not, but if that is the first thing that comes to people's minds, why not switch to something with less baggage?

If you can't see how any of this could be offensive, then I'm afraid we just have different points of view.

efuquen · on March 22, 2016

How about just unprofessional then?

EdHominem · on March 22, 2016

Nope.

Everything can be offensive to someone if they try hard enough. Just pick something that doesn't intend to be rude and ignore the few haters who have to make everything about them.

CJefferson · on March 22, 2016

To be fair, you don't have to try very hard to find 'bro' offensive. The definition of bro I'm familar with (quoting wikipedia here) is 'a type of "fratty masculinity", predominantly "if not exclusively" white'. See also the phrase 'bros before hoes'.

This isn't the most serious thing in the world, but there are definitely people who find it excluding, and that seems reasonable to me.

d0lph · on March 22, 2016

Excluding? How? If it were a .chick or something similar I would not feel like it was meant for only female use.

This is CS, a lack of groupthink-y "professionalism" is why I like this field.

CJefferson · on March 22, 2016

The reason you wouldn't feel it was only for female use is that (I imagine, I'm sorry if I misinterpret your life) you have not been made to feel that a field you love (computing) is mainly "not for you", but for the other gender (or another group).

While it's not nice to accept, most women I know well in computing have had bad experiences with "bro"-type people, saying they shouldn't / can't use a computer because they are a woman. On an at least monthly basis. For years. It grinds slowly over time.

If you think CS is professional, I've got some bad news for you -- there are quite a lot of toxic badly behaved people around unfortunately.

d0lph · on March 23, 2016

I agree, women should not be discouraged from the CS field. It is in fact bizarre to me that someone would tell women they don't belong in CS.

But my point is that this is not particularly offensive, it's purely a pun. I think being oversensitive can be as damaging as being unsensitive.

ant6n · on March 22, 2016

.chick sounds offensive, but for different reasons.

EdHominem · on March 23, 2016

It could be. I could mean it that way, or I could mean it like "Sis", or I could just like the sound. Without intent to be rude, it's not.

But I disagree with how hard you have to try. The people on here are trying fairly hard to make sure everyone knows how offensive they might find it. It didn't offend you, but you're offended that it might offend someone, or worse - that it might not offend someone. Nobody said "It's the compression method for white men", that whole racist angle is yours.

I find your shaming word-police game to be exclusionary. Please stop it.

CJefferson · on March 23, 2016

I do not find it offensive. But, I do know people who found it offensive, and wanted it changed. If that is the case, I can see no particular reason not to change the name.

Who exactly is being excluded? No-one (it seems to me) watch attached to .bro, it only existed for a couple of days.

EdHominem · on March 23, 2016

Who wanted it is beside the point. It was crushed under political correctness (ie the assumption of unspeakable harm) rather than any actual harm in only a few days.

If anyone did like it I doubt they'd have felt free to speak up.

> I do know people who found it offensive

I find Bros offensive - in my house. But the word? No. "Nazi" is just a word, Nazis are offensive.

> I can see no particular reason not to change the name.

Ditto there's no reason to. Someone went out of their way to take offense which was bounced around an echo chamber and now everyone is offended by something they didn't know existed before today.

That's not something we should reward.

return0 · on March 22, 2016

Interesting. Is it also middle-out?

ifdefdebug · on March 22, 2016

While that is true, I have another question: is it really necessary to pull that topic off here again?

sremani · on March 22, 2016

it was not necessary, only to satiate my curiosity.

13thLetter · on March 23, 2016

"Deemed" by whom?

jakubstr · on March 22, 2016

Nice. Hope to see it in FF soon. And in Internet Explorer 87.

TheRealPomax · on March 22, 2016

http://caniuse.com/#search=brotli tells us FF already supports this, and IE has it marked as being under implementation consideration (which one would expect to turn into active development once they finish their current WOFF2 support work; WOFF2 has a hard dependency on a working Brotli implementation)

KurvaKde1334 · on March 22, 2016

Try nginx & FF45 -> doesnt work as FF uses deprecated (and now obsolete & removed) API

bzbarsky · on March 22, 2016

Reference for this, please? Because I see no bugs filed on FF regarding brotli apart from https://bugzilla.mozilla.org/show_bug.cgi?id=1222541 and https://bugzilla.mozilla.org/show_bug.cgi?id=1207234 neither of which seems to match your problem description. I'm happy to get a bug filed for you so whatever problem this is can be fixed, but I need a bit more to go on here...

I did look around for known brotli+Firefox+nginx issues, but the only one that comes up is https://bugzilla.mozilla.org/show_bug.cgi?id=1215724 which was fixed in shipping Firefox months ago, so I'm assuming that's not the one you're talking about.

KurvaKde1334 · on March 22, 2016

Sure, there is no official bug for this. The only way to see it is to check the dates when the new API was applied - its far too late after releasing versions already supporting brotli, but this commit was not backported anywhere so in a real life, its broken in FF. Just get an nginx, nginx brotli module from google and use FF 45. Good luck :)

https://hg.mozilla.org/releases/mozilla-aurora/log/10e1774de...

bzbarsky · on March 22, 2016

> but this commit was not backported anywhere

Which commit are you talking about, exactly? Bug 1242904 was backported to Firefox 45, and everything else I see at http://hg.mozilla.org/mozilla-central/filelog/tip/modules/br... as of today (which is the same as <http://hg.mozilla.org/mozilla-central/filelog/ea6298e1b4f7/m...) was checked in way before 45....

bzbarsky · on March 22, 2016

Ah, looks like the part of bug 1242904 that was backported is just the security fix, not the entire library version update and the version update is in 46. So what you could be seeing is something like https://bugzilla.mozilla.org/show_bug.cgi?id=1254411

Of course the version update wasn't supposed to change the on-the-wire behavior... or so the library authors claimed. :(

ndesaulniers · on March 22, 2016

FF had it before Chrome!

https://hacks.mozilla.org/2015/11/better-than-gzip-compressi...

KurvaKde1334 · on March 22, 2016

In 2020 maybe :)

wyldfire · on March 22, 2016

I would hope that content type negotiation should mean that we could capitalize on it where it's supported.

_optl · on March 22, 2016

If anyone wants to check out a Rust implementation:

https://github.com/ende76/brotli-rs

It's currently in use in Servo

pmlnr · on March 22, 2016

Are there any memory & CPU consumption graphs?

I also don't understand why not xz, though I know that requires significantly more resource than gzip.

TheCondor · on March 22, 2016

Brotli actually has levels of compression that are more dense than gzip and compress/decompress faster. There is also a spec, xz has a reference implementation but no spec.

nacs · on March 22, 2016

> more dense than gzip and compress/decompress faster

If you're saying brotli compresses/decompresses faster that gzip, it doesn't according to Cloudflare tests [1] (see table near bottom).

Even the fastest compression level of Brotli is slower than the highest/slowest compression level of gzip in most cases.

[1]: https://blog.cloudflare.com/results-experimenting-brotli/

cornstalks · on March 22, 2016

> If you're saying brotli compresses/decompresses faster that gzip, it doesn't according to Cloudflare tests.

It doesn't always compress faster, but as far as I can tell they didn't measure or say anything about decompression.

Nelson69 · on March 22, 2016

I just did this:

time bro --quality 6 --input linux-4.5.tar --output linux-4.5.tar.bro

real 0m18.436s user 0m18.228s sys 0m0.184s

time gzip -9 linux-4.5.tar real 0m30.555s user 0m30.424s sys 0m0.172s

ls -lh (I removed the metadata): 106M linux-4.5.tar.bro 129M linux-4.5.tar.gz

Only 82% of the size at 60% of the time. Now this is pure text so that's not a good example of everything.

sp332 · on March 22, 2016

If it takes a lot of CPU, it might not be worth the CPU time or cost. https://i.imgur.com/bAc1Saq.png Image from https://dl.acm.org/citation.cfm?id=1084786&dl=ACM&coll=DL&CF... And gzip speed has improved since 2005, so over a 10 Mbps connection, you'd have to get much better compression to be worth the switch.

doomrobo · on March 22, 2016

>XZ is a file format, LZMA is the compression algorithm.

https://bugzilla.mozilla.org/show_bug.cgi?id=366559#c18

cromwellian · on March 22, 2016

I think decompression speed is more important. I'm willing to burn compression time precompressing resources. For more resources, especially JS and images, you can achieve precompression. Or, if you compress on the fly, you can cache the result.

superiphone77 · on March 22, 2016

Heh, Brotli + http2, it looks so Fast: http://www.http2demo.io

Joky · on March 22, 2016

So since this is a "dictionary" based algorithm, what about specializing the dictionary per content (content/type=javascript would have a different dictionary than html for instance). Also to embed Brotli for my custom application format, what should I expect by specializing the dictionary for my use case?

GigabyteCoin · on March 22, 2016

Title should be changed to "...25% improvement over Gzip"

re: "Brotli should bring 25% reduction in data size compared to Gzip for the most common assets like Javascript and CSS files. For HTML, Brotli promises up to 40% difference (with median around 25%)."

KurvaKde1334 · on March 22, 2016

Official comparison by google

https://cran.r-project.org/web/packages/brotli/vignettes/bro...

charlesju · on March 22, 2016

Is this a Silicon Valley TV show joke or is this real?

Not sarcasm, real question.

detaro · on March 22, 2016

What made you wonder about it being a joke?

shawabawa3 · on March 22, 2016

It's real

SureshG · on March 22, 2016

Do we have any implementation for Brotli encoder/decoder on JVM or JNI/JNA is the best option available right now for using Brotli on jvm apps?