Real-Time Adaptive Image Compression (wave.one)
72 points by cardigan on May 18, 2017 | hide | past | favorite | 48 comments


Nice work, but it's disingenuous not to include a BPG (HEVC) image for comparison -- BPG is close to state-of-the-art, not WebP -- and even their own SSIM charts show this.

Interesting that decoding is slower than encoding. Also curious about performance on CPU.

This approach may also be susceptible to "hallucinating" inaccurate detail; you can see a little bit of this on the upper-right of the girl's circled eyelid compared to the original Kodak image. See also: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...


> This approach may also be susceptible to "hallucinating" inaccurate detail

Yes! This. I can see us giving up decisions to 'AI' without realizing that it's just a loose-association machine. Would I want to have my mortgage rate adjusted by a 'loose association'?

https://twitter.com/keff85/status/862690920805916672


As someone interested in the field, this does not look much different from state-of-the-art video codecs: variable-size blocks, wavelets, a predictor, arithmetic coding. The difference is that the predictor is trained on real data. But symbol dictionaries are already used in modern compressors like brotli and zstd.
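
For a concrete feel of the "trained on real data" idea as it already exists in general-purpose compressors, here is a minimal Python sketch of zstd's trainable shared dictionary, which acts as a learned prior for compressing similar small inputs (the file paths are hypothetical):

    import glob
    import zstandard as zstd

    # Train a shared 16 KB dictionary on a pile of similar samples
    # (dictionary training wants a reasonably large sample set).
    samples = [open(p, "rb").read() for p in glob.glob("samples/*.json")]
    dictionary = zstd.train_dictionary(16 * 1024, samples)

    # Both sides must hold the same dictionary, analogous to a shared model.
    cctx = zstd.ZstdCompressor(dict_data=dictionary)
    dctx = zstd.ZstdDecompressor(dict_data=dictionary)

    payload = open("new_sample.json", "rb").read()
    compressed = cctx.compress(payload)
    assert dctx.decompress(compressed) == payload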

Most codecs have been tuned for mean opinion score (MOS). MS-SSIM is not a metric you can fully rely on [1] [2]; in my experiments it performed poorly. (There's a quick sketch for computing it yourself after the links below.)

I think the Google team's effort will have a much bigger impact [3], just by combining all the recent practical improvements in image compression.

Meanwhile, they could have optimized the images on their website a little better. I saved ~15% with my soon-to-be-obsolete tool [4].

[1] https://encode.ru/threads/2738-jpegraw?p=52583&viewfull=1#po...

[2] https://medium.com/netflix-techblog/toward-a-practical-perce...

[3] https://encode.ru/threads/2628-Guetzli-a-new-more-psychovisu...

[4] http://getoptimage.com
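
If you want to sanity-check MS-SSIM on your own images, here is a minimal sketch using TensorFlow's built-in implementation; the file names are placeholders.

    import tensorflow as tf

    def load(path):
        # Decode to a float32 batch of shape [1, H, W, 3].
        img = tf.io.decode_image(tf.io.read_file(path), channels=3)
        return tf.expand_dims(tf.cast(img, tf.float32), 0)

    original  = load("kodim15_original.png")
    candidate = load("kodim15_compressed.png")

    # Higher is "better", but as noted above it does not always track MOS.
    score = tf.image.ssim_multiscale(original, candidate, max_val=255.0)
    print(float(score[0]))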


From their website:

> Lubomir holds a Ph.D. from UC Berkeley, 20 years of professional experience, 50+ issued patents and 5000+ citations.

I just hope this type of research isn't going to end in a patent encumbrance, like it did with JPEG and MPEG.

These techniques are right around the corner, no matter who invents the file formats.

So if their idea is to lock these general ideas down with more patents, I'd want them to stop their research and let people with more open intentions research this further.


This also looks like a meta-algorithm, an algorithmic way to generate domain-specific compressors, so potentially anything it creates would also be covered by patents.


They are going to capitalize on it, make no mistake.


A rarely discussed danger of all machine learning models: if they don't know the answer, they'd rather make something up.

Here's a Google Translate example: https://twitter.com/keff85/status/862690920805916672

I wouldn't like to lose part of a parcel in a lawsuit because an adaptive algorithm made up some details in an aerial photograph so that it compresses better ...


So is this only good for ridiculously low target file sizes? No one in their right mind is going to compress a "480x480 image to a file size of 2.3kB".

What I want to see is an acceptable looking JPG next to a WaveOne image of the same size. Or an acceptable looking WaveOne next to a JPG of the same size.

How small is good enough? How good is small enough?


The small image is used for illustrative purposes. Saying "this image is fewer bytes than that one" is less striking than "look at these two images: they are the same size, but one is ugly".


I wonder how this compares to FLIF. I also tried to compress images based on shape and structure but by approximating these using skeletons.

I'm struggling a bit with their performance comparison. The graphs they present are very pretty and promising, but for the presented images we're quite left in the dark. They dump some images, theirs looks prettier, and the authors give us some indication of quality, but it's not conclusive evidence that their method produces better images. Typically when different compressed images are presented, two things can vary: quality and file size. In the presented images, both seem to be varying without telling us which is which. Also, there is no baseline to compare against, either in terms of file size or what the image should look like. Sure, we humans can make a very educated guess, but it is just sloppy not to include the uncompressed original image.

I will be fully convinced when I can try it for myself on my own image set.


FLIF is lossless. So very very different.


Err... FLIF is intended for lossless compression, but can be lossy (it doesn't perform as well as intentionally lossy codecs, although that might also be a matter of optimising for it).

However, it also has a kind of adaptive ML-ish approach so it might be technically similar.

> FLIF is based on MANIAC compression. MANIAC (Meta-Adaptive Near-zero Integer Arithmetic Coding) is an algorithm for entropy coding developed by Jon Sneyers and Pieter Wuille. It is a variant of CABAC (context-adaptive binary arithmetic coding), where instead of using a multi-dimensional array of quantized local image information, the contexts are nodes of decision trees which are dynamically learned at encode time. This means a much more image-specific context model can be used, resulting in better compression.

http://flif.info/
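
To make the "contexts are dynamically learned decision trees" part concrete, here is a toy sketch (not FLIF's actual code) of the adaptive-context idea behind CABAC/MANIAC: each context keeps symbol counts and sharpens its probability estimate as it codes, and MANIAC's twist is that the context is chosen by a decision tree learned at encode time instead of a fixed table lookup.

    # Toy illustration only; the "decision tree" below is a crude
    # hand-written stand-in for the one MANIAC learns at encode time.
    class AdaptiveBitContext:
        def __init__(self):
            self.counts = [1, 1]          # Laplace-smoothed counts of 0s and 1s

        def p_one(self):                  # probability fed to the arithmetic coder
            return self.counts[1] / sum(self.counts)

        def update(self, bit):            # adapt after each coded bit
            self.counts[bit] += 1

    contexts = {}

    def context_for(left_pixel, top_pixel):
        # Route to a context by thresholding neighbouring pixel values.
        key = (left_pixel > 128, top_pixel > 128)
        return contexts.setdefault(key, AdaptiveBitContext())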


Very impressive work, though it seems like a mistake to focus on compression, which gets less valuable as storage and bandwidth get cheaper. You need only look at the staying power of JPEG, which is so far from the state of the art, yet it's not going anywhere. Why? The demand for replacing it is not strong enough.

They obviously have some good image priors here; if I were them, I would consider applying this tech to other image-related things, like image manipulation or image search. Although competition is heating up quickly in these fields...


It isn't 'very impressive work'; it is marketing for Silicon Valley (impressive marketing, though).

Literally the first sentence of the linked page:

"Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm)."

EDIT: It is embarrassing that not one person in this thread seems to have actually read any of the paper. Even with obvious evidence that this is fiction, people still don't want to believe it.



It literally quotes a fictional TV show in the synopsis and directly in the paper. Are you seriously not getting that this is fiction?

Why would they have this in the actual PDF?

"Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery"

This is promotion, they are getting a fake paper to permeate throughout the internet. If I had to guess, I would say they are making a statement about reproducing results in academia and not taking a single paper as gospel. If so, I think they are making their point pretty well.


Okay, it may be. It is something that can be done at a technical level for sure, given enough training data and enough data on the client side to do reconstructions.

I guess once you get popular enough you can get TV consultants who can propose real solutions as TV props. Heh.


It's humor. Everything else looks kind of real. The science is bad, though, because it lacks all the details needed to reproduce it, and the evaluation is a bit fishy. They also claim it's super fast without mentioning what makes their approach fast.


No one else seems to realize that.


Deep learning will be a great way to do compression for sure, for audio, video, and images alike. I could see one downloading "knowledge sets" for these decompressors. Looking at Google Earth? Download the supplemental "knowledge set" for overhead shots of cities and countryside. Looking at people? Download the supplemental "knowledge set" for faces and clothing, etc.

Basically, for each domain you want to do well in, you need a knowledge set that is trained on that data. Then you need a discriminator on the compression side to classify an image, or subregions of an image, into those categories.

If you can make the knowledge sets downloadable on demand and then cached, you can be incredibly efficient over the long term while maintaining very small initial download sets. I think knowledge sets that evolve over time also ensure that the codec is flexible enough to handle currently unforeseen situations. Nobody wants a future where our DL-based image/video compression tools only know a few pre-determined sets and are mediocre on everything else.
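
A hypothetical sketch of what that could look like on the wire, with all names and helper functions made up: the bitstream leads with an ID naming the knowledge set, and the decoder fetches (and caches) that model before reconstructing.

    import struct

    MODEL_REGISTRY = {1: "faces-v3", 2: "aerial-v1", 3: "generic-v2"}

    def encode(image_bytes, model_id, compress_with_model):
        header = struct.pack(">H", model_id)          # 2-byte knowledge-set ID
        return header + compress_with_model(model_id, image_bytes)

    def decode(blob, fetch_cached_model, decompress_with_model):
        (model_id,) = struct.unpack(">H", blob[:2])
        model = fetch_cached_model(MODEL_REGISTRY[model_id])  # download or hit cache
        return decompress_with_model(model, blob[2:])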


The "knowledge set" would be a small matrix - the one that allows to decode whatever encoder has put into compressed data. I guess it will be in volume range of 8x8 16-bit floats or so. Maybe three to six such matrices per channel.


This encoder seems to have some weird distortions that are most visible in the aerial shots. Compare [1] and [2]. The lines on the basketball court are distorted and curved. If you look closely, there is also curvature added to the sidewalks where there isn't any. In case those links break, I'm referring to the top row of aerial images.

[1] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b...

[2] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b...


Yep - and see how it almost made the white car in the shadow in the middle-left disappear. "We've deleted your only alibi so that our data would compress better ..."


Lossy codec being lossy is a feature, not a drawback.


> While we are slightly faster than JPEG (libjpeg) and significantly faster than JPEG 2000, WebP and BPG, our codec runs on a GPU and traditional codecs do not — so we do not show this comparison.

This is great news!

I'd actually like to see the plot, though. (Both for encoding and decoding.) It stands to reason that a neural network can optimize image compression, as it can encode high-level information like "this is a face". But encoding / decoding speed is the sticking point, so I feel successes there should be emphasized.

The necessity of having a GPU doesn't seem problematic nowadays; everything has one. Testing it with a mobile-grade GPU would be interesting.


Thing is, "running on GPU" might mean "uses CUDA", which would make it more problematic.


I'm going to need to see this code in practice to believe it.


I didn't find the code. Did you have more luck?


It is a joke paper as a marketing stunt for Silicon Valley. I would bet they could get it accepted to some journals / conferences too since it looks extremely convincing.


Seems like a slightly unfair comparison. Training the compressor moves data from the images into the compressor, making the bit per pixel evaluation slightly more iffy.


Not really.

As long as the decompressor needs just an image file and no other data, it's a fair game.
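
A back-of-envelope way to see why (all numbers invented): the decoder model's bits are amortized over every image it will ever decode, so the per-image overhead goes to zero.

    file_bits  = 2.3 * 1024 * 8      # the 2.3 kB example image
    model_bits = 50e6 * 8            # a hypothetical 50 MB decoder model
    pixels     = 480 * 480
    n_images   = 1e9                 # images the shared model will decode

    bpp_file      = file_bits / pixels
    bpp_amortized = (file_bits + model_bits / n_images) / pixels
    print(bpp_file, bpp_amortized)   # the model term is negligible for large N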


How large is the decompressor to download?

Is this image compression tool good at images it was not trained on?

How bad does it get in those situations?

Is this training data fixed into the codec forever? Will there be slightly different image codecs that have different training data? That would be sort of hellish.


You'd need some bits in the file telling you which trained decoder you need to get the proper image out.

What would the image of that girl look like if they used WaveOne Aerial?

How is this not cheating if the example image they used to compare algorithms is in the training data?


As long as they don't test on the training set it's a fair comparison isn't it?


One big reason for "hardcoded" encoders and decoders is that they are much easier to implement in hardware.

One can improve on e.g. H.265 somewhat easily if a software-only solution is an option. But if you need a cheap hardware-only solution, then the ML-required way seems a bit too expensive.


Does no one realize this is a joke / marketing?

Directly from the paper's PDF:

"Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."


Or, you know, that could just be a humorous reference to the TV series, while this is a real implementation.


This is my read too, since they're citing Judge et al. rather than the characters or the characters' papers. It's in the same vein as when Dropbox's Lepton article cited middle-out as a humorous bit of self-promotion.


So you think they would put a humorous reference to a TV show in the synopsis of their groundbreaking image compression paper, and again as an academic reference in the paper itself?

This whole thread is like being the only sane person in an asylum.

Also, it isn't a 'real implementation' since there isn't any source code to reproduce the results.


I couldn't find that in the PDF?


Are you asking me whether or not you couldn't find it in the PDF?

On the actual web page, the first line of the abstract:

"Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm)."

From the PDF:

https://arxiv.org/pdf/1705.05823.pdf

At the end of Section 2.2 ("ML-based lossy image compression"), right above Section 2.3 ("Generative Adversarial Networks"):

"Theis et al. (2016) and Ballé et al. (2016) quantize rather than binarize, and propose strategies to approximate the entropy of the quantized representation: this provides them with a proxy to penalize it. Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."


What if the paper was generated using the codec and it is a multilevel joke?


Is this open source or something that you are aiming to license?


Where is PNG?


I think this is lossy compression, so it doesn't really overlap with PNG in terms of performance/file-size goals.


Oh, right. I didn't consider that.


For some reason they do not show the uncompressed image for comparison.


They mention the Kodak dataset [1] in the second paragraph. It seems to be Kodak Image 15 [2].

edit: as for the other images, it would indeed be nice to see those.

[1] http://r0k.us/graphics/kodak/ [2] http://r0k.us/graphics/kodak/kodim15.html



