QOIR: A fast, simple, lossless image file format (nigeltao.github.io)
93 points by todsacerdoti on Dec 7, 2022 | 29 comments



> For example, if you have a relatively wimpy camera that streams out a 160×120 grayscale image, it might be fun to see how far you can push something like combining:

I'm typically not one for self-promo but I did this with video and you can get 90% compression with exactly those three basic steps. RGB -> YUV420, delta encoding, then basically-upgraded-RLE. You can find the guide here: https://github.com/kevmo314/codec-from-scratch
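A minimal sketch of two of those steps, delta encoding followed by basic RLE, on a grayscale scanline (the YUV420 conversion is omitted; this is my own illustration, not code from the linked guide):

```python
def delta_encode(frame):
    """Replace each pixel with its difference from the previous one.
    Smooth images turn into long runs of small (often zero) deltas."""
    out = [frame[0]]
    for prev, cur in zip(frame, frame[1:]):
        out.append((cur - prev) % 256)  # wrap to stay within one byte
    return out

def rle_encode(data):
    """Basic run-length encoding: a list of [value, run_length] pairs."""
    runs = []
    for v in data:
        if runs and runs[-1][0] == v and runs[-1][1] < 255:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# A flat region deltas to zeros, which RLE then collapses.
frame = [10, 10, 10, 10, 12, 12, 12, 12]
deltas = delta_encode(frame)   # [10, 0, 0, 0, 2, 0, 0, 0]
print(rle_encode(deltas))      # [[10, 1], [0, 3], [2, 1], [0, 3]]
```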


Nice! The caveat being that this assumes we have little to no shot noise I guess? Since that is where lossy codecs typically beat lossless.


Yep adding noise would mess up the deltas which would reduce the efficiency. This can be mitigated with a quantizer, which I was hoping to write about next.

I suspect it's possible to achieve 99% lossy compression without getting too complex to be understandable in the toy codec with a couple more of these similar techniques.
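The quantizer idea could be as simple as a dead-zone quantizer on the deltas: noise smaller than half the step snaps back to zero, restoring the long zero runs at the cost of exact reconstruction. A hypothetical one-liner (the function name and step size are my own):

```python
def quantize_deltas(deltas, step=4):
    """Snap signed deltas to the nearest multiple of `step`.
    Small noisy deltas become 0, so RLE works again; this is lossy."""
    return [step * round(d / step) for d in deltas]
```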


Thanks! I just forked it. Do you know if Apache-2.0 code can be included in a closed-source project?



I've been having a problem recently: apparently every modern image format has a max width or height of either 16K or 64K. I generated a PNG with Stable Diffusion that was about 70K pixels tall and 2GB. I never figured out how to compress it; jpg, webp and avif wouldn't accept it.


It is also crazy to me that so many image formats have such small size caps. I understand not wanting to use extra space for small images, but I can't think of any reason to have a maximum image size smaller than your maximum hard drive size. You probably don't want to unconditionally store 8 bytes for width and height, but you could pretty easily make it only use 1 byte for small images and for anything big, you won't miss the extra 6 bytes.
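One hypothetical way to get "1 byte for small images, a few more for big ones" is a LEB128-style varint for the dimension fields (7 payload bits per byte, high bit means "more bytes follow"); this is my own sketch, not a scheme from any real format:

```python
def encode_dim(n):
    """Varint-encode a dimension: 7 payload bits per byte, high bit set
    when more bytes follow. Up to 127 costs one byte; 70,000 costs three."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_dim(data):
    """Return (value, number_of_bytes_consumed)."""
    n = shift = 0
    for i, b in enumerate(data):
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return n, i + 1
```

So a small thumbnail pays one byte per dimension, while a 70K-pixel-tall image pays three, and there is no hard cap short of what you are willing to decode.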


It may have to do with limitations of allocating memory chunks on 32-bit machines when these formats were originally designed. The decoder allocates a contiguous buffer that can hold the whole image.


WebP is recent enough that it shouldn't have even been a real concern. 16383×16383 even in 2010 was an absurdly small size. I've had PNGs and JPEGs with larger dimensions for longer than WebP has even existed.


My guess is the limits exist because WebP is based on WebM (video).

I agree those limits are too small.


WebP's lossy codec is based on VP8 intraframes; the lossless codec is original. Either mode is contained in a RIFF file, and that's where the dimension limits come from: the headers only have enough bits to represent 2^14-1.

WebM is, strictly speaking, a container format that is a subset of Matroska. It doesn't have any real relation to WebP.


One solution would be to hold multiple images inside of another wrapper container like zip. The client code would have to handle stitching them together.


Not a good idea for lossy codecs. You might get seams.


That would be a new image file format.


OpenEXR (designed for, and used by the VFX industry) supports dimensions up to int32 = 2 billion, but is a float16/float32 pixel format only (It can do int32 for IDs, so technically you could multiplex/combine/bitshift 8-bit values into one int32 if you really wanted).


JPEG should be fine with those dimensions (did you try mozjpeg's cjpeg?), but WebP is contained in a RIFF format and has a limit of 16383 pixels in either of its dimensions. Don't know the technicals of AVIF, but being derived from a video format, it's not going to be designed for large sizes in the first place, and I don't know if it also uses RIFF for the same limits.

PNG and JPEG XL are actually made for, well, still images; JPEG XL especially has no practical limitation on dimensions, it'll take whatever you can throw at it, given you have the hardware to process it. JPEG XL is either lossy or lossless at your option ;)


JPEG's technically limited to 64K×64K dimensions per the spec (65500 pixels with libjpeg and libjpeg-turbo), so if you want arbitrary software to be able to open images in that format, you'd be limited to that...


TIFF will support that size and larger. It can use JPEG or ZLIB compression as standard, or even WEBP or JPEG-XL which are unofficially supported by a small number of TIFF libraries.


jpeg-xl should do it. It was specifically designed to make sure it can support crazy resolutions and number of channels.


Depending on what you want to do with the compressed result, you could also... tile it. Not great, probably some common data structures get duplicated within the tiles, but maybe that factors out for really large images?

Just an idea.


Interestingly they offer a "lossy" mode, which is something I've been thinking about a lot recently. In reality, the decoding of all image formats (that I'm aware of) is completely lossless (i.e. deterministic), even for jpeg. The encoding is where the lossy decisions are made.

So it should be totally possible to take a lossless format, and ask the question -- how to minimally change the image to produce an image that compresses better? There are png pre-processors, like pngquant, that are somewhat lossy, but I wonder how far this can go -- the heuristics that pngquant uses, for example, are based on image processing heuristics, but it seems like it should be equally possible to use characteristics of the very specific compression systems used in a slightly more generic way.

For LZ-based systems, I wonder if there is some more generic way of approaching this. Running it forwards, and saying "what is the cheapest prediction of the next byte given our table so far" is interesting, but running it backwards is also interesting, to say "what is the byte that will increase our hit rate as we move forward in the image".

I played with this a little bit, trying to get farbfeld + bzip2 to produce lossy image compressions, but bzip2 is too complex to easily work with in this fashion. It seems like png has some possibilities here (to optimize against its filters) and QOI (and QOIR) seem to have good possibilities here because of the simplicity of their heuristics.
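As a toy illustration of "minimally change the image so it compresses better", here is a sketch targeting a plain RLE backend: snap each pixel to the previous output value whenever the error is within a tolerance, so runs get longer. This is my own example, not pngquant's heuristic or anything from the farbfeld experiment:

```python
def lossy_rle_preprocess(pixels, tolerance=2):
    """Snap each pixel to the previous output value when the error is
    within `tolerance`, extending runs for any RLE-like compressor."""
    out = [pixels[0]]
    for p in pixels[1:]:
        out.append(out[-1] if abs(p - out[-1]) <= tolerance else p)
    return out

# A noisy near-flat signal collapses into two runs instead of eight literals.
print(lossy_rle_preprocess([100, 101, 99, 100, 120, 121, 119, 120]))
# [100, 100, 100, 100, 120, 120, 120, 120]
```

Running the compressor "backwards", as suggested above, would generalize this: instead of a fixed tolerance, ask the entropy coder which of the acceptable values is cheapest to emit next.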


QOIR is an extension of QOI, discussed here: https://news.ycombinator.com/item?id=29328750


For me the golden nugget in this piece is the Morris principle: "Code tends to be extended to its level of incomprehensibility."


I'm wondering if anybody has tried adding PNG filters on top of QOI[0]. I have an idea of how to do it, and it should be fairly easy to build on top of a QOI encoder/decoder, but didn't get around to trying it out myself.

Per the QOI spec it already has the "SUB" filter built in, in the form of QOI_OP_DIFF. Which is great, because that leaves exactly four PNG filters instead of five (meaning they only need two bits to store): NONE, UP, AVERAGE and PAETH.

Here's how to add the filters: on a per-line basis, create a transformed line for each filter (so the original pixels, plus UP, AVERAGE and PAETH transformations). Then see how each of the arrays would compress and pick the best one, treating it as if it were the original pixel input. In a separate array we keep track of which filter we picked for each line.

When we've done this for the entire image, we basically have a plain QOI image but if we decompress it the output would show each line in its new transformation. To recover the original image we just have to reverse these transformations bottom-to-top.

So we just have to append that array where we kept track of the filters we picked for each line, four picks per byte. Since we know the height of the image we don't need to mark the end of this stream. The added size overhead is effectively one byte per four lines. I expect speed overhead would be roughly 4x for encoding (since we brute-force try four transformations), and something very tiny for decoding (since all we need to do in the end is one single in-place pass over all pixels to undo the transformations).

Also, because SUB is covered by QOI_OP_DIFF, this approach would implicitly support mixing each of these per-line filters with SUB, so it would be possible to have SUB+UP, SUB+AVERAGE and SUB+PAETH, and it would be on a per-pixel basis. Might be beneficial in some cases.

[0] https://www.w3.org/TR/PNG-Filters.html

[1] https://qoiformat.org/qoi-specification.pdf
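The per-line brute-force selection described above might look something like this for a grayscale channel. Note one substitution: the comment proposes actually compressing each candidate line with QOI, while this sketch uses PNG's cheaper minimum-sum-of-absolute-differences heuristic as a stand-in for "which compresses best"; all names are mine:

```python
def paeth(a, b, c):
    """PNG's Paeth predictor: pick whichever of left (a), up (b),
    up-left (c) is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def filter_line(line, prev, mode):
    """Apply one per-line filter (SUB is left to QOI_OP_DIFF, so it
    is not in this list); residuals wrap modulo 256."""
    n = len(line)
    if mode == "NONE":
        return list(line)
    if mode == "UP":
        return [(line[x] - prev[x]) % 256 for x in range(n)]
    if mode == "AVERAGE":
        return [(line[x] - (prev[x] + (line[x - 1] if x else 0)) // 2) % 256
                for x in range(n)]
    if mode == "PAETH":
        return [(line[x] - paeth(line[x - 1] if x else 0, prev[x],
                                 prev[x - 1] if x else 0)) % 256
                for x in range(n)]

def choose_filters(image):
    """For each scanline, brute-force all four candidates and keep the
    one with the smallest summed residual magnitude (a common proxy for
    'compresses best' instead of running the real encoder)."""
    filters, out = [], []
    prev = [0] * len(image[0])
    for line in image:
        best = min(("NONE", "UP", "AVERAGE", "PAETH"),
                   key=lambda m: sum(min(v, 256 - v)
                                     for v in filter_line(line, prev, m)))
        filters.append(best)
        out.append(filter_line(line, prev, best))
        prev = line  # filters always reference the original previous line
    return filters, out
```

Decoding reverses this bottom-to-top, as the comment says: each line's transformation is undone using the already-recovered line above it.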


I'm not sure why your whole scanline would need a predictor that is top pixel, left pixel, or average. Shouldn't this be for 16 or 32 pixels max?

The most popular QOI modification has been to take the average of top and left decoded pixels as predictor.

I've tried to put more bits into selecting 4 possible pixels as predictors (for a single pixel), but it doesn't work that well because the previously encoded colors are already in the table. Predictor+diff encoding wasn't as successful as one would expect, probably because the pixels are processed one by one. One design principle of QOI is to read the input once to encode, so I'm not sure if people would want to try 4x the scanline.


> Predictor+diff encoding weren't as successful as one would expect.

Interesting, I'd love to see more details about your experiments! Are they online somewhere? And can you link the other QOI modification you mentioned?

> I'm not sure why your whole scanline would need a predictor that is top pixel, left pixel, or average.

Well, it's all just a thought experiment at this stage anyway right? My reasoning was that when filters get more sophisticated it can become a hard problem to figure out when to apply which one. So I figured an approach that lets us brute-force try all of them seemed simplest. It worked well enough for PNG to be an OK-ish lossless compression format after all.

Actually, you just reminded me of another reason I wanted to try the scanline approach: it makes it easier to give each filter its own thread, so we can try out all four options in parallel. In that case the bottleneck is the slowest of the four options, so it would still be slower, but hopefully not that much, and faster than a 4x slowdown.

Another reason for the scan-line approach was that it seemed like it would be a way to add them on top of QOI without messing too much with the latter's specs. As mentioned, the output would be a regular QOI image where each line just happens to optionally have one of three transformations applied to it, followed by an array that tells you which transformation was applied to each line.

If I understand correctly, QOI images have an end marker (seven 0x00 bytes followed by a single 0x01 byte), meaning adding such an extra array of bytes should be ignored by properly implemented QOI decoders.

If so, that would mean that implementing a decoder for this filter-approach could be a minimal program you run after running any other QOI decoder that takes the output and undoes the transformations. That would make it trivial to build a decoder on top of other people's hard work to implement an efficient decoder for "regular" QOI images.

Think of it like how tar and gz work together. I suppose this trick wouldn't be limited to scanlines though; I'm sure others have thought of it too.
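The "four picks per byte" trailer mentioned earlier could be packed like this before being appended after QOI's end marker (seven 0x00 bytes then 0x01); the bit assignments here are my own invention:

```python
# Two bits per filter choice, four choices per byte.
FILTER_BITS = {"NONE": 0, "UP": 1, "AVERAGE": 2, "PAETH": 3}

def pack_filters(filters):
    """Pack per-line filter choices, low bits first within each byte,
    for appending after the QOI end marker. The decoder knows the image
    height, so no terminator is needed."""
    out = bytearray()
    for i in range(0, len(filters), 4):
        b = 0
        for j, f in enumerate(filters[i:i + 4]):
            b |= FILTER_BITS[f] << (2 * j)
        out.append(b)
    return bytes(out)
```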

> One design principle of QOI is to read input once to encode, so not sure if people would want to try 4x the scanline.

True, but my reasoning was that this would be for a particular niche: when the decoding speed matters most (which would be minimally impacted by this) but encoding speed can be a little slower if the compression is significantly better. No idea how much better that would be though, if at all.


I did not keep track of every experiment; I'm just saying that if you make a new opcode in QOI that encodes a predictor with a few bits plus a few bits of diff, it's hard to make an improvement at all. It's easier if you remove the 8-bit boundaries constraint.

> And can you link the other QOI modification you mentioned?

https://github.com/nigeltao/qoi2-bikeshed/issues/34 then adopted by virtually all QOI-inspired codecs.


I also extended QOI to make QOIX, a triple codec that can also encode 10-bit and 1- and 2-channel images. So you could say it's 3x the complexity.

https://github.com/AuburnSounds/gamut


There is qoirsk, an improved version of qoir. Please visit github.com/skandau/qoirsk



