Naive image and video codecs are quite fun to make. I have done a bunch of them over the years, and it's quite easy to get within cooee of established formats, and even surpass them under certain conditions. I made a lossy image format that achieves 20-25 dB PSNR at around 200:1 compression, which is better than most lossy formats manage, because that's a quality/data-size range that most image formats consider out of scope.
It's still quite a leap to get to the best new codecs; suddenly you are in a world of head-hurty math.
It's also worth noting that it is easy to beat the old formats all-round with off-the-shelf parts: both JPEG and PNG can be bested by swapping the outer level of compression for something invented after those formats were designed, for instance using LZMA or zstd as the final stage. Quite often that's enough to put them on a par with more radically different newer formats.
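As a rough illustration of the "swap the outer compressor" idea, here is a minimal Python sketch that pulls the zlib-compressed IDAT stream out of a PNG, inflates it back to the raw filtered scanlines, and recompresses those with zstd. It assumes the third-party `zstandard` package, and the file name is just a placeholder; a fair comparison would of course also look at speed and memory.

    import struct
    import zlib
    import zstandard as zstd

    def png_idat_payload(path):
        """Concatenate the IDAT chunk payloads of a PNG file (one zlib stream)."""
        with open(path, "rb") as f:
            data = f.read()
        assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
        pos, payload = 8, b""
        while pos < len(data):
            length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
            if ctype == b"IDAT":
                payload += data[pos + 8:pos + 8 + length]
            pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
        return payload

    if __name__ == "__main__":
        deflated = png_idat_payload("example.png")   # placeholder path
        filtered = zlib.decompress(deflated)         # raw filtered scanlines
        rezipped = zstd.ZstdCompressor(level=19).compress(filtered)
        print(f"deflate: {len(deflated)} bytes, zstd: {len(rezipped)} bytes")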
> It's also worth noting that it is easy to beat the old formats all-round with off-the-shelf parts: both JPEG and PNG can be bested by swapping the outer level of compression for something invented after those formats were designed.
The heart of JPEG is in the DCT and its energy compaction properties.
One crazy thing about the DCT is that it doesn't just let you make trade-offs between high- and low-frequency features; it also lets you make trade-offs between horizontal and vertical features. If you customize your quantization matrix to your specific application, you can potentially achieve compression ratios far exceeding anything available today, even if you leave in the crusty old RLE+Huffman coding.
This is the primary reason that custom quantization tables, and a whole slew of example tables, are allowed by the standard. The common 8x8 default table works in most cases, but you can optimize heavily for edge-case images (in which case, the cost of baking the table into the file is made up for by the decreased data size).
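Here is a minimal sketch of that quantize/dequantize step in Python, assuming numpy and scipy. The asymmetric table is made up for illustration (it is not one of the JPEG example tables): it quantizes high vertical frequencies (large row index) much more coarsely than high horizontal frequencies, which is the kind of directional trade-off described above. The surviving coefficients are what the RLE+Huffman stage would then entropy-code.

    import numpy as np
    from scipy.fft import dctn, idctn

    # Made-up directional table: rows (vertical frequency) are quantized
    # much more coarsely than columns (horizontal frequency).
    QUANT = np.clip(np.outer(np.linspace(4, 64, 8),
                             np.linspace(2, 16, 8)), 1, 255).round()

    def encode_block(block):
        """Forward 8x8 DCT, then divide by the table and round (the lossy step)."""
        coeffs = dctn(block.astype(np.float64) - 128.0, norm="ortho")
        return np.round(coeffs / QUANT).astype(np.int16)

    def decode_block(quantized):
        """Multiply back by the table and inverse DCT."""
        pixels = idctn(quantized * QUANT, norm="ortho") + 128.0
        return np.clip(pixels, 0, 255).astype(np.uint8)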
> For instance using LZMA or zstd as the final stage. Quite often that's enough to put them on a par with more radically different newer formats.
If you do that with a video codec, you lose the ability to seek within the stream, which makes it useless for streaming video. For images, the amount of memory required to decompress may become excessive.
It's similar to why zip (deflate) is still widely used, but is far from optimal in compression efficiency; everything that does better (in some cases much better) is going to be slower, bigger, or both. See https://en.wikipedia.org/wiki/PAQ for an example of extreme lossless compression.
You could probably do something really good using zstd dictionary compression, where you compress each frame separately but have them all share a dictionary.
Video streams are generally compressed in blocks called groups of pictures (GOPs) as well. That's how seeking in a video stream works: skip to the beginning of the GOP that contains the desired timestamp, start decoding from there, and show images once the timestamp is reached.
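A toy version of that seek loop in Python, assuming we already have per-frame timestamps and keyframe flags (the names here are made up for illustration, not taken from any particular container):

    from dataclasses import dataclass

    @dataclass
    class Frame:
        timestamp: float   # presentation time in seconds
        is_keyframe: bool  # True at the start of each GOP
        data: bytes

    def seek(frames, target_ts, decode, display):
        # Index of the GOP start: the last keyframe at or before the target.
        start = max((i for i, f in enumerate(frames)
                     if f.is_keyframe and f.timestamp <= target_ts), default=0)
        for f in frames[start:]:
            picture = decode(f.data)   # every frame in the GOP must be decoded
            if f.timestamp >= target_ts:
                display(picture)       # but only shown from the target onward
                break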
Well, that was kind of my point: why assume that the poster above would go for the most naive possible implementation, when thinking about it for more than a minute yields several obvious solutions, and known implementations have already solved that problem?
ZSTD also has some fun "dictionary" operations so that even if you're chunking your data you can still take advantage of cross-chunk redundancy by "training" across all your chunks before the compression stage.
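A loose sketch of the shared-dictionary approach with the Python `zstandard` package (the frame data here is synthetic stand-in bytes; a real codec would feed raw or intra-coded frames):

    import os
    import zstandard as zstd

    def compress_frames(frames, dict_size=16 * 1024):
        """Train a dictionary on all frames, then compress each one independently."""
        d = zstd.train_dictionary(dict_size, frames)
        comp = zstd.ZstdCompressor(dict_data=d, level=9)
        return d, [comp.compress(f) for f in frames]

    def decompress_frame(blob, d):
        """Each frame stays independently decodable; only the dictionary is shared."""
        return zstd.ZstdDecompressor(dict_data=d).decompress(blob)

    if __name__ == "__main__":
        # Stand-in frames: a shared header plus per-frame noise.
        frames = [b"COMMON-HEADER" * 64 + os.urandom(4096) for _ in range(128)]
        d, blobs = compress_frames(frames)
        assert decompress_frame(blobs[3], d) == frames[3]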
> achieves 20-25 dB PSNR at around 200:1 compression, which is better than most lossy formats manage, because that's a quality/data-size range that most image formats consider out of scope.
Many of us, myself included, do not want to read about patent encumbered IP in general. It doesn't matter who owns the patent, or how you purport to wield it on this particular day, in this particular year. You are essentially doing the equivalent of traversing the internet, planting mines as you go.
This is really an image codec, isn't it? Since it doesn't have any temporal compression capabilities.
It's interesting to see how well such a simple technique performs. I wonder what would happen if you added trivial temporal compression by simply subtracting the color values of the previous frame from the next and encoding the residual. How would that perform?
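For what it's worth, here is what that trivial scheme looks like in Python with numpy (zlib stands in for whatever entropy stage the codec actually uses; uint8 arithmetic wraps modulo 256, so the residuals round-trip losslessly):

    import zlib
    import numpy as np

    def encode(frames):
        """Store frame 0 as-is, then the wrapped difference from the previous frame."""
        prev, out = None, []
        for f in frames:
            residual = f if prev is None else (f - prev)  # uint8 wraps mod 256
            out.append(zlib.compress(residual.tobytes(), 9))
            prev = f
        return out

    def decode(blobs, shape):
        prev, frames = None, []
        for b in blobs:
            residual = np.frombuffer(zlib.decompress(b), dtype=np.uint8).reshape(shape)
            f = residual if prev is None else (prev + residual)  # wraps back
            frames.append(f)
            prev = f
        return frames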
Because it is trivial to turn any image codec into a video codec by simply encoding each frame individually, and despite the article talking about temporal redundancy, it doesn't actually attempt to show any code that deals with that.
It's a bit debatable but he definitely only did the image coding part of the video codec. All of those listed formats also support the metadata required for video.
Not particularly strange, a lot of compression formats work like that. E.g. you can make a zip file at STORE level and there will be no actual compression.
Some video formats are intra-only (every frame is an I-frame). Then there's not a lot of difference between images and video, as far as editing goes. Decoding the formats meant for end-user delivery has a lot more going on, but one has to start somewhere.
Anyway - I think that this kind of work is a great starter and gets more people interested in this.
A simple delta between frames wouldn't perform well if there was any camera movement: you'd pay for every edge twice.
Instead of working with a plain delta, conditionally using the previous frame as the prediction source could work (e.g. if neighbouring pixel A is closer to the previous frame's A than to the current frame's B, predict the current pixel X from the previous frame's X). Or you could signal the prediction source explicitly, per block or with RLE. Ideally you'd do motion compensation, but doing that precisely enough for a lossless compressor is more than 100 lines.
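A sketch of the explicit per-block variant, in Python with numpy: for each 16x16 block, compare predicting from the co-located block in the previous frame against a cheap spatial predictor (the block's left neighbour in the current frame, which a lossless decoder would already have reconstructed), keep whichever residual is smaller, and record a one-bit flag per block. The block size and the choice of spatial predictor are arbitrary here, and frame dimensions are assumed to be multiples of 16:

    import numpy as np

    BLOCK = 16

    def choose_predictions(cur, prev):
        """Return per-block temporal/spatial flags and the resulting residuals."""
        h, w = cur.shape
        flags = []
        residuals = np.zeros((h, w), dtype=np.int16)
        for y in range(0, h, BLOCK):
            for x in range(0, w, BLOCK):
                block = cur[y:y+BLOCK, x:x+BLOCK].astype(np.int16)
                temporal = prev[y:y+BLOCK, x:x+BLOCK].astype(np.int16)
                if x >= BLOCK:
                    spatial = cur[y:y+BLOCK, x-BLOCK:x].astype(np.int16)
                else:
                    spatial = np.full_like(block, 128)  # mid-grey at the left edge
                r_t = block - temporal
                r_s = block - spatial
                use_temporal = np.abs(r_t).sum() <= np.abs(r_s).sum()
                flags.append(use_temporal)
                residuals[y:y+BLOCK, x:x+BLOCK] = r_t if use_temporal else r_s
        return flags, residuals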
While delivery formats often use P- and B-frames, editing and recording formats often go all-intra. For example, the Sony FS7 only supports all-intra XAVC-I for recording at full resolution and frame rate.
Personally, I use ProRes 422 for recording, and DNxHD/DNxHR for proxies (and that's only because DaVinci Resolve's free edition can't create ProRes Proxies).
Both of these codecs are all-intra formats in mpeg containers.
Funnily enough, I recently released 0.1.0 of "less-avc", a pure-Rust H.264 (AVC) video encoder: https://github.com/strawlab/less-avc/ . For now it only implements a lossless I_PCM encoder, but it supports a few features I need, such as high bit depth. If anyone has a codec-writing itch they want to scratch, I would welcome work towards the compression algorithms H.264 supports: context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC). I'm also happy to take constructive criticism or questions on this library. I think it is fairly idiomatic Rust, with no `unsafe`. While H.264 is an older codec now, as far as I can tell that also means any patents on it are about to run out, and it is very widely supported.
That's a great list of the H.264 patent claims--thanks. I had naively assumed that since the first iteration of the standard was published in 2003, "obviously" all related patents (to features in the first iteration, anyway) would have had to be filed before then. Clearly, that is not the case.
"method of selecting a reference picture" sounds like an encoding patent, and that one was filed four years after the standard came out. I wouldn't worry about 2028.
It's harder to evaluate the blob of patents from 2004-2005.
Decoders also have the difficulty that they need to support most of a format's features before they can handle content found in the wild. You also often need to add hacks to support encoders which technically violate the specification but are commonly used.
In contrast, you can build a very simple encoder using very few of the format features and still have it be usable/useful (albeit with poor quality/compression ratio).
Not exactly what you are asking for, but jcodec is a pretty readable codebase written in Java. (The readability part is often, ahh, lacking in the source for codecs, in my experience.) It might be a good candidate for rewriting in Rust. https://github.com/jcodec/jcodec
Are there many codecs of any sort with Rust implementations? The majority of Rust stuff I see linked are thin wrappers around existing C or C++ libraries.
QOI https://qoiformat.org/ is a good example of a practically useful simple format.