Video codec in 100 lines of Rust (tempus-ex.com)
306 points by kevmo314 on Dec 19, 2022 | 45 comments


Naive image and video codecs are quite fun to make. I have done a bunch of them over the years, and it's quite easy to get within cooee of established formats, and even to surpass them under certain conditions. I made a lossy image format that achieves 20-25 PSNR at around 200:1 compression, which is better than most lossy formats because that's in a quality/data-size that most image formats consider out of scope.
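
For anyone who wants to put a number like that on their own toy codec, here is a minimal sketch of a PSNR calculation for 8-bit samples. The function name `psnr` and the sample values are made up for illustration; the formula is the usual 10 * log10(255^2 / MSE), reported in dB.

```rust
/// Peak signal-to-noise ratio in dB between two 8-bit images of equal length.
/// Returns infinity when the two images are identical (zero error).
fn psnr(original: &[u8], decoded: &[u8]) -> f64 {
    assert_eq!(original.len(), decoded.len());
    assert!(!original.is_empty());
    // Sum of squared per-sample differences.
    let sum_sq: f64 = original
        .iter()
        .zip(decoded)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum();
    let mse = sum_sq / original.len() as f64;
    if mse == 0.0 {
        return f64::INFINITY;
    }
    10.0 * (255.0_f64 * 255.0 / mse).log10()
}

fn main() {
    // Hypothetical sample values, just to exercise the function.
    let original = [10u8, 20, 30, 40];
    let decoded = [12u8, 18, 33, 40];
    println!("{:.2} dB", psnr(&original, &decoded));
}
```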

QOI https://qoiformat.org/ is a good example of a practically useful simple format.

It's still quite a leap to get to the best new codecs, suddenly you are in a world of head hurty math.

It's also worth noting that it is easy to beat the old formats all-round with off-the-shelf parts, both JPEG and PNG can be bested by changing the outer level of compression for something that was invented after the formats were made. For instance using LZMA or zstd as the final stage. Quite often that's enough to put them on a par with more radically different newer formats.


> It's also worth noting that it is easy to beat the old formats all-round with off-the-shelf parts, both JPEG and PNG can be bested by changing the outer level of compression for something that was invented after the formats were made.

The heart of JPEG is in the DCT and its energy compaction properties.

One crazy thing about the DCT is that it doesn't just let you make trade-offs for high/low frequency features. It also lets you make trade-offs for horizontal and vertical features. If you customize your quantization matrix to your specific application, you can potentially achieve compression ratios far exceeding anything available today - even if you leave in the crusty old RLE+Huffman coding.
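
A rough sketch of what that looks like in code, assuming a naive 8x8 DCT-II with JPEG-style scaling. The table built by `directional_table` is a made-up example, not one from any standard: its step size grows slowly along the horizontal-frequency axis and quickly along the vertical one, so detail that varies across a row survives while detail that varies down a column is quantized away.

```rust
use std::f64::consts::PI;

const N: usize = 8;

/// Naive 2-D DCT-II of an 8x8 block, scaled as in JPEG:
/// F(u,v) = 1/4 * C(u) * C(v) * sum f(x,y) * cos(..) * cos(..).
fn dct8x8(block: &[[f64; N]; N]) -> [[f64; N]; N] {
    let mut out = [[0.0; N]; N];
    for v in 0..N {
        for u in 0..N {
            let cu = if u == 0 { (0.5f64).sqrt() } else { 1.0 };
            let cv = if v == 0 { (0.5f64).sqrt() } else { 1.0 };
            let mut sum = 0.0;
            for y in 0..N {
                for x in 0..N {
                    sum += block[y][x]
                        * ((2 * x + 1) as f64 * u as f64 * PI / 16.0).cos()
                        * ((2 * y + 1) as f64 * v as f64 * PI / 16.0).cos();
                }
            }
            out[v][u] = 0.25 * cu * cv * sum;
        }
    }
    out
}

/// Hypothetical quantization table: fine steps for horizontal frequencies (u),
/// coarse steps for vertical frequencies (v).
fn directional_table() -> [[u16; N]; N] {
    let mut q = [[0u16; N]; N];
    for v in 0..N {
        for u in 0..N {
            q[v][u] = (4 + 2 * u + 12 * v) as u16;
        }
    }
    q
}

/// Divide each coefficient by its table entry and round to the nearest integer.
fn quantize(coeffs: &[[f64; N]; N], q: &[[u16; N]; N]) -> [[i32; N]; N] {
    let mut out = [[0; N]; N];
    for v in 0..N {
        for u in 0..N {
            out[v][u] = (coeffs[v][u] / q[v][u] as f64).round() as i32;
        }
    }
    out
}

fn main() {
    // A block that only varies from left to right (a horizontal ramp).
    let mut block = [[0.0; N]; N];
    for y in 0..N {
        for x in 0..N {
            block[y][x] = 10.0 * x as f64;
        }
    }
    let quantized = quantize(&dct8x8(&block), &directional_table());
    for row in quantized.iter() {
        println!("{:?}", row);
    }
}
```

The entropy coder after this step never needs to know the table was skewed; it just sees more zeros in the directions you decided not to care about.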

If you want to get up to your elbows in this sort of thing, there is an entire book on it by Rao & Yip that is about as comprehensive as it gets - https://www.abebooks.com/products/isbn/9780125802031/3119886...


This is the primary reason that custom quantization tables are allowed by the standard, which also ships example tables. The DCT itself is fixed at 8x8, and the common default tables work in most cases, but you can optimize heavily for edge case images (in which case, the cost of baking the custom table into the file is made up for by the decreased data size).


The Computerphile channel has a nice series of videos on JPEG and one focusing on DCT [0]. It's a nice and easy intro to the subject.

[0]: https://youtu.be/Q2aEzeMDHMA


> For instance using LZMA or zstd as the final stage. Quite often that's enough to put them on a par with more radically different newer formats.

If you do that with a video codec, you lose the ability to seek within the stream, which makes it useless for streaming video. For images, the amount of memory required to decompress may become excessive.

It's similar to why zip (deflate) is still widely used, but is far from optimal in compression efficiency; everything that does better (in some cases much better) is going to be slower, bigger, or both. See https://en.wikipedia.org/wiki/PAQ for an example of extreme lossless compression.


You could probably do something really good using zstd dictionary compression, where you compress each frame separately but have them share a dictionary.


I use zstd in my btrfs filesystem as a transparent file compression. I can still seek inside files.


btrfs does compression at the block level, whereas video compression would do it at the stream level.

Files themselves are not compressed on btrfs, it's the blocks that get compressed.


Video streams are generally compressed in blocks called group of pictures (GOP) as well. That's how seeking in a video stream works - skip to the beginning of the GOP the desired timestamp is in, start decoding from there and show images once the timestamp is reached.
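
As a sketch of that seek step, assuming the container gives you a sorted list of keyframe timestamps (the name `gop_for_timestamp` and the millisecond units are my own choices for illustration): binary-search for the last keyframe at or before the target, then decode forward from there.

```rust
/// Given the sorted start times (in ms) of each GOP's keyframe, return the
/// index of the GOP that contains `target_ms`, or None if the target lies
/// before the first keyframe (or there are no keyframes at all).
fn gop_for_timestamp(keyframes_ms: &[u64], target_ms: u64) -> Option<usize> {
    // Number of keyframes that start at or before the target.
    let count = keyframes_ms.partition_point(|&t| t <= target_ms);
    // The last of those is the GOP to start decoding from.
    count.checked_sub(1)
}

fn main() {
    // Hypothetical stream with a keyframe every two seconds.
    let keyframes = [0u64, 2000, 4000, 6000];
    let target = 4700;
    match gop_for_timestamp(&keyframes, target) {
        Some(i) => println!(
            "seek to GOP {} at {} ms, decode forward to {} ms",
            i, keyframes[i], target
        ),
        None => println!("target is before the first keyframe"),
    }
}
```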


Well that was kind of my point, why assume that the above poster would go for the most naive possible implementation when thinking about it for more than a minute yields several obvious solutions, and there are known implementations that have already solved that problem?

ZSTD also has some fun "dictionary" operations so that even if you're chunking your data you can still take advantage of cross-chunk redundancy by "training" across all your chunks before the compression stage.


within cooee : within hailing distance : not unapproachable


I did guess something like that eventually, but given the context my first thought was "let's see, COefficient Of, uh, hmmmm..." :D


>achieves 20-25 PSNR at around 200:1 compression, which is better than most lossy formats because that's in a quality/data-size that most image formats consider out of scope.

According to https://cloudinary.com/blog/contemplating-codec-comparisons#... Google leaned hard on AVIF's ability to produce small garbage in its comparison against JPEG XL.


[flagged]


Filing a patent "for fun" and then posting about it, with details of the patent contents, on a technical forum should be outlawed.


I don't have any financial interest in it, if that's what bothers you.


Many of us, myself included, do not want to read about patent encumbered IP in general. It doesn't matter who owns the patent, or how you purport to wield it on this particular day, in this particular year. You are essentially doing the equivalent of traversing the internet, planting mines as you go.


I clearly identified it up front as a patent, and if you don't want to read about it, just skip it.


That's like saying "movie spoiler ahead" and then writing it out in cleartext. C'mon man...


This is really an image codec, isn't it? Since it doesn't have any temporal compression capabilities.

It's interesting to see how well such a simple technique performs. I wonder what would happen if you added trivial temporal compression by simply subtracting the color values of the previous frame from the next and encoding the residual. How would that perform?
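
A minimal sketch of that idea, assuming frames are flat slices of 8-bit samples of equal length (the function names are hypothetical, and this is not code from the article). Wrapping arithmetic keeps the residual in a `u8` and makes the round trip exact, so it stays lossless.

```rust
/// Residual of `cur` against `prev`: per-sample difference, modulo 256.
fn residual(prev: &[u8], cur: &[u8]) -> Vec<u8> {
    assert_eq!(prev.len(), cur.len());
    cur.iter()
        .zip(prev)
        .map(|(&c, &p)| c.wrapping_sub(p))
        .collect()
}

/// Inverse of `residual`: rebuild the current frame from the previous one.
fn reconstruct(prev: &[u8], res: &[u8]) -> Vec<u8> {
    assert_eq!(prev.len(), res.len());
    res.iter()
        .zip(prev)
        .map(|(&r, &p)| p.wrapping_add(r))
        .collect()
}

fn main() {
    // Hypothetical sample data: only one sample changes between frames.
    let prev = [100u8, 100, 100, 250];
    let cur = [100u8, 100, 101, 5];
    let res = residual(&prev, &cur);
    println!("residual: {:?}", res);
    assert_eq!(reconstruct(&prev, &res), cur);
}
```

Unchanged regions turn into runs of zeros, which is what the later entropy or run-length stage would need in order to gain anything from this.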


Why would temporal compression be a necessary requirement to be called a video codec?

Quite a few codecs in the "intra-frame only" section of this Wikipedia list, and that section is within the "Video compression formats" section:

https://en.wikipedia.org/wiki/List_of_codecs#Intra-frame-onl...


Because it is trivial to turn any image codec into a video codec by simply encoding each frame individually, and the article, despite talking about temporal redundancy, doesn't actually show any code that deals with it.


MJPEG is a popular video codec where each frame is JPEG compressed.


Nitpick: mjpeg is a video compression format, not a codec; the codec is plain old JPEG (which is not a temporal codec).


It's a bit debatable but he definitely only did the image coding part of the video codec. All of those listed formats also support the metadata required for video.

I was certainly expecting some motion coding.


Some early implementations of MPEG-1 compressors only supported I-frames. Amusingly, this is still a valid MPEG-1 bitstream.


Not particularly strange, a lot of compression formats work like that. E.g. you can make a zip file at STORE level and there will be no actual compression.


You can do the same with a modern encoder too by setting the keyframe interval to 1 and "amusingly" the bitstream is still valid.


I-P-B.

Some video formats only go I. Then there's not a lot different between images and video, as far as editing goes. Decoding for end user transportation has a lot more going on, but one has to start somewhere.

Anyway - I think that this kind of work is a great starter and gets more people interested in this.


A simple delta between frames wouldn't perform well if there was any camera movement: you'd pay for every edge twice.

Instead of working with a delta, conditionally using previous frame as prediction source could work (e.g. if pixel A was closer to previous frame's A than to current frame's B, predict from previous frame's X). Or you could signal prediction source explicitly per block or with RLE. Ideally you'd do motion compensation, but doing that precisely enough for a lossless compressor is more than 100 lines.
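
Here is a rough sketch of the "signal prediction source explicitly per block" variant. Everything in it is an assumption made for illustration: frames are treated as flat 8-bit rows, the intra predictor is simply the left neighbour, and the cost is the sum of absolute residuals. The encoder tries both predictors for each block and records which one was cheaper.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    /// Predict each sample from its left neighbour in the current frame.
    Intra,
    /// Predict each sample from the co-located sample in the previous frame.
    Previous,
}

/// For each block of `block` samples, pick the prediction source whose
/// residuals have the smaller sum of absolute values.
fn choose_sources(prev: &[u8], cur: &[u8], block: usize) -> Vec<Source> {
    assert_eq!(prev.len(), cur.len());
    assert!(block > 0);
    (0..cur.len())
        .step_by(block)
        .map(|start| {
            let end = (start + block).min(cur.len());
            let mut intra = 0u32;
            let mut inter = 0u32;
            for i in start..end {
                // The very first sample has no left neighbour; predict it from 0.
                let left = if i == 0 { 0 } else { cur[i - 1] };
                intra += cur[i].abs_diff(left) as u32;
                inter += cur[i].abs_diff(prev[i]) as u32;
            }
            if inter < intra { Source::Previous } else { Source::Intra }
        })
        .collect()
}

fn main() {
    // Hypothetical data: the first block is busy but static, the second
    // block is flat but changed brightness between frames.
    let prev = [10u8, 200, 10, 200, 50, 50, 50, 50];
    let cur = [10u8, 200, 10, 200, 90, 90, 90, 90];
    println!("{:?}", choose_sources(&prev, &cur, 4));
}
```

The flags cost one bit per block before entropy coding, and a block where the camera moved simply falls back to intra prediction instead of paying for every edge twice.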


What about a "don't bother" bit for when that happens?


While delivery formats often use P- and B-frames, editing and recording formats often go all-intra. e.g., the Sony FS7 only supports all-intra XAVC-I for recording at full resolution and framerate.

Personally, I use ProRes 422 for recording, and DNxHD/DNxHR for proxies (and that's only because DaVinci Resolve's free edition can't create ProRes Proxies).

Both of these codecs are all-intra formats in mpeg containers.


the only requirement is to support video, and support compression

see: mjpeg


Video codecs need not support compression.


Funnily enough, I recently released 0.1.0 of "less-avc", a pure Rust H.264 (AVC) video encoder: https://github.com/strawlab/less-avc/ . For now it only implements a lossless I_PCM encoder, but it supports a few features I need, such as high bit depth. If anyone has a codec-writing itch they want to scratch, I would welcome work towards the compression algorithms H.264 supports: context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC). I'm also happy to get constructive criticism or questions on this library. I think it is fairly idiomatic Rust, with no `unsafe`. While H.264 is an older codec now, as far as I can tell, this also means any patents on it are about to run out and it is very widely supported.


> as far as I can tell, this also means any patents on it are about to run out

Not for H.264; looks like the last patent expires in 2028:

https://scratchpad.fandom.com/wiki/MPEG_patent_lists#H.264_p...

On the other hand, the last patent on MPEG-4 ASP (Xvid/DivX/etc.) which preceded H.264 apparently just expired earlier this month:

https://meta.wikimedia.org/wiki/Have_the_patents_for_MPEG-4_...

...and IANAL but that means the patents for H.263 and everything older should've already expired too.


That's a great list of the H.264 patent claims--thanks. I had naively assumed that since the first iteration of the standard was published in 2003, "obviously" all related patents (to features in the first iteration, anyway) would have to have been filed prior. Clearly, that is not the case.


"method of selecting a reference picture" sounds like an encoding patent, and that one was filed four years after the standard came out. I wouldn't worry about 2028.

It's harder to evaluate the blob of patents from 2004-2005.


I discovered this cool guide while looking for more resources for my own codec from scratch project: https://github.com/kevmo314/codec-from-scratch


Do you know anywhere to find good decoders in pure Rust for common codecs like H.264 and H.265? Great tutorial, by the way; I learnt a lot.


I haven’t seen any implementations yet, and given their patent licensing situation, they’re probably not first in line for a rewrite in Rust.

There’s rav1e for AV1 encoding.


Decoders also have the difficulty that they need to support most of the format’s features before they can handle content found in the wild. You also often need to add hacks to support encoders which technically violate the specification but are commonly used.

In contrast, you can build a very simple encoder using very few of the format features and still have it be usable/useful (albeit with poor quality/compression ratio).


Not exactly what you are asking for, but jcodec is a pretty readable codebase written in Java. (The readability part is often, ahh, lacking in the source for codecs, in my experience.) It might be a good candidate for rewriting in Rust. https://github.com/jcodec/jcodec


Are there many codecs of any sort with Rust implementations? The majority of Rust stuff I see linked are thin wrappers around existing C or C++ libraries.


Not many.

Weirdly, ChatGPT can be remarkably good at translating code between programming languages.

I suspect within a year or two it'll be pretty easy to translate a lot of C libraries to native Rust code (or whatever) using modern AIs.


(2021)



