Video codec in 100 lines of Rust

Lerc · on Dec 19, 2022

Naive image and video codecs are quite fun to make. I have done a bunch of them over the years, and it's quite easy to get within cooee of established formats, and even surpass them under certain conditions. I made a lossy image format that achieves 20-25 PSNR at around 200:1 compression, which is better than most lossy formats because that's in a quality/data-size that most image formats consider out of scope.
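For readers who want to reproduce numbers like these, here is a minimal sketch of how PSNR (in dB) and compression ratio are usually computed for 8-bit samples. The function names `psnr_db` and `compression_ratio` are made up for illustration and are not taken from any codec mentioned in this thread.

```rust
/// Peak signal-to-noise ratio in dB between two 8-bit images stored as
/// flat sample slices of equal length. Identical inputs give infinity.
fn psnr_db(original: &[u8], decoded: &[u8]) -> f64 {
    assert_eq!(original.len(), decoded.len(), "images must have the same size");
    assert!(!original.is_empty(), "images must not be empty");

    // Mean squared error over all samples.
    let sum_sq: f64 = original
        .iter()
        .zip(decoded)
        .map(|(&a, &b)| {
            let d = a as f64 - b as f64;
            d * d
        })
        .sum();
    let mse = sum_sq / original.len() as f64;

    if mse == 0.0 {
        return f64::INFINITY;
    }
    // 255 is the peak value of an 8-bit sample.
    10.0 * (255.0_f64 * 255.0 / mse).log10()
}

/// Ratio of raw size to compressed size, e.g. 200.0 for "200:1".
fn compression_ratio(raw_bytes: usize, compressed_bytes: usize) -> f64 {
    raw_bytes as f64 / compressed_bytes as f64
}
```

As a rough feel for the scale: an error of 16 levels on every sample works out to about 24 dB, which is inside the 20-25 range mentioned above.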

QOI https://qoiformat.org/ is a good example of a practically useful simple format.

It's still quite a leap to get to the best new codecs, suddenly you are in a world of head hurty math.

It's also worth noting that it is easy to beat the old formats all-round with off-the-shelf parts, both JPEG and PNG can be bested by changing the outer level of compression for something that was invented after the formats were made. For instance using LZMA or zstd as the final stage. Quite often that's enough to put them on a par with more radically different newer formats.

bob1029 · on Dec 19, 2022

> It's also worth noting that it is easy to beat the old formats all-round with off-the-shelf parts, both JPEG and PNG can be bested by changing the outer level of compression for something that was invented after the formats were made.

The heart of JPEG is in the DCT and its energy compaction properties.

One crazy thing about the DCT is that it doesn't just let you make trade-offs for high/low frequency features. It also lets you make trade-offs for horizontal and vertical features. If you customize your quantization matrix to your specific application, you can potentially achieve compression ratios far exceeding anything available today - even if you leave in the crusty old RLE+Huffman coding.

If you want to get up to your elbows in this sort of thing, there is an entire book on it by Rao & Yip that is about as comprehensive as it gets - https://www.abebooks.com/products/isbn/9780125802031/3119886...
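To make the horizontal/vertical trade-off concrete, here is a rough sketch of a naive 8x8 DCT-II followed by quantization with a direction-biased table. The `directional_table` helper and its `base`, `h_step` and `v_step` parameters are hypothetical and only meant to illustrate the idea; they are not tables from the JPEG standard or from any real encoder.

```rust
use std::f64::consts::PI;

const N: usize = 8;

/// Naive 2-D DCT-II of an 8x8 block, using the JPEG normalization.
/// Output is indexed [v][u]: v = vertical frequency, u = horizontal frequency.
fn dct_8x8(block: &[[f64; N]; N]) -> [[f64; N]; N] {
    let c = |k: usize| if k == 0 { (0.5f64).sqrt() } else { 1.0 };
    let mut out = [[0.0; N]; N];
    for v in 0..N {
        for u in 0..N {
            let mut sum = 0.0;
            for y in 0..N {
                for x in 0..N {
                    sum += block[y][x]
                        * (((2 * x + 1) * u) as f64 * PI / 16.0).cos()
                        * (((2 * y + 1) * v) as f64 * PI / 16.0).cos();
                }
            }
            out[v][u] = 0.25 * c(u) * c(v) * sum;
        }
    }
    out
}

/// Divide each coefficient by its table entry and round to the nearest integer.
/// Larger table entries throw away more of that frequency.
fn quantize(coeffs: &[[f64; N]; N], table: &[[f64; N]; N]) -> [[i32; N]; N] {
    let mut out = [[0; N]; N];
    for v in 0..N {
        for u in 0..N {
            out[v][u] = (coeffs[v][u] / table[v][u]).round() as i32;
        }
    }
    out
}

/// Hypothetical table generator: the step size grows by `h_step` per
/// horizontal frequency and by `v_step` per vertical frequency, so the two
/// directions can be weighted independently.
fn directional_table(base: f64, h_step: f64, v_step: f64) -> [[f64; N]; N] {
    let mut table = [[0.0; N]; N];
    for v in 0..N {
        for u in 0..N {
            table[v][u] = base + h_step * u as f64 + v_step * v as f64;
        }
    }
    table
}
```

With a small `h_step` and a large `v_step`, content that only varies horizontally survives quantization while vertical detail is discarded, and vice versa. Whether that is a good trade depends entirely on the application.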

deaddodo · on Dec 20, 2022

This is the primary reason that custom quantization tables are supported by the standard, alongside the example tables it provides. The common defaults work in most cases, but you can optimize heavily for edge case images (in which case, baking the table into the file is made up for by the decreased data size).

skywal_l · on Dec 20, 2022

The Computerphile channel has a nice series of videos on JPEG and one focusing on DCT [0]. It's a nice and easy intro to the subject.

[0]: https://youtu.be/Q2aEzeMDHMA

userbinator · on Dec 19, 2022

> For instance using LZMA or zstd as the final stage. Quite often that's enough to put them on a par with more radically different newer formats.

If you do that with a video codec, you lose the ability to seek within the stream, which makes it useless for streaming video. For images, the amount of memory required to decompress may become excessive.

It's similar to why zip (deflate) is still widely used, but is far from optimal in compression efficiency; everything that does better (in some cases much better) is going to be slower, bigger, or both. See https://en.wikipedia.org/wiki/PAQ for an example of extreme lossless compression.

adgjlsfhk1 · on Dec 20, 2022

you could probably do something really good using zstd dictionary compression where you compress each frame separately but have them share a dictionary

traverseda · on Dec 20, 2022

I use zstd in my btrfs filesystem for transparent file compression. I can still seek inside files.

heavyset_go · on Dec 20, 2022

btrfs does compression at the block level, whereas video compression would do it at the stream level.

Files themselves are not compressed on btrfs, it's the blocks that get compressed.

formerly_proven · on Dec 20, 2022

Video streams are generally compressed in blocks as well, called groups of pictures (GOPs). That's how seeking in a video stream works: skip to the beginning of the GOP that contains the desired timestamp, start decoding from there, and show images once the timestamp is reached.
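The seek procedure described above can be sketched roughly as follows. The `FrameInfo` struct and the `seek` function are hypothetical, and the sketch assumes frames are stored sorted by timestamp with decode order equal to display order (so no B-frame reordering).

```rust
/// Hypothetical index entry for one frame in a stream.
struct FrameInfo {
    timestamp_ms: u64,
    is_keyframe: bool,
}

/// Returns `(start, display)`: the index of the keyframe where decoding must
/// begin, and the index of the frame to show for `target_ms`.
/// Returns `None` if no frame is at or before the target, or if no keyframe
/// precedes it.
fn seek(frames: &[FrameInfo], target_ms: u64) -> Option<(usize, usize)> {
    // Last frame whose timestamp is not after the requested time.
    let display = frames.iter().rposition(|f| f.timestamp_ms <= target_ms)?;
    // Walk back to the keyframe that opens the GOP containing that frame.
    let start = frames[..=display].iter().rposition(|f| f.is_keyframe)?;
    Some((start, display))
}
```

A player would then decode frames `start..=display` and discard everything before `display`. A real container would typically store a keyframe index so the scan does not touch every frame.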

traverseda · on Dec 20, 2022

Well, that was kind of my point: why assume that the above poster would go for the most naive possible implementation when thinking about it for more than a minute yields several obvious solutions, and there are known implementations that have already solved that problem?

ZSTD also has some fun "dictionary" operations so that even if you're chunking your data you can still take advantage of cross-chunk redundancy by "training" across all your chunks before the compression stage.

repple · on Dec 19, 2022

within cooee : within hailing distance : not unapproachable

andrewflnr · on Dec 20, 2022

I did guess something like that eventually, but given the context my first thought was "let's see, COefficient Of, uh, hmmmm..." :D

rasz · on Dec 20, 2022

>achieves 20-25 PSNR at around 200:1 compression, which is better than most lossy formats because that's in a quality/data-size that most image formats consider out of scope.

According to https://cloudinary.com/blog/contemplating-codec-comparisons#... Google leaned hard on AVIF's ability to produce small garbage in its comparison against JPEG XL.

WalterBright · on Dec 20, 2022

[flagged]

_eojb · on Dec 20, 2022

Filing a patent "for fun" and then posting about it, with details of the patent contents, on a technical forum should be outlawed.

WalterBright · on Dec 20, 2022

I don't have any financial interest in it, if that's what bothers you.

_eojb · on Dec 20, 2022

Many of us, myself included, do not want to read about patent-encumbered IP in general. It doesn't matter who owns the patent, or how you purport to wield it on this particular day, in this particular year. You are essentially doing the equivalent of traversing the internet, planting mines as you go.

WalterBright · on Dec 20, 2022

I clearly identified it up front as a patent, and if you don't want to read about it, just skip it.

_eojb · on Dec 20, 2022

That's like saying "movie spoiler ahead" and then writing it out in cleartext. C'mon man...

pcwalton · on Dec 19, 2022

This is really an image codec, isn't it? Since it doesn't have any temporal compression capabilities.

It's interesting to see how well such a simple technique performs. I wonder what would happen if you added trivial temporal compression by simply subtracting the color values of the previous frame from the next and encoding the residual. How would that perform?
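The "trivial temporal compression" idea above could look something like this minimal sketch. It assumes 8-bit samples and two frames of the same size; `residual` and `reconstruct` are hypothetical names, and wrapping arithmetic is used so the difference fits back into a byte and stays losslessly reversible.

```rust
/// Per-sample difference between the current frame and the previous one,
/// computed modulo 256. A static scene produces all zeros, which the later
/// entropy coding stage can compress very well.
fn residual(prev: &[u8], cur: &[u8]) -> Vec<u8> {
    assert_eq!(prev.len(), cur.len(), "frames must have the same size");
    cur.iter()
        .zip(prev)
        .map(|(&c, &p)| c.wrapping_sub(p))
        .collect()
}

/// Inverse of `residual`: rebuilds the current frame from the previous frame
/// and the stored differences.
fn reconstruct(prev: &[u8], res: &[u8]) -> Vec<u8> {
    assert_eq!(prev.len(), res.len(), "frame and residual must have the same size");
    prev.iter()
        .zip(res)
        .map(|(&p, &r)| p.wrapping_add(r))
        .collect()
}
```

How well this performs depends on the content: it helps most when large parts of the picture do not change between frames.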

sjsdaiuasgdia · on Dec 19, 2022

Why would temporal compression be a necessary requirement to be called a video codec?

Quite a few codecs in the "intra-frame only" section of this Wikipedia list, and that section is within the "Video compression formats" section:

https://en.wikipedia.org/wiki/List_of_codecs#Intra-frame-onl...

userbinator · on Dec 19, 2022

Because it is trivial to turn any image codec into a video codec by simply encoding each frame individually, and despite talking about temporal redundancy, the article doesn't actually attempt to show any code that deals with that.

randyrand · on Dec 20, 2022

mjpeg is a popular video codec where each frame is jpeg compressed

erichocean · on Dec 20, 2022

Nitpick: mjpeg is a video compression format, not a codec; the codec is plain old JPEG (which is not a temporal codec).

IshKebab · on Dec 19, 2022

It's a bit debatable but he definitely only did the image coding part of the video codec. All of those listed formats also support the metadata required for video.

I was certainly expecting some motion coding.

a-dub · on Dec 19, 2022

some early implementations of mpeg-1 compressors only supported I frames. amusingly, this is still a valid mpeg-1 bitstream.

lmm · on Dec 20, 2022

Not particularly strange, a lot of compression formats work like that. E.g. you can make a zip file at STORE level and there will be no actual compression.

Am4TIfIsER0ppos · on Dec 20, 2022

You can do the same with a modern encoder too by setting the keyframe interval to 1 and "amusingly" the bitstream is still valid.

vkaku · on Dec 19, 2022

I-P-B.

Some video formats only go I. Then there's not a lot of difference between images and video, as far as editing goes. Decoding for end user transportation has a lot more going on, but one has to start somewhere.

Anyway - I think that this kind of work is a great starter and gets more people interested in this.

pornel · on Dec 19, 2022

A simple delta between frames wouldn't perform well if there was any camera movement: you'd pay for every edge twice.

Instead of working with a delta, conditionally using the previous frame as a prediction source could work (e.g. if pixel A was closer to previous frame's A than to current frame's B, predict from previous frame's X). Or you could signal the prediction source explicitly per block or with RLE. Ideally you'd do motion compensation, but doing that precisely enough for a lossless compressor is more than 100 lines.
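As one possible reading of the "signal prediction source explicitly per block" variant, here is a rough sketch that treats a frame as a flat run of 8-bit samples and picks, for each block of 8 samples, whichever predictor leaves the smaller residual. The `Source` enum, the `BLOCK` size and the cost measure are all assumptions made for illustration, not something taken from the article or from an existing codec.

```rust
/// Which predictor a block uses (hypothetical one-flag-per-block signalling).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    /// Predict each sample from its left neighbour in the current frame.
    Left,
    /// Predict each sample from the same position in the previous frame.
    PrevFrame,
}

const BLOCK: usize = 8;

/// Sum of absolute residuals, reading each wrapped byte as a signed value.
fn cost(residuals: &[u8]) -> u32 {
    residuals.iter().map(|&r| (r as i8).unsigned_abs() as u32).sum()
}

/// Returns one flag per block plus the residuals for the whole frame.
fn encode(prev: &[u8], cur: &[u8]) -> (Vec<Source>, Vec<u8>) {
    assert_eq!(prev.len(), cur.len(), "frames must have the same size");
    let mut flags = Vec::new();
    let mut out = Vec::with_capacity(cur.len());
    for start in (0..cur.len()).step_by(BLOCK) {
        let end = (start + BLOCK).min(cur.len());
        let left: Vec<u8> = (start..end)
            .map(|i| cur[i].wrapping_sub(if i == 0 { 0 } else { cur[i - 1] }))
            .collect();
        let temporal: Vec<u8> = (start..end)
            .map(|i| cur[i].wrapping_sub(prev[i]))
            .collect();
        // Keep whichever prediction leaves the cheaper residual.
        if cost(&temporal) <= cost(&left) {
            flags.push(Source::PrevFrame);
            out.extend(temporal);
        } else {
            flags.push(Source::Left);
            out.extend(left);
        }
    }
    (flags, out)
}

/// Rebuilds the current frame from the previous frame, flags and residuals.
fn decode(prev: &[u8], flags: &[Source], residuals: &[u8]) -> Vec<u8> {
    let mut cur: Vec<u8> = Vec::with_capacity(residuals.len());
    for (i, &r) in residuals.iter().enumerate() {
        let pred = match flags[i / BLOCK] {
            Source::PrevFrame => prev[i],
            Source::Left => if i == 0 { 0 } else { cur[i - 1] },
        };
        cur.push(pred.wrapping_add(r));
    }
    cur
}
```

Blocks that did not change pick `PrevFrame` and cost nothing, while blocks where the content moved fall back to `Left` and only pay for the edge once. This is still far from real motion compensation.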

quickthrower2 · on Dec 20, 2022

What about a "don't bother" bit for when that happens?

kuschku · on Dec 19, 2022

While delivery formats often use P- and B-frames, editing and recording formats often go all-intra. e.g., the Sony FS7 only supports all-intra XAVC-I for recording at full resolution and framerate.

Personally, I use ProRes 422 for recording, and DNxHD/DNxHR for proxies (and that's only because DaVinci Resolve's free edition can't create ProRes Proxies).

Both of these codecs are all-intra formats in mpeg containers.

randyrand · on Dec 20, 2022

the only requirement is to support video, and support compression

see: mjpeg

chrismorgan · on Dec 20, 2022

Video codecs need not support compression.

dr-ando · on Dec 19, 2022

Funnily enough, I recently released 0.1.0 of "less-avc", a pure Rust H.264 (AVC) video encoder: https://github.com/strawlab/less-avc/ . For now it only implements a lossless I PCM encoder but supports a few features I need such as high bit depth. If anyone has a codec-writing itch they want to scratch, I would welcome work towards the compression algorithms H.264 supports: context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC). Also I'm happy for constructive criticism or questions on this library. I think it is fairly idiomatic Rust, with no `unsafe`. While H.264 is an older codec now, as far as I can tell, this also means any patents on it are about to run out and it is very widely supported.

userbinator · on Dec 19, 2022

> as far as I can tell, this also means any patents on it are about to run out

Not for H.264; looks like the last patent expires in 2028:

https://scratchpad.fandom.com/wiki/MPEG_patent_lists#H.264_p...

On the other hand, the last patent on MPEG-4 ASP (Xvid/DivX/etc.) which preceded H.264 apparently just expired earlier this month:

https://meta.wikimedia.org/wiki/Have_the_patents_for_MPEG-4_...

...and IANAL but that means the patents for H.263 and everything older should've already expired too.

dr-ando · on Dec 20, 2022

That's a great list of the H.264 patent claims--thanks. I had naively assumed that since the first iteration of the standard was published in 2003, "obviously" all related patents (to features in the first iteration, anyway) would have to have been filed prior. Clearly, that is not the case.

Dylan16807 · on Dec 20, 2022

"method of selecting a reference picture" sounds like an encoding patent, and that one was filed four years after the standard came out. I wouldn't worry about 2028.

It's harder to evaluate the blob of patents from 2004-2005.

kevmo314 · on Dec 19, 2022

I discovered this cool guide while looking for more resources for my own codec from scratch project: https://github.com/kevmo314/codec-from-scratch

jackosdev · on Dec 19, 2022

Do you know anywhere to find good decoders in pure Rust for common codecs like H.264 and H.265? Great tutorial by the way, learnt a lot.

pornel · on Dec 20, 2022

I haven’t seen any implementations yet, and given their patent licensing situation, they’re probably not first in line for a rewrite in Rust.

There’s rav1e for AV1 encoding.

cillian64 · on Dec 20, 2022

Decoders also have the difficulty that you need to support most of the format’s features before they can support content found in the wild. Also you often need to add hacks to support encoders which technically violate the specification but are commonly used.

In contrast, you can build a very simple encoder using very few of the format features and still have it be usable/useful (albeit with poor quality/compression ratio).

dr-ando · on Dec 20, 2022

Not exactly what you are asking for, but jcodec is a pretty readable codebase written in Java. (The readability part is often, ahh, lacking in the source for codecs, in my experience.) It might be a good candidate for rewriting in Rust. https://github.com/jcodec/jcodec

DragonStrength · on Dec 20, 2022

Are there many codecs of any sort with Rust implementations? The majority of Rust stuff I see linked are thin wrappers around existing C or C++ libraries.

josephg · on Dec 20, 2022

Not many.

Weirdly, ChatGPT can be remarkably good at translating code between programming languages.

I suspect within a year or two it'll be pretty easy to translate a lot of C libraries to native Rust code (or whatever) using modern AIs.

adamnemecek · on Dec 19, 2022

(2021)