
I've always wondered if better multi-core performance can come from processing different keyframe segments separately.

IIUC all current encoders that support parallelism work by having multiple threads work on the same frame at the same time. Often the frame is split into regions and each thread focuses on a specific region of the frame. This approach can have a (usually small) quality/efficiency cost and requires per-encoder logic to assemble those regions into a single frame.

What if, instead or additionally, different keyframe segments were processed independently? So if keyframes are every 60 frames, ffmpeg would read 60 frames and pass them to the first thread, the next 60 to the next thread, and so on, then assemble the results basically by concatenating them. It seems like this could be used to parallelize any codec in a fairly generic way, and it should be more efficient, as there is no thread-communication overhead and no splitting of the frame into regions, which harms cross-region compression.

Off the top of my head I can only think of two issues:

1. It requires loading N * (keyframe period) frames into memory, as well as the overhead memory for encoding N frames at once.

2. Variable keyframe intervals would require special support, as the split points need to be identified before the video is handed to the encoding threads. This may require extra work upfront.

But both of these seem like they won't be an issue in many cases. Lots of the time I'd be happy to use tons of RAM and output with a fixed keyframe interval.

Probably I would combine this with intra-frame parallelization, e.g. process every frame with 4 threads and run 8 keyframe segments in parallel. This way I get really good parallelism but only the minor quality loss of 4 regions, rather than splitting each frame into 32 regions, which would harm quality more.
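For offline use you can get most of the way there today with stock ffmpeg and a bit of shell, no encoder changes needed. Rough sketch only: the filenames, chunk length and encoder settings below are made up for illustration, and audio is dropped to keep it short.

    # 1) Split at keyframes without re-encoding. When stream copying,
    #    the segment muxer only cuts on keyframes, so every chunk
    #    starts with an I-frame and can be encoded independently.
    ffmpeg -i input.mp4 -map 0:v -c copy -f segment -segment_time 10 \
        -reset_timestamps 1 chunk_%04d.mp4

    # 2) Encode the chunks in parallel, one ffmpeg process per core.
    ls chunk_*.mp4 | xargs -P "$(nproc)" -I{} \
        ffmpeg -nostdin -i {} -c:v libx264 -preset slow -crf 20 enc_{}

    # 3) Stitch the encoded chunks back together in order.
    for f in enc_chunk_*.mp4; do echo "file '$f'"; done > list.txt
    ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4

This is roughly what Av1an (mentioned below) automates, with scene detection instead of a fixed segment length.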




This definitely happens. This is how videos uploaded to Facebook or YouTube become available so quickly. The video is split into chunks based on key frame, the chunks are farmed out to a cluster of servers and encoded in parallel, and the outputs are then re-assembled into the final file.


I know next to nothing about video encoders, and in my naive mind I absolutely thought that parallelism would work just like you suggested it should. It sounds absolutely wild to me that they're splitting single frames into multiple segments. Merging work from different threads for every single frame sounds wasteful somehow. But I guess it works, if that's how everybody does it. TIL!


Most people concerned about encoding performance are doing livestreaming and so they can't accept any additional latency. Splitting a frame into independent segments (called "slices") doesn't add latency / can even reduce it, and it recovers from data corruption a bit better, so that's usually done at the cost of some compression efficiency.
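Roughly, with libx264 through ffmpeg (the option names are x264's; the values are just for illustration):

    # Sliced threading: each frame is cut into 4 slices that are
    # encoded in parallel, so no extra frames of delay are added.
    ffmpeg -i input.mp4 -c:v libx264 -tune zerolatency \
        -x264-params sliced-threads=1:slices=4 out_slices.mp4

    # Default frame threading: several frames are in flight at once.
    # Better compression, but it buffers frames and adds latency.
    ffmpeg -i input.mp4 -c:v libx264 -threads 4 out_frames.mp4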


> Most people concerned about encoding performance are doing livestreaming

What makes you think that? I very much care about encoding performance (for a fixed quality level) for offline use.


Your idea also doesn't work with live streaming, and may also not work with inter-frame filters (depending on the implementation). Nonetheless, this already exists with those limitations: Av1an and, I believe, VapourSynth work more or less the way you describe, except you don't actually need to load every chunk into memory, only the current frames. As I understand it, this isn't a major priority for mainstream encoding pipelines because GOP/chunk threading isn't massively better than intra-frame threading.


It can work with live streaming, you just need to add N keyframes of latency. With low-latency livestreaming, keyframes are often close together anyway, so adding say 4s of latency to get 4x encoding speed may be a good tradeoff.


Well, you don't add 4s of latency for 4x encoding speed though. You add 4s of latency for a very marginal quality/efficiency improvement and a significant encoder simplification, because the baseline is current frame-parallel encoders, not sequential encoders.

Plus, computers aren't quad cores any more; people with powerful streaming rigs probably have 8 or 16 cores, and key frames aren't every second. Suddenly you're in this hellish world where you have to balance latency, CPU utilization and encoding efficiency. 16 cores at a not-so-great 8 seconds of extra latency means terrible efficiency, with a key frame every 0.5 seconds. 16 cores at good efficiency (say, 4 seconds between key frames) means a terrible 64 seconds of extra latency.
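The back-of-the-envelope rule behind those numbers (my own framing, not from any encoder documentation):

    extra latency ≈ (parallel chunks) × (keyframe interval)
    16 × 0.5 s = 8 s        16 × 4 s = 64 s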


You can pry vp8 out of my cold dead hands. I'm sorry, but if it takes more than 200ms including network latency it is too slow, and video encoding is extremely CPU intensive, so exploding your cloud bill is easy.


4s of latency is not acceptable for applications like live chat


As I said, "may be". "Live" varies hugely with different use cases. Sporting events are often broadcast live with 10s of seconds of latency. But yes, if you are talking to a chat in real-time a few seconds can make a huge difference.


Actually, not only does it work with live streaming, it's not an uncommon approach in a number of live streaming implementations*. To be clear, I'm not talking about low latency stuff like interactive chat, but e.g. live sports.

It's one of several reasons why live streams of this type are often 10-30 seconds behind live.

* Of course it also depends on where in the pipeline they hook in - some take the feed directly, in which case every frame is essentially a key frame.


> except you don't actually need to load every chunk into memory, only the current frames.

That's a good point. In the general case of reading from a pipe you need to buffer it somewhere. But for file-based inputs the buffering concerns aren't relevant, just the working memory.


Video codecs often encode the delta from the previous frame, and because this delta is often small, it's efficient to do it this way. If each thread needed to process frames separately, you would need to make significant changes to the codec, and I hypothesize it would make the video stream larger.


The parent comment referred to "keyframes" rather than just "frames". Keyframes, unlike normal frames, encode the full image. That is done so that if the "delta" you mentioned gets dropped from the stream, you don't end up with strange artifacts in the video output forever. Keyframes are where the codec gets to press "reset".


> That is done so that if the "delta" you mentioned gets dropped from the stream, you don't end up with strange artifacts in the video output forever.

Also to be able to seek anywhere in the stream without decoding all previous frames.
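Those seek points are exactly the keyframes; something like this (an illustrative ffprobe invocation) lists their timestamps:

    # print the timestamp of every keyframe in the video stream;
    # these are the positions a player can jump to directly
    ffprobe -v error -select_streams v:0 -skip_frame nokey \
        -show_entries frame=pts_time -of csv=p=0 input.mp4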


Oh right. For non-realtime use, if you're not I/O bound, this is better. Though I'd wonder how portable the codec code itself would be.


The encoder has a lot of freedom in how it arrives at the encoded data.


Isn't that delta partially based on the last keyframe? I guess it would be codec dependent, but my understanding is that keyframes are like a synchronization mechanism where the decoder catches up to where it should be in time.


Yes, key frames are fully encoded, and some delta frames are based on the previous frame (which could be a keyframe or another delta frame). Some delta frames (b-frames) can be based on the next frame instead of the previous one. That's why a visual glitch can sometimes mess up the image until the next key frame.

I'd assume if each thread is working on its own key frame, it would be difficult to make b-frames work? Live content also probably makes it hard.
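You can see that structure for yourself; an illustrative ffprobe call that prints the type of each frame in decode order:

    # I = keyframe, P = predicted from earlier frames,
    # B = may also reference later (in display order) frames
    ffprobe -v error -select_streams v:0 \
        -show_entries frame=pict_type -of csv=p=0 input.mp4

As long as each chunk is a closed GOP, b-frames only reference frames inside their own chunk, so per-chunk encoding doesn't break them; live content is the harder part.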


In most codecs the entropy coder resets at each frame (or slice), so there is enough independence that you can do multithreaded decoding. ffmpeg has frame-based and slice-based threading for this.

It also has a lossless codec ffv1 where the entropy coder doesn't reset, so it truly can't be multithreaded.
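On the decode side the two modes can be selected explicitly; illustrative invocations (-f null - just throws the decoded output away):

    # frame-parallel decoding: several frames in flight at once
    ffmpeg -threads 8 -thread_type frame -i input.mp4 -f null -

    # slice-parallel decoding: threads share the slices of one frame
    ffmpeg -threads 8 -thread_type slice -i input.mp4 -f null -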


There's already software that does this: Av1an (https://github.com/master-of-zen/Av1an). Encoding this way should indeed improve quality slightly. Whether that is actually noticeable/measurable... I'm not sure.


I've messed around with Av1an. Keep in mind that the software used for chunking, L-SMASH, is only documented in Japanese [1], but it does the trick pretty well as long as you're not working with huge resolutions like HD VR, where the video dimensions can do things like crash QuickTime on a Mac.

[1] http://l-smash.github.io/l-smash/


ffmpeg and x265 let you do this too: frame-threads=1 makes x265 encode one frame at a time (addressing the issue the OP mentioned) without a big performance penalty, in contrast to the 'pools' switch, which sets the number of threads used for encoding.
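For example, via ffmpeg's libx265 wrapper (the values are just an example):

    # one frame in flight at a time, with an 8-thread pool
    # working inside that frame
    ffmpeg -i input.mp4 -c:v libx265 \
        -x265-params frame-threads=1:pools=8 out.mp4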


IIUC - International Islamic University Chittagong?


IIUC - If I understand correctly.


If I Understand Correctly



