Hacker News

Hmm, we're getting <200 ms glass-to-glass latency by streaming H.264/MP4 video over a WebSocket (TLS/TCP) to MSE in the browser (no WebRTC involved). Of course, browser support for this is not universal.

The trick, which maybe you don't want to do in production, is to mux the video on a per-client basis. Every wss server gets the same H.264 elementary stream with occasional IDRs; the process links with libavformat (or knows how to produce an MP4 frame for an H.264 NAL), and each client receives essentially the same sequence of H.264 NALs, but in an MP4 container made just for it, with (very occasional) skipped frames so the server can limit the client-side buffer.
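A minimal sketch of that per-client fan-out (illustrative Python, not the actual libavformat-based code; the `ClientMuxer` name and sample dicts are made up): each connected client gets its own muxer state, all fed the same access units, and a skipped frame simply never enters that client's timeline.

```python
class ClientMuxer:
    """One instance per WebSocket client; every instance sees the same NALs."""

    def __init__(self):
        self.next_pts = 0  # per-client timeline, in frame ticks (1/60 s)

    def write_frame(self, nal: bytes, skip: bool = False):
        # Skipping means not emitting the sample and not advancing the
        # timeline, so the MP4 timestamps stay contiguous and, from the
        # container's perspective, the frame never existed.
        if skip:
            return None
        sample = {"pts": self.next_pts, "duration": 1, "data": nal}
        self.next_pts += 1
        return sample
```

The point of keeping `next_pts` per client is that two clients can have skipped different frames yet both see a gapless MP4 timeline.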

When the client joins, the server starts sending the video starting with the next IDR. The client runs a JavaScript function on a timer that occasionally reports its sourceBuffer duration back to the server via the same WebSocket. If the server is unhappy that the client-side buffer remains too long (e.g. minimum sourceBuffer duration remains over 150 ms for an extended period of time, and we haven't skipped any frames in a while), it just doesn't write the last frame before the IDR into the MP4 and, from an MP4 timestamping perspective, it's like that frame never happened and nothing is missing. At 60 fps and only doing it occasionally this is not easily noticeable, and each frame skip reduces the buffer by about 17 ms. We do the same for the Opus audio (without worrying about IDRs).
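The feedback loop above could look something like this (a hedged sketch; the class name, the cooldown, and the windowing are assumptions, not the authors' code). The client periodically reports its sourceBuffer duration over the same WebSocket; at each IDR boundary the server skips the preceding frame only if the *minimum* reported duration stayed above the target and it hasn't skipped recently.

```python
TARGET_MS = 150        # skip if the buffer never dips below this (per the comment)
SKIP_COOLDOWN_S = 2.0  # assumed: don't skip frames back-to-back

class SkipController:
    def __init__(self):
        self.min_buffer_ms = float("inf")
        self.last_skip_at = float("-inf")

    def on_client_report(self, buffer_ms: float) -> None:
        # Track the minimum reported duration over the current window.
        self.min_buffer_ms = min(self.min_buffer_ms, buffer_ms)

    def should_skip_before_idr(self, now: float) -> bool:
        # Called once per GOP, just before writing the frame preceding an IDR.
        skip = (self.min_buffer_ms > TARGET_MS
                and now - self.last_skip_at > SKIP_COOLDOWN_S)
        if skip:
            self.last_skip_at = now
        self.min_buffer_ms = float("inf")  # start a fresh observation window
        return skip
```

Using the minimum (rather than the latest sample) guards against skipping when the buffer merely spiked while a burst of frames was in flight.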

In our experience, you can use this to reliably trim the client-side buffer to <70 ms if that's where you want to fall on the latency-vs.-stall tradeoff curve, and the CPU overhead of muxing on a per-client basis is in the noise, but obviously not something today's CDNs will do for you by default. Maybe it's even possible to skip the per-client muxing and just surgically omit the MP4 frame before an IDR (which would lead to a timestamp glitch, but maybe that's ok?), but we haven't tried this. You also want to make sure to go through the (undocumented) hoops to put Chrome's MP4 demuxer in "low delay mode": see https://source.chromium.org/chromium/chromium/src/+/main:med... and https://source.chromium.org/chromium/chromium/src/+/main:med...
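As a sanity check on those numbers: at 60 fps each skip reclaims one frame duration, so walking a buffer down from the 150 ms threshold to the 70 ms target takes only a handful of (spread-out) skips.

```python
frame_ms = 1000 / 60            # ~16.7 ms reclaimed per skipped frame
skips = (150 - 70) / frame_ms   # drain from 150 ms down to 70 ms
print(round(skips))             # -> 5
```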

We're using the WebSocket technique "in production" at https://puffer.stanford.edu, but without the frame skipping since there we're trying to keep the client's buffer closer to 15 seconds. We've only used the frame-skipping and per-client MP4 muxing in more limited settings (https://taps.stanford.edu/stagecast/, https://stagecast.stanford.edu/) but it worked great when we did. Happy to talk more if anybody is interested.

[If you want lower than 150 ms, I think you're looking at WebRTC/Zoom/FaceTime/other UDP-based techniques (e.g., https://snr.stanford.edu/salsify/), but realistically you start to bump up against capture and display latencies. From a UVC webcam, I don't think we've been able to get an image to the host faster than ~50 ms from start-of-exposure, even capturing at 120 fps with a short exposure time.]
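[To make the budget concrete: only the ~50 ms capture figure and the 70 ms buffer target come from the numbers above; the other terms here are assumptions, but they show why capture and the client buffer dominate once the pipeline itself is tight.]

```python
capture_ms = 50         # UVC webcam, start-of-exposure to host (per above)
encode_mux_ms = 10      # assumed encoder + per-client mux latency
network_ms = 20         # assumed WebSocket/TLS/TCP delivery
buffer_ms = 70          # trimmed client-side sourceBuffer target (per above)
decode_display_ms = 30  # assumed decode + compositor + display
total = capture_ms + encode_mux_ms + network_ms + buffer_ms + decode_display_ms
print(total)  # -> 180, consistent with the <200 ms glass-to-glass figure
```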




Why even bother with the mp4? For audio sync, or just to use <video> tags?

On the web I got latency down by just sending NALUs and decoding the H.264 with a WASM build of Broadway, but now with WebCodecs (despite some quirks) that's even simpler (and possibly faster too, though it depends on whether you encode with B-frames, etc.). Of course, trying to get the lowest-latency video, I'm not paying attention to sound atm :)


This is really interesting. Have you published this approach somewhere? It'd be nice to read more about it.


Thanks! The basic video-over-WebSocket technique was part of our paper here: https://puffer.stanford.edu/static/puffer/documents/puffer-p...

Talk here: https://www.youtube.com/watch?v=63aECX2MZvY&feature=youtu.be

Code here: https://github.com/StanfordSNR/puffer

The "per-client muxing with frame skipping" code is something we used for a few months for our Stagecast project to a userbase of ~20, but not really "in prod": https://github.com/stanford-stagecast/audio/blob/main/src/fr...

Client-side JS here: https://github.com/stanford-stagecast/audio/blob/main/src/we...


Aha, you worked on Salsify too!

Dropping the last frame before an IDR is a very clever hack to sync things up.



