I believe that's standard for Netflix, etc., but is it also true for plain webms and mp4s in a <video> tag? I thought those were downloaded in one request but had enough metadata at the beginning to allow playback to start before the file is completely downloaded.


Yes it is true.

Browsers talking to static web servers use HTTP byte range requests to get chunks of videos, and can use the same mechanism to seek to any point in the file.

Streaming that way is fast and simple. No fancy technology required.

For MP4 to work that way you need to render it as fragmented MP4.


Why would the browser send byte range requests for video tags if it expects to play the file back linearly from beginning to end anyway? Wouldn't that be additional overhead/round-trips?


> Why would the browser send byte range requests for video tags if it expects to play the file back linearly from beginning to end anyway?

Probably because byte ranges are required for seeking, and playing from the beginning is equivalent to seeking to 0.

> Wouldn't that be additional overhead/round-trips?

No, because the range of the initial byte range request is the whole file (`bytes=0-`).
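
If you want to see that first request outside the browser, here's a minimal sketch (TypeScript fetch; the URL is a placeholder):

  // Open-ended range starting at byte 0, mirroring what the <video> element sends.
  const resp = await fetch("https://example.com/video/sample.mp4", {
    headers: { Range: "bytes=0-" },
  });
  console.log(resp.status);                        // 206 Partial Content if the server honors ranges
  console.log(resp.headers.get("Content-Range"));  // e.g. "bytes 0-104857599/104857600"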


My original comment was about the commenter I replied to saying:

> To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.

Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"? Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?


> Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"?

Yes, it would (though a better description would be "a single, long lived http request" because this doesn't have anything to do with connections), and wewewedxfgdf also replied Yes.

> Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?

Yes.


The client doesn't want to eat the whole file, so it uses a range request for just the beginning of the file, and then the next part as needed.
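
A sketch of that per-chunk model (the chunk size and URL are arbitrary placeholders):

  // One closed range request per chunk, issued as the player's buffer drains.
  const CHUNK = 1024 * 1024; // 1 MiB per request (arbitrary)
  async function fetchChunk(url: string, start: number): Promise<ArrayBuffer> {
    const resp = await fetch(url, {
      headers: { Range: `bytes=${start}-${start + CHUNK - 1}` },
    });
    return resp.arrayBuffer(); // 206 response containing just this slice
  }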


The client would actually request the whole file and then terminate the request if the file is no longer needed. This is what browsers do at least.


Both are possible, and in fact I could imagine not all servers being too happy with having to trickle data over a persistent HTTP connection through the entire length of the video, with an almost always full TCP send buffer at the OS level.


> Both are possible

It is possible if you are in control of the client, but no browser would stream an mp4 file request by request.

> with an almost always full TCP send buffer at the OS level

This shouldn't be a problem because there is flow control. Also the data would probably be sent to the kernel in small chunks, not the whole file at once.


> It is possible if you are in control of the client, but no browser would stream an mp4 file request by request.

I believe most browsers do it like that, these days: https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Au...

> This shouldn't be a problem because there is flow control.

It's leveraging flow control, but as I mentioned this might be less efficient (in terms of server memory usage and concurrent open connections, depending on client buffer size and other variables) than downloading larger chunks and closing the HTTP connection in between them.

Many wireless protocols also prefer large, infrequent bursts of transmissions over a constant trickle.


> I believe most browsers do it like that, these days

Nope. Browsers send a byte range request for the whole file (`0-`), and the corresponding time range grows as the file is being downloaded. If the user decides to seek to a different part of the file, say at byte offset 10_000, the browser sends a second byte range request, this time `10000-`, and a second time range is created (if this part of the file has not already been downloaded). So there is no evidence there that any browser would stream files in small chunks, request by request.
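
You can watch those time ranges grow from script, by the way; `video.buffered` is a standard TimeRanges object (quick sketch):

  // Log each buffered time range of the first <video> element on the page.
  const video = document.querySelector("video");
  if (video) {
    for (let i = 0; i < video.buffered.length; i++) {
      console.log(`range ${i}: ${video.buffered.start(i)}s to ${video.buffered.end(i)}s`);
    }
  }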

> in terms of server memory usage

It's not less efficient in terms of memory usage, because the server won't read more data from the filesystem than flow control lets it send.

> concurrent open connections

Maybe if you're on HTTP/1, but we live in the age of HTTP/2-3.

> Many wireless protocols also prefer large, infrequent bursts of transmissions over a constant trickle.

AFAIK browsers don't throttle download speed, if that's what you mean.


Ah, interesting, I must have mixed it up/looked at range request based HLS playlists in the past. Thank you!

> AFAIK browsers don't throttle download speed, if that's what you mean.

Yeah, I suppose by implementing a relatively large client-application-side buffer and reading from that in larger chunks rather than as small as the media codec allows, the same outcome can be achieved.

Reading e.g. one MP3 frame at a time from the TCP buffer would effectively throttle the download, limited only by Nagle's Algorithm, but that's probably still much too small to be efficient for radios that prefer to sleep most of the time and then receive large bursts of data.


Realistically you wouldn’t be reading anything from the TCP buffer because you would have TLS between your app and TCP, and it’s pretty much guaranteed that whatever TLS you’re using already does buffering.


That's effectively just another small application layer buffer though, isn't it? It might shift what would otherwise be in the TCP receive buffer to the application layer on the receiving end, but that should be about all the impact.


Oh you’re right, I’m just so used to making the TLS argument because there is also the cost of syscalls if you make small reads without buffering, sorry xD


Are you sure browsers would try to download an entire, say, 10h video file instead of just some chunks of it?


Common sense tells me there should be some kind of limit, but I don't know what it is, whether it's standardized, or whether it exists at all. I just tested: Firefox _buffered_ (according to the time range) roughly the first 27,000 seconds, but in the dev tools the request appeared as though it was still loading. Chrome downloaded the first 10.2 MB (according to dev tools) and stopped (but meanwhile the time range kept growing from zero by approximately one second every second, even though the browser had already stopped downloading). After it played for a bit, Chrome downloaded 2.6 more MB _using the same request_. In both cases the browser requested the whole file, but did not necessarily download the whole file.


Seconded, I've done a userland 'Content-Range' implementation myself. Of course there were still a few ffmpeg-specific parameters the mp4 needed to work right.


It’s not true, because throwing a video file as a source on a video tag gives no information about the file being requested until the headers are pushed down. Hell, back in 2005 Akamai didn’t even support byte range headers for partial content delivery, which made resuming videos impossible; I believe they pushed the update out across their network in ’06 or ’07.


If your HTTP server provides and supports the appropriate headers and you’re serving supported file types, then it absolutely is true.

Just putting a URL to an mp4 file we have hosted on CloudFlare R2 into my Chromium-based browser’s address bar “just works” (I expect a video tag would be the same), supporting skipping ahead in the video without having to download the whole thing.

Initially, skipping ahead didn’t work until I disabled caching on the CloudFlare CDN, as that breaks the “Accept-Ranges” capability on videos. For now we have a negligible amount of viewership of these mp4s, but if it becomes an issue we’ll use CloudFlare’s video serving product.
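
If anyone wants to sanity-check their own CDN setup, a rough probe looks like this (placeholder URL; just check for Accept-Ranges and a 206 on a small range):

  // Does the server/CDN advertise byte ranges and actually honor them?
  const head = await fetch("https://example.com/video/sample.mp4", { method: "HEAD" });
  console.log(head.headers.get("Accept-Ranges")); // "bytes" if ranges are supported
  const probe = await fetch("https://example.com/video/sample.mp4", {
    headers: { Range: "bytes=0-1023" },           // ask for just the first 1 KiB
  });
  console.log(probe.status);                      // 206 Partial Content expected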


> If your HTTP server provides and supports the appropriate headers and you’re serving supported file types, then it absolutely is true.

No. When you play a file in the browser with a video tag, it requests the file. It doesn’t ask for a range. It does use a range if you seek, or if you write the JavaScript to fetch based on a range. That’s why if you press play and pause it buffers the whole video. Only if you write the code yourself can you partially buffer a file like YouTube does.


Nah, it uses complex video-specific logic and HTTP range requests as the protocol. (At least the normal browsers and servers do. You can roll your own dumb client/server of course.)

> That’s why if you press play and pause it buffers the whole video.

Browsers don't do that.


Obviously it doesn’t initially ask for a range if it starts from the beginning of the video, but it starts playing the video immediately without requiring the whole file to download; when you seek, it cancels the current request and then does a range request. At no point does it “have” to cache the entire file.

I suppose if you watch it from start to finish without seeking it might cache the entire file, but it may alternatively keep only a limited amount of the video cached, and if you go back to an earlier time it may need to re-request that part.

Your confidence seems very high on something which more than one person has corrected you on now. Perhaps you need to reassess the current state of video serving, keeping in mind it does require the HTTP server to allow range requests.


You can learn about it here:

https://www.zeng.dev/post/2023-http-range-and-play-mp4-in-br...

You can also watch it happen: the Chrome developer tools network tab will show you the traffic between the web browser and the server, and you can see this process in action.


Who cares what happened in 2005? This is so rare nowadays; I've really only seen it on websites that construct the file as they go, such as GitHub's zip download feature.


2005 is basically the dark ages of the web. It’s pre-Ajax, and IE6 was the dominant browser. Using this as an argument is like saying apps aren’t suitable because the iPhone didn’t have an App Store until 2008.

> It’s not true, because throwing a video file as a source on a video tag gives no information about the file being requested until the headers are pushed down.

And yet, if you stick a web server in front of a video and load it in chrome, you’ll see just that happening.


You can load a video into a video tag in Chrome, press play and pause, and see that it makes a single request and buffers the whole video.


If you stick:

  <video controls>
    <source src="/video/sample.mp4" type="video/mp4">
    Your browser does not support the video tag.
  </video>
into an html file and run it against this pastebin [0], you'll see that Chrome (and Safari) both do range requests out of the box if the file is big enough.

[0] https://pastebin.com/MyUfiwYE


Tried it on an 800 MB file. Single request.


I tried it on 4 different files, and in each case my browser sent a request, my server responded with a 206 and it grabbed chunks as it went.


They can play back while loading as long as they are encoded correctly, fwiw (faststart encoded).

When you create a video from a device, the header is actually at the end of the file. Understandable: it’s where the file pointer was, and mp4 allows this, so your recording device writes it at the end. You must re-encode with faststart (which puts the moov atom at the start) to make it load reasonably on a webpage, though.


> Understandable: it’s where the file pointer was, and mp4 allows this, so your recording device writes it at the end.

Yet formats like WAVE, which use a similar "chunked" encoding, just use a fixed-length header and a single seek() to get back to it when finalizing the file. Quicktime and WAVE were released around the same time in the early 90s.

MP2 was so much better I cringe every time I have to deal with MP4 in some context.


At the expense of quite some overhead though, right?

MPEG-2 transport streams seem more optimized for a broadcast context, with their small frame structure and everything – as far as I know, framing overhead is at least 2%, and is arguably not needed when delivered over a reliable unicast pipe such as TCP.
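
(For reference: a transport stream packet is 188 bytes with a 4-byte header, so the fixed framing cost alone is 4/188 ≈ 2.1%, before any adaptation fields or PES headers.)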

Still, being able to essentially chop a single, progressively written MPEG TS file into various chunks via HTTP range requests or very simple file copy operations without having to do more than count bytes, and with self-synchronization if things go wrong, is undoubtedly nicer to work with than MP4 objects. I suppose that's why HLS started out with transport streams and only gained fMP4 support later on.


> and is arguably not needed when delivered over a reliable unicast pipe such as TCP.

So much content ended up being delivered this way, but there was a brief moment where we thought multicast UDP would be much more prevalent than it ended up being. In that context it's perfect.

> why HLS started out with transport streams and only gained fMP4 support later on.

Which I actually think was the motivation to add fMP4 to base MP4 in the first place. In any case I think MPEG also did a better job with DASH technically but borked it all up with patents. They were really stupid with that in the early 2010s.


Multicast UDP is widely used - but not on the Internet.

We often forget there are networks other than the Internet. Understandable, since the Internet is the most open. The Internet is just an overlay network over ISPs' private networks.

SCTP is used in cellphone networks and the interface between them and legacy POTS networks. And multicast UDP is used to stream TV and/or radio throughout a network or building. If you have a "cable TV" box that plugs into your fiber internet connection, it's probably receiving multicast UDP. The TV/internet company has end-to-end control of this network, so they use QoS to make sure these packets never get dropped. There was a write-up posted on Hacker News once about someone at a hotel discovering a multicast UDP stream of the elevator music.


> If you have a "cable TV" box that plugs into your fiber internet connection, it's probably receiving multicast UDP.

That's a good point: I suppose it's a big advantage being able to serve the same, unmodified MPEG transport stream from a CDN, as IP multicast over DOCSIS/GPON, and as DVB-C (although I’m not sure that works like that, as DVB usually has multiple programs per transponder/transport stream).


The long answer is "it depends on how you do it". Unsurprisingly, video and voice/audio are probably the most different ways that you can "choose" to do distribution.


This. You can't just throw it into a folder and have it stream. The web server has to support it, and then there is encoding and formats to consider.


Yea, this works for mp4, and HN seems confused about how.

The MOOV atom is what makes seeking via range requests possible, but the browser has to find it first. That's why it looks like it's going to download the whole file at first: it doesn't know the offset. Once it reads it, the request will be cancelled and targeted range requests will begin.
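
If you want to check a file's layout yourself, here's a minimal sketch (Node.js/TypeScript, the local file name is a placeholder) that walks the top-level boxes and prints their order; "moov" before "mdat" means the file is web-optimized:

  // MP4 is a sequence of boxes: 4-byte big-endian size + 4-byte type, repeated.
  import { openSync, readSync, fstatSync, closeSync } from "node:fs";

  function topLevelBoxes(path: string): string[] {
    const fd = openSync(path, "r");
    const fileSize = fstatSync(fd).size;
    const header = Buffer.alloc(16);
    const types: string[] = [];
    let offset = 0;
    while (offset + 8 <= fileSize) {
      readSync(fd, header, 0, 16, offset);
      let size = header.readUInt32BE(0);
      const type = header.toString("ascii", 4, 8);
      if (size === 1) size = Number(header.readBigUInt64BE(8)); // 64-bit "largesize"
      if (size === 0) size = fileSize - offset;                 // box extends to end of file
      types.push(type);
      offset += size;
    }
    closeSync(fd);
    return types;
  }

  console.log(topLevelBoxes("sample.mp4")); // e.g. [ "ftyp", "moov", "mdat" ] for faststart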


The two are essentially the same thing, modulo trading off some unnecessary buffering on both sides of the TCP pipe in the "one big download" streaming model for more TCP connection establishments in the "range request to refill the buffer" one.


For MP4s the metadata is at the end annoyingly enough.


MP4 allows the header at the start or the end.

It’s usually written at the end since it’s not a fixed size, and it’s a pain for recording and processing tools to rewrite the whole file on completion just to move the header to the start. You should always re-encode to move the header to the start for the web, though.

It’s something you see far too often online once you know about it, but mp4 can absolutely have the header at the start.


You can pass `-movflags faststart` when encoding to place it at the beginning.
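
For example (file names are placeholders), remuxing without re-encoding looks roughly like:

  ffmpeg -i input.mp4 -c copy -movflags +faststart output.mp4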


Implementations may request the metadata range at the end in this case, if the content length is known.
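
That would be an HTTP suffix range; a sketch (placeholder URL, and the 64 KiB tail size is just a guess at where the moov atom might fit):

  // Suffix byte range: "give me only the last 65536 bytes of the file".
  const tail = await fetch("https://example.com/video/sample.mp4", {
    headers: { Range: "bytes=-65536" },
  });
  console.log(tail.status, tail.headers.get("Content-Range"));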


For "VOD", that works (and is how very simple <video> tag based players sometimes still do it), but for live streaming, it wouldn't – hence the need for fragmented MP4, MPEG-DASH, HLS etc.

It does work for simpler codecs/containers though: Shoutcast/Icecast web radio streams are essentially just endless MP3 downloads, optionally with some non-MP3 metadata interspersed at known intervals.
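
For the curious, that metadata opt-in is just a request header; a sketch (the stream URL is a placeholder):

  // Client opts in to interleaved metadata; the server answers with "icy-metaint",
  // the number of audio bytes between metadata blocks (each block starts with a
  // length byte whose value times 16 is the metadata size).
  const resp = await fetch("https://example.com/stream.mp3", {
    headers: { "Icy-MetaData": "1" },
  });
  console.log(resp.headers.get("icy-metaint"));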



