I find it amusing that TCP was designed as, and still is, a full-duplex protocol for long-lived connections (streams). Then someone decided to use it in HTTP (1.0) for short-lived, half-duplex, message-oriented communication. Not surprisingly, it was not a good fit. Fast-forward a couple of years and everybody is busy de-crippling TCP in HTTP with various hacks that stretch and delay the HTTP request-response cycle, thereby uncovering the streams in the underused underlying protocol. Only now, many years later, is the mistake properly corrected and TCP on the web vindicated through WebSockets. Funny world.
Wait a little longer and, if we are very lucky, multicast will be "re-discovered" as well. That way, instead of having servers shove data down many sockets to many clients, they will write their payload to one multicast socket for all interested clients, with the payload crossing the network only as far and as often as necessary, but no more.
You're speaking of HTTP: the protocol on which a great deal of the world's computing infrastructure is built. The protocol over which we are communicating with each other right now. HTTP on TCP is extremely successful, achieves everything it was designed for, and has been extensible enough to handle much more. Many basic problems map well onto stateless request-response cycles: file serving, RPC. There was no "mistake" - it was not a bad fit. There are only expanding use cases involving existing software infrastructure, for which WebSockets is a solution. That's not to say that HTTP is perfect... It could be better in many ways - but it's ridiculous to write off its success and call it a mistake.
I am not dismissing HTTP. The mistake I referred to was limiting TCP usage on the web by allowing only the request-response-oriented HTTP protocol. HTTP is opinionated, and its REST-based architecture was explicitly chosen to disallow partial updates of web pages, thus crippling TCP. But if you want to build "thick client" type apps, the restrictions of HTTP weigh you down hard. Allowing unrestricted TCP usage makes a lot more sense in that case, and that is exactly what WebSockets delivers.
One can argue that we were better off with the pure REST web before XmlHttpRequest - in fact I am inclined to agree - but the crippling of TCP was bound to be cracked. The real humor is that the crack is now re-launched as a great new feature, when it was explicitly excluded precisely because of the breakdown of REST that it causes.
You say HTTP crippled TCP so as not to break down REST. HTTP 1.0 does not seem to have been designed for REST; REST wasn't really published until quite a while later. Calling HTTP "REST-based" seems a bit of a stretch.
On top of that, could you give a few examples of what you actually mean? Like what would a partial update be?
TCP seems like a perfect fit for the underlying transport for a request/response model; I don't see how choosing it is some sort of deliberate "crippling". What should they have done? Built a custom request/response protocol on top of IP? Isn't that just crippling IP's flexibility as a layer 3 protocol? I don't understand your objections.
Roy Fielding was involved in HTTP 1.0 and I believe that REST was a generalization of some of the principles used therein. I explained the breakdown in another comment, see http://news.ycombinator.com/item?id=4033717
I'm not saying that TCP was a super-bad choice, just that they wanted only a subset of the features and got a bit more than they wanted. Also see http://news.ycombinator.com/item?id=4033822 .
Back in the day internet connections were so terrible that it was unbearable to do much more than to download a document and display it. Hence, HTTP and HTML. Arbitrary TCP streams existed but in practice performance and latency were very limiting, so nobody cared that HTTP had no support for streaming.
Eventually the underlying tech matured and the standards followed suit. Slowly. But that's the price of interoperability.
I too find it amusing how these concepts keep getting "rediscovered", but in hindsight it's not surprising.
Back then TCP was the best mainstream choice. UDP would have looked attractive because of its message orientation, but its limited message size and its lack of re-ordering and congestion control would have disqualified it almost immediately. TCP looks better: it gives you reliability, re-ordering and rate control, and the only thing missing is a way to carve the stream into discrete messages. They solved that by using one TCP connection per message.

What they missed, however, is that connections are expensive in TCP. There is a lot of handshaking, and TCP probes the link cautiously before opening the throttle, something you notice when downloading a big file. Throwing away connections like that is a waste; ideally you want one connection per server and reuse it for multiple requests. That came with persistent connections (Connection: keep-alive, made the default in HTTP 1.1).

But TCP still has significant per-connection overhead and a slow rate control mechanism, and it does not let multiple concurrent requests share the same connection. These problems are addressed by SCTP, which is message oriented, a really good fit for HTTP, and awesome in general. SPDY is a similar but less general alternative.
But the point I was trying to make is that TCP was intentionally crippled by HTTP to achieve REST. Then people hack around those restrictions and declare a new invention.
He seems to play up the problems with comet-style communication quite a bit. Most of his objections, and seemingly the entirety of his 'showdown', are based on using the slowest comet implementations, which do repeated HTTP requests, rather than the obvious and, as far as I know, most commonly used streaming approach. Honestly, I see little difference between streaming comet communication and WebSockets in terms of performance/overhead.
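For reference, the streaming variant looks roughly like this (a minimal sketch: the /stream endpoint and the handleEvent callback are made up, and the server is assumed to keep one response open and flush one event per line onto it):

    function startStream() {
        var xhr = new XMLHttpRequest();
        var seen = 0; // how much of the response body has been processed

        xhr.open("GET", "/stream", true);
        xhr.onreadystatechange = function () {
            // readyState 3 fires repeatedly as the server flushes more data
            // onto the same open response, so there is no new HTTP request
            // (and no new set of headers) per message.
            if (xhr.readyState >= 3) {
                var text = xhr.responseText;
                var end = text.lastIndexOf("\n"); // consume complete lines only
                if (end >= seen) {
                    text.slice(seen, end).split("\n").forEach(function (line) {
                        if (line) handleEvent(line); // your application callback
                    });
                    seen = end + 1;
                }
            }
            if (xhr.readyState === 4) {
                // the long-lived response eventually ends (proxy timeout,
                // server restart, ...), so just reopen it
                setTimeout(startStream, 1000);
            }
        };
        xhr.send();
    }

Compared with repeated polling, the per-message overhead here is essentially just the payload, which is why the gap to WebSockets is small.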
And why does the "Complexity of comet applications" diagram show "RIA client app" (doesn't this have to be built when using WebSockets too?), "Silverlight or Flash plugin" (as if these are necessary for comet), and some convoluted server-side architecture that has nothing to do with the client-server protocol? Again, it seems like playing up the deficiencies of comet-style apps in a somewhat disingenuous way.
WebSockets seem to be a great step forward in almost every way (cross-platform support is currently missing), so why hype them with imagined performance wins from unrealistic comparisons with other solutions?
Yes, and then you get down to the Kaazing sales pitch, where a diagram shows your packets just going out to the Internet, and there's no longer any management to do on the client.
A well written article otherwise though. I appreciated the reminder to audit the headers you're sending out as an easy way to improve performance.
Looks like an advertorial. No mention of socket.io, and plenty of praise for a commercial solution I hadn't heard of after almost two years of working with WebSockets.
I'm not a big fan of socket.io; it seems too complex for my needs. I was quite happy to discover txWS: it's literally one more function call, and you can write ordinary Twisted code with no changes, which is a relief.
Update: I went and bought websocket.us and websockets.us, and put up my own small website about WebSockets. I just want to provide an alternative to Kaazing's site, that has no commercial focus.
Man I hate these undated Internet articles. Publishers use it to sneakily squeeze out a few extra page views from stale content. Hate it when I fall for that.
Besides support in recent browsers, anything else inaccurate in it?
The sales pitch is in the last paragraph. I thought the 90% of the article before that was a pretty good, neutral overview of WebSockets versus the earlier two-way alternatives.
Their claim that "HTML5 Web Sockets can provide a 500:1 or—depending on the size of the HTTP headers—even a 1000:1 reduction" is also wrong, since they don't account for TCP/IP frame overhead. Sure, it's still around 50:1, and sockets are awesome. But if you present numbers, please don't present misleading numbers.
I get the basic idea, but when I try to flesh it out into a full cluster and app design, long-polling/WebSockets seems like a bit of a complexity nightmare on the back end. You're taking a highly cacheable, stateless protocol and turning it into an uncacheable one. Then you're taking a shared-nothing application layer and forcing it to do shared-state cache coherence via message passing. And all the way up and down the stack you're taking what was overwhelmingly treated as a request/response model and turning it into something different. Every cache, proxy, application firewall, traffic shaper, load balancer and IDS on both sides is going to get confused, and that nets out to a ton of user complaints and corner cases.
There is no requirement to use polling or sockets. If it isn't relevant to your users or server side then don't do it.
But the bar now is that users expect their displays to update promptly and automatically, and they don't care how hard it is for you to implement. You don't need the level of complexity you think. For example, there is no need to tell the client the details of an update, or even that one has definitely happened; all it needs to know is that there could be an update. It can then go off over the regular HTTP connections to see what is new/relevant, and that traffic goes through your existing servers/caches/load balancers etc.
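As a sketch of that pattern (the URLs and the render function are made up), the push channel is nothing but an invalidation signal, and the data itself still travels over ordinary, cacheable HTTP:

    var ws = new WebSocket("ws://example.com/notify");

    ws.onmessage = function () {
        // The notification carries no payload we act on directly; it only
        // means "something may have changed", so re-fetch over plain HTTP.
        var xhr = new XMLHttpRequest();
        xhr.open("GET", "/api/latest", true); // goes through caches, LBs, etc.
        xhr.onload = function () {
            render(JSON.parse(xhr.responseText)); // your app's update function
        };
        xhr.send();
    };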
Exactly. Ajax in general breaks REST, including bookmarking, caching, navigation and more. All the major selling points of HTTP go out the window. HTTP was created on top of TCP, but with severe restrictions, to get the great features HTTP is well known for. Now someone re-launches unrestricted TCP as a feature, when it is in fact much simpler and formed the base all along. Its use was purposely off limits because of the headaches that unrestricted client-server communication causes.
If your assertion is that developers can write bad code, abuse HTTP semantics, defeat caches, break navigation etc then so what? With the ability to do things right comes the ability to mess them up.
I am saying that when you use Ajax to partially update a page, the updated page no longer has a URL that identifies it. This badly breaks HATEOAS. The idea of REST is that every state on a site has a representation - REpresentational State Transfer, you know... So while there are many other ways of shooting yourself in the foot when designing a REST website, using Ajax makes it almost inevitable.
I'm wondering if bookmarking state would have made it into webapps had we not been forced to use HTTP because that is all that was available in the browser for years.
Just to let you all know that this is an old article (agree that a publish date would have helped there). Frank and I published this article originally in late 2009 or early 2010. That was around the time that WebSocket first landed in Chrome and there have been a lot of updates to the protocol since then. For example, WS was text only, and I don't think socket.io existed yet ;-)
"..., it cannot deliver raw binary data to JavaScript, because JavaScript does not support a byte type"
Incorrect. JS most definitely supports binary data, via typed arrays such as Uint8Array, or via canvas pixel data.
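For example, in browsers that implement the later protocol revisions with binary frames, something like this works (the endpoint URL is made up):

    var ws = new WebSocket("ws://example.com/binary");
    ws.binaryType = "arraybuffer"; // receive frames as ArrayBuffer, not string

    ws.onopen = function () {
        ws.send(new Uint8Array([0x01, 0x02, 0xff])); // send raw bytes
    };

    ws.onmessage = function (event) {
        var bytes = new Uint8Array(event.data); // wrap the incoming bytes
        console.log("got " + bytes.length + " bytes, first: " + bytes[0]);
    };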
WebSocket isn't just to solve push notifications. It's also for realtime communication, for example chat or multiplayer games. We do have server-sent updates if you just want push notifications.
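For the push-only case, server-sent events are about as simple as it gets (the /updates URL is just an example): the server keeps a text/event-stream response open, and the browser handles reconnects for you.

    var source = new EventSource("/updates");

    source.onmessage = function (event) {
        // one-way push only; there is no client-to-server channel here
        console.log("update: " + event.data);
    };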
Yes, without compression. Everything you said is covered by SPDY. But SPDY goes a step further and also improves the current state of HTTP itself, so plain document transfer gets better too.