Being a Python programmer, I am a bit worried that one (in this case) needs to use Java to overcome a scaling problem. From my experience with Twisted, at a much smaller scale than the article's, I have never found Twisted to be any kind of bottleneck.
It may be that since Twisted (Python) can effectively use only one CPU per interpreter (the GIL et al.), it got left behind by Java, which can easily spread threads across multiple CPUs.
A slightly different architecture, one that uses multiple Python processes, might be required then.
So far at Plurk we have struggled to overcome scalability issues, and none of them are due to Python, even though we use Python a lot.
Most of the problems we have seen have been related to the database. It was and still is the biggest bottleneck. And all the people I know that drive big sites will confirm this.
All that said, one should evaluate problems and not blindly use Python for everything. Python is a great language, but some other language is a better choice for some problems. And here Java, and especially Java's NIO library, is a much better choice for building a server that should handle tons of open connections.
Because of the GIL, there is little point in using multiple threads in Python for CPU-bound work. Scaling must be done by adding more processes. Then IPC starts to be an issue, and that's what messaging middleware is for. The sooner you integrate messaging into your project, the better.
Once you have a messaging platform, Comet can be done in any technology you prefer; it really doesn't matter. That's because the scaling complexity is handled by the message broker.
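To make the "more processes plus messaging" idea concrete, here is a minimal sketch. It assumes a Unix-like system (it asks for the "fork" start method) and uses multiprocessing queues as a stand-in for a real message broker. The names worker and run_workers are made up for illustration.

```python
import multiprocessing as mp


def worker(worker_id, inbox, outbox):
    """Runs in its own interpreter, so it has its own GIL."""
    # iter(..., None) keeps reading until the None shutdown sentinel arrives.
    for msg in iter(inbox.get, None):
        outbox.put((worker_id, msg.upper()))


def run_workers(messages, n_workers=2):
    # Assumption: "fork" is available (Unix only). The queues below stand in
    # for a message broker; a real setup would talk to the broker instead.
    ctx = mp.get_context("fork")
    inbox, outbox = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, inbox, outbox))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    for m in messages:
        inbox.put(m)
    for _ in procs:
        inbox.put(None)  # one shutdown sentinel per worker
    # Collect results before joining so no worker blocks on a full pipe.
    results = [outbox.get() for _ in messages]
    for p in procs:
        p.join()
    return results
```

Which worker handles which message is up to the OS scheduler, so the worker ids in the results will vary from run to run.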
This is a nice writeup that gives some good insight into what you're up against if you want to implement Comet today. It's a cool idea, but there's really nothing out there that you can use to do it off the shelf. It's great that he's gone through and actually tested out a bunch of these candidate technologies under load, so the next guy will at least know what not to try.
I have faith that Comet will have its day eventually, but clearly it's not quite time. With the current crop of production quality web servers, handling 100k requests per second (polling) is a solved problem. Keeping 100k simultaneous connections alive (comet) is by no means solved. It'll be cool when it happens though.
Correct. If you're Google, and can devote a team to build and maintain a webserver specifically designed to handle Comet, then you can use Comet.
If you're small and need to use off-the-shelf tech to get your thing up and running quickly, you're still best off going with Polling today.
A quick example would be to look at Thinkature vs. Twiddla. They had a team of two, one of whom spent an entire year building a web server from the ground up so that they could use Comet. Twiddla only has me as a developer, and I'd rather spend my time improving the product, so we use Polling.
One year later, Twiddla has a ~300ms lag between when you draw a line and when it appears on a remote screen, whereas Thinkature is out of business. I don't think it's that black and white of a tradeoff, but hopefully you see the thrust of what I'm saying.
One day soon, there will be a mod_comet for Apache, or IIS 9 with Comet support built in. That will be the day it makes sense for small, lean teams to build a business around it.
I spent a year or so building my own comet server. Because I did that, I'm able to finely control every single element of it. I know in absolute detail how it works, what sucks, what rocks, etc.
Obviously you have to decide how important things are to your success, and decide if you should build it yourself, or use some existing code out there. For me, it was a no brainer decision.
mod_comet is missing the point. Apache is the issue when using comet. Apache is what needs fixing/replacing.
If Mibbit was using polling, my bandwidth bill would be through the roof.
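For a rough feel of why polling costs bandwidth, here is some back-of-the-envelope arithmetic. Every number you feed it is an assumption (these are not Mibbit's or Twiddla's real figures), polling_cost is a made-up helper, and the latency figure assumes events arrive at uniformly random points within a polling interval.

```python
def polling_cost(clients, interval_s, bytes_per_poll):
    """Estimate the steady-state load of fixed-interval HTTP polling."""
    # Every client sends one request per interval, whether or not
    # there is anything new to fetch.
    requests_per_s = clients / interval_s
    return {
        "requests_per_s": requests_per_s,
        # Mostly header overhead on empty responses.
        "bytes_per_s": requests_per_s * bytes_per_poll,
        # On average an event waits half an interval for the next poll.
        "mean_added_latency_s": interval_s / 2,
    }
```

Shortening the interval lowers the lag but raises the request rate and the bytes on the wire in proportion, which is the tradeoff being argued about here.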
If you want to use Comet today, you need to build something custom, and it will probably take a lot of time, but it will pay off as you describe in terms of flexibility.
I'm certainly not. iserve, yaws and mochiweb are all "off the shelf" web servers that can handle Comet pretty well. I can't believe Erlang has the monopoly on lightweight web servers either.
Those are the available docs for the technologies you mentioned. Yaws has some real documentation, but not for its Comet implementation. The others give you some source code and tell you to have fun.
So yeah, it's right there on the shelf (how 'bout we settle on "perched on the edge of the shelf?") I'm just not smart enough (or motivated enough) to actually use any of it.
or you could not use general purpose web servers for edge cases, mochiweb for example is pretty great at handling comet requests.
axod does mibbit by himself, we (hypernumbers) use comet and are a small team, meebo used comet from the start.
there really isn't much of a barrier with comet, it was actually a hell of a lot easier than the flash sockets setup I implemented before it.
* probably worth mentioning facebook used mochiweb for their chat (off the shelf), I do find it hard to believe the only lightweight webservers around are in erlang
Wouldn't you say that any application thinking of using Comet would by definition be an edge case?
But yeah, I'd disagree that the barrier to Comet is low. The natural thing to compare it to is HTTP Polling, which has no barrier whatsoever beyond knowing about window.setInterval(); (is it correct to end a sentence containing code with a semicolon?:)
Twiddla went from concept to launch in ten hours, largely because I didn't need to spend any time thinking about how to handle communications. The intention was to replace Polling with Comet at some point in the future, but you know what? It just isn't anywhere near as slow or problematic as I was expecting.
Back to my original point, there are a lot of smart people (such as yourself) working on this problem. Before the year is out, I suspect that somebody will have a good, proven, out-of-the-box Comet server that you can simply drop your application onto. That's the day I'm waiting for.
heh the point of all 3 of my comments is that there is a good, proven, out-of-the-box* comet server: mochiweb. I haven't tested them, but I would imagine iserve and yaws handle themselves similarly well.
* depending on your definition of out-of-the-box, mochiweb doesn't actually enforce any "protocol" for handling comet for you, those are application specific and reasonably trivial to code.
* I also forgot about erlycomet, which is built on mochiweb, and is a straight out of the box comet server
Interesting. I'm writing a comet server right now, using Ruby's EventMachine. I might do some tests to see how many connections I can stretch it out to; unfortunately I have the feeling that local connections are not going to quite match up to real live outgoing ones.
If I was doing it For Realz™ I reckon Erlang would be the go, still haven't gotten up to anywhere near that level of proficiency though. 37Signals recently rewrote their push server in Erlang and reported great results.
SPEED :: In defence of Java, it is very quick when done right. See Kilim[1] for examples.
SUPPORT :: On top of the speed Java has a supported hardware stack (Solaris). So if doing 100K connections is important to you, you can find engineers who can diagnose difficult problems on your whole stack.
Well, it's not that meaningless. Comet is basically long polling; those connections aren't really doing much of anything until something turns up on the back end to be pushed down them, at which point the connection is usually closed and then re-requested by the client. Maybe with the occasional ping, I've seen some people do that too.
How much memory and CPU is required to simply hold open the connection and sit there is definitely of interest to people. Of course, I'd like to see real world tests too, but that might be rather difficult to pull off without specially written, extremely efficient code on the test side.
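As a toy illustration of "connections that just sit there until something turns up", here is a sketch using Python's asyncio. It does no real HTTP: the Channel class and the demo function are made up for this example, and each "connection" is only a parked future.

```python
import asyncio


class Channel:
    """Parks long-poll requests until a message is published."""

    def __init__(self):
        self._waiters = []

    async def poll(self, timeout):
        # Each pending request is one idle future; it costs no CPU
        # while it waits.
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        try:
            return await asyncio.wait_for(fut, timeout)
        except asyncio.TimeoutError:
            return None  # nothing happened; the client would re-request
        finally:
            if fut in self._waiters:
                self._waiters.remove(fut)

    def publish(self, message):
        # Wake every parked request; each client then reconnects.
        waiters, self._waiters = self._waiters, []
        delivered = 0
        for fut in waiters:
            if not fut.done():
                fut.set_result(message)
                delivered += 1
        return delivered


async def demo(n_clients=1000):
    ch = Channel()
    tasks = [asyncio.create_task(ch.poll(timeout=5))
             for _ in range(n_clients)]
    await asyncio.sleep(0)  # let every client park on the channel
    delivered = ch.publish("hello")
    return delivered, await asyncio.gather(*tasks)
```

Running asyncio.run(demo()) parks a thousand waiters and wakes them all with one publish. It says nothing about real sockets, where the per-connection memory in the kernel and the server is the interesting part.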
Not knowing much about Java's NIO, how do you handle 'long running calculations'? One of the great things about Erlang is that you can sit there doing some big huge long loop, and it won't block the rest of the system because it's got an internal scheduler. One way to do that kind of thing with C or Java is simply to manually divide your big long calculation into smaller chunks that yield control at regular, brief intervals. Does Java provide anything higher level?
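The "manually divide it into chunks that yield" approach looks roughly like this. The sketch uses Python's asyncio as a stand-in for any single-threaded event loop, not Java NIO itself. chunked_sum, heartbeat and demo are made-up names, and the chunk size is arbitrary.

```python
import asyncio


async def chunked_sum(n, chunk=10_000):
    """Sum 0..n-1 in slices, yielding to the event loop between slices."""
    total = 0
    for start in range(0, n, chunk):
        total += sum(range(start, min(start + chunk, n)))
        await asyncio.sleep(0)  # hand control back so other tasks can run
    return total


async def heartbeat(ticks):
    """Stands in for 'the rest of the system' that must not be blocked."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def demo():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    total = await chunked_sum(100_000)
    hb.cancel()
    return total, len(ticks)
```

The heartbeat only gets to run at the explicit yield points, which is exactly the bookkeeping Erlang's scheduler does for you.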
What sort of long calculations? If you're doing massive number crunching, you could just have a queue->thread->callback for it if it's not easy to break up into bite size chunks...
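A bare-bones version of the queue->thread->callback pattern, sketched in Python to stay consistent with the rest of the thread (in Java the worker thread would actually get its own CPU; in CPython the GIL means it mostly just keeps the main loop responsive). start_worker is a made-up helper.

```python
import queue
import threading


def start_worker(jobs):
    """Start one background thread that drains the jobs queue.

    Each job is a (func, args, callback) tuple; None shuts the worker down.
    """
    def loop():
        for func, args, callback in iter(jobs.get, None):
            # Note: the callback runs on the worker thread, not the caller's.
            callback(func(*args))

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Usage is along the lines of jobs.put((some_function, (arg1, arg2), on_done)), with a final jobs.put(None) to stop the worker. Jobs are processed in FIFO order, one at a time.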
Ok, fair enough. What's neat about Erlang is that you just write your code and don't have to worry about divvying it up, or sending stuff off to a queue. I suppose one possible disadvantage of the Erlang way is that if the scheduler is divvying resources up equally, you could be 'thrashing' between too many "processes", rather than working your way through a FIFO queue that's only going to work on so many things at one time. But at that point you could just write a queue system in Erlang...
Not a Python programmer here and I'm only guessing after a cursory look recently. Twisted developers are layering a higher-level "architecture" over the basic system calls they have wrapped. It's just a few calls down there (serial and scatter-gather I/O, multiplexing, etc.) but Twisted adds higher-level data structures and design-patterny conceptual models to shield the pythonista from Unix.
Some of that has got to add some fat, which might be OK for the great majority of people, but the few who need raw performance might end up cutting some bacon off.
Is the actual C++ implementation behind Java's NIO strictly better than nginx's? Is it using epoll() and sendfile()? Aren't all those Java abstraction layers just overhead?
It seems like yet another attempt to put some more air into Java's bubble.