Holding the TCP connection open doesn't tie up the resources of a request-handling thread, so a large number of inactive connections can stay open.
Thanks.
So to clarify, this is just the same as what lighttpd describes as a 'select()-/poll()-/epoll()-based web server'?
It seems that the main advantage of this is that you have one thread managing many sockets. I am a bit surprised that blocking kernel threads would be so much slower by comparison. What causes the slowness? Context switching? Additional stack memory usage?
You guessed right. Context switching is expensive, and the default stack size is in the megabyte range, so if you want to have 10K connections open, that is on the order of 10 GB of memory. You can of course shrink the stack size, but you have to measure your program's stack usage before doing that, and it is quite cumbersome.
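With POSIX threads it looks roughly like this (the 64 KiB figure and handler() are placeholders, not measured values):

    /* Sketch: spawning a connection-handler thread with a smaller stack.
     * You'd pick the size after measuring your program's actual usage. */
    #include <pthread.h>
    #include <limits.h>

    void *handler(void *arg)
    {
        /* per-connection work would go here */
        return NULL;
    }

    int spawn_small_stack_thread(void)
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        size_t stack_size = 64 * 1024;          /* instead of the multi-MB default */
        if (stack_size < PTHREAD_STACK_MIN)     /* respect the implementation's floor */
            stack_size = PTHREAD_STACK_MIN;
        pthread_attr_setstacksize(&attr, stack_size);

        pthread_t tid;
        int rc = pthread_create(&tid, &attr, handler, NULL);
        pthread_attr_destroy(&attr);
        return rc;
    }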
With an event-driven architecture you only have to hold the session information per connection in memory, which can be as low as 4K, so you are able to maintain (depending on the complexity of the protocol) hundreds of thousands of connections.
The only drawback is that you have to write and think about your whole program in an event-driven way: you write callbacks for each I/O and timer operation, and the control flow won't be clear when you read the program.
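A stripped-down epoll loop shows the shape of it (listener setup is omitted, and the echo logic just stands in for real protocol handling):

    /* Sketch of a single-threaded epoll accept/read loop. One thread
     * multiplexes every connection; the "callback" is just the branch
     * taken for each ready descriptor. */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    void event_loop(int listen_fd)   /* listening socket created elsewhere */
    {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    /* new connection: register it and keep waiting */
                    int conn = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
                } else {
                    /* readable connection: stand-in for the per-I/O callback */
                    char buf[4096];
                    ssize_t r = read(fd, buf, sizeof buf);
                    if (r <= 0) close(fd);
                    else        write(fd, buf, (size_t)r);
                }
            }
        }
    }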
I believe this is one of the advantages of 64-bit architectures. That meg of stack size is just a reservation; it's not actually backed by memory unless your thread really needs it. On a 64-bit machine there is plenty of address space to reserve.
Yeah, but why would you waste it on stack space when you can do an event-driven design and allocate the resources only when needed?
Anyway, I was talking about C, where the stack is allocated when the thread is created; you cannot resize it later, nor will it grow automatically.
With dynamic languages it is different, but don't expect to handle that many connections with them either.
This just isn't a substantial cost any more. With the latest kernels, thread-per-connection servers are competitive with event-driven servers. This was not true a few years ago.
You're discounting the output buffer that you will need on a per-thread basis (unless you are serving from cache, in which case you can serve the data directly from the cache memory).
Typically an output buffer should be able to hold the complete response for a client; if you limit it to, say, 4K, you will be unable to process a request in one go.
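Roughly the kind of per-connection state I mean (field names and sizes are only illustrative):

    /* Sketch of the per-connection state an event-driven server keeps
     * instead of a whole thread stack. The output buffer is grown to hold
     * the complete response so a request can be produced in one go. */
    #include <stddef.h>

    struct connection {
        int     fd;            /* socket descriptor */
        int     state;         /* protocol parser state */
        char    inbuf[4096];   /* partial request being read */
        size_t  in_len;
        char   *outbuf;        /* full response, grown as needed */
        size_t  out_len;
        size_t  out_sent;      /* how much of outbuf has reached the socket */
    };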
Erlang threads (processes, actually) are very different from OS threads. They are extremely lightweight and allow you to do this kind of event-driven networking with threaded code.