
Author here. The benchmark part could be clearer; I acknowledge that.

Interestingly, when working with a limited number of threads, the thread approach is actually faster in that benchmark. So in practical applications, the differences are marginal and likely lean towards threads.

But even if this weren't the case, context matters. A 10ms discrepancy in a web request might be acceptable. However, in a high-performance networking application - which, let's be honest, isn't a common project for most - it could be significant.



If you measure the pure latency of a single request (HTTP, RPC, whatever), the latency difference between any async and non-async implementation should be microseconds at most, never milliseconds. If it's more, then something in the implementation is off. And as you mentioned, threads might even be faster, because there is no need to switch between threads (as in a multithreaded async runtime), nor are additional syscalls needed (just read, not epoll_wait plus read).
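A minimal sketch of that syscall difference (not from the thread, names illustrative): a blocking socket read is a single recv(), while a readiness-based read needs an epoll-style wait (here via Python's selectors module) *plus* the recv().

```python
# Sketch: one syscall for a blocking read vs. two for a
# readiness-based read (wait + read). Illustrative only.
import selectors
import socket

a, b = socket.socketpair()

# Path 1: plain blocking read -- a single recv() syscall.
a.sendall(b"ping")
data_blocking = b.recv(4)

# Path 2: readiness-based read -- epoll_wait (via selectors), then recv().
sel = selectors.DefaultSelector()
b.setblocking(False)
sel.register(b, selectors.EVENT_READ)
a.sendall(b"pong")
events = sel.select(timeout=1)   # syscall 1: wait until readable
data_async = b.recv(4)           # syscall 2: the actual read

print(data_blocking, data_async)
sel.close(); a.close(); b.close()
```

Per request that second syscall costs on the order of microseconds, which is why the single-request latency gap stays far below milliseconds.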

Async runtimes can perform better at scale, or reduce resource usage at scale, where "at scale" means a concurrency level of >= 10k in the last benchmarks I did on this.
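To make the "at scale" point concrete, here is a toy sketch (my own, not from the benchmarks above): an async runtime multiplexing 10k concurrent "connections", modelled as tasks that yield to the event loop, all on a single OS thread.

```python
# Sketch: 10k concurrent tasks on one event loop.
# fake_connection stands in for a real socket handler.
import asyncio

async def fake_connection(i: int) -> int:
    await asyncio.sleep(0)  # yield, as an await on a socket would
    return i

async def main() -> int:
    results = await asyncio.gather(
        *(fake_connection(i) for i in range(10_000))
    )
    return len(results)

handled = asyncio.run(main())
print(handled)
```

Each task costs a small heap object rather than a kernel thread with its own stack, which is where the resource savings at high concurrency come from.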


Concurrency (async) is rarely faster than parallelism (OS threads), across almost any language that supports both. If you know you don't need obscene scalability (1000 connections is pushing the edge of what's reasonable with parallelism), then stick with parallelism. If you overuse parallelism, expect your entire system (OS and all) to grind to a halt through context switching.


You'd be surprised. 10k threads is more than manageable on Linux.
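A quick sketch of the thread-per-connection shape (mine, not the commenter's): park many OS threads on a blocking wait, then wake and join them all. The count is scaled down to 1,000 here to keep the demo quick; on a typical Linux box the same code runs fine with N = 10_000.

```python
# Sketch: N OS threads each blocking in the kernel on one event.
import threading

N = 1_000  # scaled down for the demo; try 10_000 on Linux
go = threading.Event()
done = []
lock = threading.Lock()

def worker() -> None:
    go.wait()          # each thread blocks until released
    with lock:
        done.append(1)

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
go.set()               # release every thread at once
for t in threads:
    t.join()
print(len(done))
```

The kernel scheduler, not userspace, carries the multiplexing load here, and modern Linux handles these counts comfortably.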


paging Dan Kegel…


Lol, fair enough, but the C10k problem these days does have a "just use OS threads" solution. It wasn't free; it took a lot of work across the industry. Computers have gotten both faster per core and wider in core counts. And kernels have spent the last couple of decades really working hard on their schedulers and kernel/user sync primitives to handle those high thread counts.

The native model falls apart under C10M, but to be fair, so does the traditional epoll/kqueue/IOCP-dispatching-coroutines model of solving C10k. That's where you have to start keeping the network stack and the application data plane colocated in the same context as much as possible. That can be done with something like DPDK to keep both in user space, or the way Netflix is known for their FreeBSD work making kTLS and sendfile work together to keep the data plane entirely in the kernel.



