Had a quick glance and while the content seems worthwhile I am a little saddened by the code.
There are numerous calls to strcpy(), which are buffer overflows waiting to happen. I know this is not production code, but that matters just as much: this code is meant to be used as a learning tool, so I think it's important to use correct idioms.
Apologies for nitpicking, but code gets copied, often in a hurry, and then you get bugs all over the place.
One more comment: in the intro you mention that while the code is mostly tested on Linux, it's likely it works on other Unixes such as Free/OpenBSD. You might want to remind the reader that epoll() is Linux-only (bonus points if you mention the BSDs have something similar called kqueue() :)).
strcpy() is actually fine when used with literal/constant strings and the buffer is large enough. The compiler can even optimize it and issue warnings in case of overflow.
However, using strcpy() with arbitrary strings is a bad idea, and even the safer variants (strncpy, strlcpy, ...) are not ideal. In the case of ZeroHTTPd, the length of the path should be checked during request parsing and a 414 error returned if it is too long. Once you know the maximum size of your path, you should be able to size your buffers so that all your manipulation is safe, and that includes using strcpy()/strcat(). I wouldn't recommend it, but it can be safe.
Done right, string manipulation in C can be very efficient, it is also very tricky, because you need to be aware of your buffer sizes at all times. Ideally, you should avoid copies.
All the string manipulation trickery may be a little too much for such a simple project, and it isn't the point, so I suggest maybe just adding a few comments along the lines of "this is unsafe". strncpy() is not the right way to do it; it is just a band-aid so that instead of buffer overflows, you get truncated data, which is usually a less severe bug.
Since we are nitpicking: the return values of system calls like recv() are not properly checked. EINTR and EAGAIN have to be taken into account. It is rarely a problem, except when the server is overloaded. And considering that the point of ZeroHTTPd is to measure performance, being robust to such conditions matters.
BTW, flooding your server is very educative. I wrote a small, performance-oriented HTTP server myself (with epoll()), and it's when you are at 100% CPU with all queues full that you realize you really need to read the man pages. The Linux network stack is pretty stable under load, but you need to do your part.
Finally, thank you for your work, I had a feeling that epoll() was "the right way" but I didn't test it. You did ;)
FreeBSD (and macOS) do have sendfile(), but the args are different compared to the interface presented by Linux; OpenBSD notably does not provide it at all. It should be trivial to write a wrapper, though:
It was a static content focused poll-based (might have had epoll too) web server written in non-STL C++ which dramatically outperformed multi-threaded Apache. I remember it had a "stat cache" to reduce the number of syscalls made, and a nice set of string classes for passing around substrings of e.g. HTTP headers.
I don't think the epoll/kqueue version ever got an official release :(
The syscall reduction was pretty extreme, even down to keeping a shared-memory copy of time() to share across processes. IIRC that was only a benefit for HP-UX/IRIX or something like that...
That looks interesting. It would make the project clearer to others if you added a title, date, institution, or some other mark identifying the project's backing at the top of the doc, with the same details in the endnotes.
Another project in the performance oriented Linux HTTP servers is https://lwan.ws/ (using pervasive zero copy), but as far as I know it does not cover a lot of features yet.
(in reality it's barely even that e.g. it only handles GET and POST methods, discards every header, …, so it's an HTTP server in the sense that it kinda sorta will respond to HTTP requests).
So there is no keep-alive in these tests, including from the load balancer to the application? That makes sense of why the QPS is so low on a single core for all versions tested, if nginx has to reopen a new socket to ZeroHTTPd each time. I'm not sure how useful this is, as keeping your connection to your LB alive is important for throughput.
Not in all use-cases. If your backend is serving long-lived HTTP streams (big downloads; chunked SSE streams; websocket sessions), it may make more sense to close and re-open those sockets between sessions, since they live long enough to establish TCP window characteristics that may not apply to the session succeeding them (e.g. an interactive-RPC websocket session, reusing a TCP connection previously used to stream a GB of data using huge packets, will start off quite a bit slower for its use-case than a “fresh” TCP session would.)
Keep-alive is a win in most situations, but especially so in any kind of benchmarking, since it's very easy to hit bottlenecks in opening/accepting new connections, e.g. running out of ports. If you are opening and closing thousands of connections, port space / TCP tuning can become the limiting factor, regardless of your server architecture.
As mentioned in the first part of the series, the main idea is to compare and contrast Linux server architectures. ZeroHTTPd doesn't implement the HTTP protocol in full and you can easily crash it because of the way it uses memory buffers. It is not safe to run it on the internet.
Its purpose is not to show how to implement an HTTP server, but to show how different architectures of Linux network servers are written and perform.
So does Apache with mpm_event, which has been around for close to 15 years now.
The fact that you can run Apache in a process pool model doesn't mean you should. That mode was mostly kept to support CGI scripts and old style mod_php.
Not really: with pages loading hundreds of assets (especially when relying on HTTP/2 instead of asset packing/image sprites), your single-request 99.5th-percentile latency becomes the floor for your mean page load time.
Heh, you beat me to publishing my own simple HTTP server. Though the one I'm working on is even more dumbed down, only supporting GET to serve prepared HTTP messages from a hash table (this requires a creative interpretation of RFC 7230 to avoid including the Date header which would have to be updated each time).
I very much approve of this tutorial. I remember when, a long time ago, I tried to understand how a web server works and learned that I needed to install Apache, then put stuff in a CGI directory, or maybe just use a framework and not have a separate server at all... It was quite confusing for me back then. But the core functionality of an HTTP server is just: listen on a port, accept connections, read text, write text.
For anyone interested, I also recommend reading the "Unix Network Programming" books; the first part is about actual network programming, the second about inter-process communication on a single computer. For example, the old art of using Unix-domain sockets instead of TCP or UDP on a single computer (harder to hack by shady JavaScript in your browser!) seems unjustly forgotten.
Awesome comparison, super curious how this would scale out on the current generation of big CPU machines. E.g. Epyc with 64 cores, would threads still perform that well?
Would be lovely to get my hands on such metal :)
I wonder how Linux thread scheduling scales on multi-CPU machines. To keep things simple, I specifically chose to go with a single-core machine to benchmark all architectures.
Unfortunately, the current POSIX AIO implementation is done in user space by glibc. That's the reason why I covered poll and epoll. The next logical variant to add would be io_uring.
Although I've noticed that it's written in C. I would suggest that educational materials be written in Rust, unless the topic is very low-level optimization.
That way you can be sure that even if some novice blindly copies your example code, it won’t cause any security issues, thus saving you from the liability :)
(Glances at the "Programming Rust" book which has been sitting on his desk for months and thinks about writing an indemnification clause in the LICENSE file)
I've never really done any serious work on anything other than Linux. But I always wanted to try out kqueue(). Even if I don't add a separate section on kqueue(), I think it warrants a clear mention.