
The system that was tested there was PCIe bandwidth constrained because this was a few years ago. With your system, it'll get a bigger number - probably 14 or 15 million 4KiB IO per second per core.

But while SPDK does have an fio plug-in, unfortunately you won't see numbers like that with fio. There's way too much overhead in the tool itself. We can't get beyond 3 to 4 million with that. We rolled our own benchmarking tool in SPDK so we can actually measure the software we produce.

Since the core is CPU bound, 512B IOs are going to net the same IOs per second as 4KiB. The software overhead in SPDK is fixed per IO, regardless of size. You can also run more threads with SPDK than just one - it has no locks or cross-thread communication, so it scales linearly with additional threads. You can push systems to 80-100M IOs per second if you have disks and bandwidth that can handle it.
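
For example, the perf benchmark that ships with SPDK (more on it below) takes a core mask, so running the same random read workload on four cores instead of one is something like this (flag names from memory - check -h on your version):

perf -c 0xF -q 32 -o 4096 -w randread -t 60

Each core drives its own queue pairs, so there's no shared state between them.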




Yeah, that’s what I wondered - I’m OK with using multiple cores; would I get even more IOPS when doing smaller I/Os? Is the benchmark suite you used part of the SPDK toolkit (and easy enough to run)?


Whether you get more IOPS with smaller I/Os depends on a number of things. Most drives these days are natively 4KiB blocks and are emulating 512B sectors for backward compatibility. This emulation means that 512B writes are often quite slow - probably slower than writing 4KiB (with 4KiB alignment). But 512B reads are typically very fast. On Optane drives this may not be true because the media works entirely differently - those may be able to do native 512B writes. Talk to the device vendor to get the real answer.
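
If you want to see what the drive itself reports, the namespace's supported LBA formats are visible with nvme-cli (assuming it's installed) - something like:

nvme id-ns -H /dev/nvme0n1

The output lists each LBA format with its data size (512 or 4096 bytes) and a relative performance hint, and marks which one is currently in use.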

For at least reads, if you don't hit a CPU limit you'll get 8x more IOPS with 512B than you will with 4KiB with SPDK. It's more or less perfect scaling. There's some additional hardware overhead in the MMU and PCIe subsystems with 512B because you're sending more messages for the same bandwidth, but my experience has been that it's mostly negligible.

The benchmark builds to build/examples/perf and you can just run it with -h to get the help output. Random 4KiB reads at 32 QD to all available NVMe devices (all devices unbound from the kernel and rebound to vfio-pci) for 60 seconds would be something like:

perf -q 32 -o 4096 -w randread -t 60
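
The unbinding/rebinding is handled by the setup script in the SPDK repo (it also reserves hugepages) - roughly:

sudo scripts/setup.sh

and sudo scripts/setup.sh reset gives the devices back to the kernel drivers when you're done.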

You can restrict the test to specific devices with the -r parameter (essentially by BUS:DEVICE:FUNCTION - example below). The tool can also benchmark kernel devices. Using -R will turn on io_uring (otherwise it uses libaio), and you simply list the block devices on the command line after the base options, like this:

perf -q 32 -o 4096 -w randread -t 60 -R /dev/nvme0n1
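
Going back to -r: for local PCIe devices it takes a transport ID string, which if I remember the syntax right looks something like:

perf -q 32 -o 4096 -w randread -t 60 -r 'trtype:PCIe traddr:0000:04:00.0'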

You can get help from the SPDK community at https://spdk.io/community - there will be lots of people willing to lend a hand.

Excellent post by the way. I really enjoyed it.


Thanks! Will add this to TODO list too.


Yeah, this has been going on for a while. Before SPDK it was done with custom kernel bypasses and fast InfiniBand/FC arrays. I was involved with a similar project in the early 2000s, where at the time the bottleneck was the shared Xeon bus; it then moved to the PCIe bus with Opterons/Nehalem+. In our case we ended up spending a lot of time tuning the application to avoid cross-socket communication as well, since that could become a big deal (of course after careful card placement).

But SPDK has a problem you don't have with bypasses and io_uring, in that it needs the IOMMU enabled, and that can itself become a bottleneck. There are also issues for some applications that want to use interrupts rather than poll everything.

What's really nice about io_uring is that it sort of standardizes a large part of what people were doing with bypasses.


FYI, SPDK doesn't strictly require the IOMMU to be enabled - see https://spdk.io/doc/system_configuration.html. There's also a new experimental interrupt mode (not for everything) that's finding some valuable use cases in SPDK - see https://github.com/spdk/spdk/blob/master/CHANGELOG.md. Feel free to jump on the SPDK Slack channel or email list for more info on either of these: https://spdk.io/community/
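
In particular, without an IOMMU SPDK can run on uio_pci_generic (or vfio-pci in no-IOMMU mode), at the cost of using physical addresses and needing a privileged user. If I recall correctly the setup script lets you force the driver, something like:

sudo DRIVER_OVERRIDE=uio_pci_generic scripts/setup.sh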



