
There was a talk about this at 34C3 earlier today.

Demystifying Network Cards

- https://events.ccc.de/congress/2017/Fahrplan/events/9159.htm...

- https://streaming.media.ccc.de/34c3/relive/9159




Thanks for posting these links. On page 20 of the slide deck the author enumerates problems with existing user-mode frameworks, and two of the bullet points state:

• Limited support for interrupts

• Interrupts not considered useful at >= 0.1 Mpps

Does anyone have any insight into why interrupts aren't considered "useful" at this rate? Is this a reference to NAPI and the threshold at which it's better to poll?


The core of the problem is a trade-off between saving power and optimizing for latency.

Let's say you are receiving around 100k packets per second. You can fire 100k interrupts per second. Sure, no problem. But you'll probably be running at 100% CPU load. NAPI doesn't really help here. You'll see a quite high CPU load with NAPI (if measuring it correctly; CONFIG_IRQ_TIME_ACCOUNTING is often disabled), just not the horrible pre-NAPI live locks.
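
To put rough numbers on that (a sketch; the per-interrupt cost here is an assumed ballpark figure, not a measured value):

    /* Back-of-envelope: cost of taking one interrupt per packet.
     * The ~2 us per interrupt (entry/exit, cache pollution) is an
     * assumed illustration value, not a measurement. */
    #include <stdio.h>

    int main(void) {
        double pps = 100000.0;          /* packets per second */
        double irq_cost_us = 2.0;       /* assumed us per interrupt */
        double load = pps * irq_cost_us / 1e6;  /* fraction of one core */
        printf("IRQ overhead alone: %.0f%% of one core at %.0f kpps\n",
               load * 100.0, pps / 1000.0);
        /* Actual packet processing comes on top of this. */
        return 0;
    }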

What will help is a driver that limits the interrupt rate (Intel drivers do this by default; the option is called ITR, the Interrupt Throttle Rate). Now you've got higher latency instead (for some definitions of high; you'll see 40-100 µs with ixgbe).

Note that the interrupts are now basically just a hardware timer. We don't need to use a NIC for hardware timers.
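
For reference, here's roughly what a pure poll-mode receive loop looks like in the DPDK style (a minimal sketch; EAL and port setup are omitted, and the burst size is arbitrary):

    /* Minimal sketch of a DPDK-style poll-mode RX loop: no interrupts,
     * the core just spins on the queue. Setup code omitted. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void rx_loop(uint16_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            /* Non-blocking: returns 0..BURST_SIZE packets immediately. */
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);

            for (uint16_t i = 0; i < nb_rx; i++) {
                /* ... process packet ... */
                rte_pktmbuf_free(bufs[i]);
            }
            /* nb_rx == 0 just means nothing arrived yet; the core keeps
             * spinning at 100% regardless of traffic. */
        }
    }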

This is of course only true at quite high packet rates (0.1 Mpps was maybe the wrong figure, let's say 1 Mpps). Interrupts are great at low packet rates.

I looked into this in the Linux kernel in some more detail a few years ago; see this paper for lots of details: https://www.net.in.tum.de/fileadmin/bibtex/publications/pape...

All the DPDK stuff about power saving is also quite interesting. Power saving is a relevant topic and DPDK is still quite bad at it. I think dynamically adding and removing worker threads, as well as controlling the CPU frequency from the application (which knows how loaded it actually is!), is currently the most promising approach. Unfortunately, most DPDK applications allocate threads and cores statically at the moment.
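
A sketch of what application-driven frequency scaling could look like with DPDK's librte_power, loosely in the spirit of the l3fwd-power example (the thresholds and hysteresis here are made-up illustration values):

    /* Sketch: the poll loop drives the core frequency itself, stepping
     * down after long idle streaks and back up when bursts are full.
     * Thresholds are arbitrary illustration values. */
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_power.h>

    #define BURST_SIZE 32

    static void adaptive_rx_loop(uint16_t port_id)
    {
        unsigned lcore = rte_lcore_id();
        struct rte_mbuf *bufs[BURST_SIZE];
        unsigned idle_polls = 0;

        rte_power_init(lcore);  /* bind to the cpufreq/ACPI backend */

        for (;;) {
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);

            if (nb_rx == 0) {
                /* Long idle streak: step the frequency down. */
                if (++idle_polls > 10000) {
                    rte_power_freq_down(lcore);
                    idle_polls = 0;
                }
                continue;
            }
            idle_polls = 0;

            if (nb_rx == BURST_SIZE)
                rte_power_freq_up(lcore);  /* full bursts: likely saturated */

            for (uint16_t i = 0; i < nb_rx; i++)
                rte_pktmbuf_free(bufs[i]);  /* ... real processing here ... */
        }
    }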


Thank you for the detailed explanation. Those bullet points make sense to me now. Cheers.



