IPv4 route lookup on Linux

erikb · on June 26, 2017

We seriously need more articles like this. As someone who would really like to understand more about the kernel source code, but without prior kernel knowledge (no classes in that direction) and barely any C knowledge the kernel is not the easiest part to read oneself into. Articles like this one help a lot, since they already provide an abstract model of what the code does and intends to do and why.

majke · on June 26, 2017

Technical articles get downvoted on HN a lot these days. It became super hard to promote technical stuff here.

Just look at this article - it's successful (on the front page) but it's hovering around 20th place, without any chances of raise: http://hnrankings.info/14631808/

dom0 · on June 26, 2017

> articles get downvoted

You can't downvote articles. (You can flag them, but flagging fine submissions is obviously abusive).

erikb · on June 26, 2017

flagging is basically downvoting

throwawaye · on June 26, 2017

Maybe it should renaming its title to "Rewriting IPv4 route lookup on Linux in VR using Rust and CSS"

betaby · on June 25, 2017

Unfortunately IPv6 stack doesn't match IPv4 one on features and optimizations. IPv6 is still using regular radix tree and a route cache.

vbernat · on June 26, 2017

Yes.

But IPv6 doesn't really have a route cache either. However, it handles exceptions differently than IPv4. With IPv6, exceptions are put in the tree along regular entries (there has been some bugs, like /128 routes disappearing due to a "cached" entry being expired) while IPv4 stores the exceptions with the next-hop information.

EDIT: I stand corrected: before 4.2, route entries were created for almost any destination, so it was essentially a route cache. After that, the entry is created only when it is really needed (MTU lower than the default MTU).

Thaxll · on June 25, 2017

Wasn't it possible to re-implement the same IPv4 model?

bogomipz · on June 26, 2017

Do you have any insight into why this is?

vbernat · on June 26, 2017

Author here. The weaknesses are known (see for example http://www.netdevconf.org/1.1/proceedings/slides/kubecek-ipv...). IPv6 doesn't have the same features as IPv4 (notably, IPv6 can do a route lookup using both source and destination addresses). Therefore, it's not just a matter of reusing what is done with IPv4.

bogomipz · on June 26, 2017

I guess you are referring to scoped addresses then in ipV6? That would makes sense. Thanks for the interesting link.

slackingoff2017 · on June 26, 2017

Not a Linux networking expert... But one thing could be that you can fit entire IPV4 address space in ram easily and this obviously can't be done for IPV6.

Might make some design compromises slower and more complex, such as storing routes in Bloom filters rather than simple cache.

bogomipz · on June 26, 2017

No? Doesn't compress nicely with the empty octets and colon substitution?

slackingoff2017 · on June 27, 2017

Routers don't store IP addresses as strings... They use 32 bit and 2X64 bit unsigned ints usually.

And you can't compress something when measuring cardinality. Each route takes up an entry and the minimum space you can use with a non probabilistic structure is 1 bit per entry in a bit map.

There's compressed bitmaps but the random nature of routes will not compress well.

comice · on June 26, 2017

The removal of the route cache caused a notable change in behaviour for equal cost routes.

When there was a cache, you got a kind of session stickiness for free. Which meant you could do some fairly good load balancing at the network layer by advertising multiple routes to the same IP locally. The router would choose one of the routes when it saw the first packet, cache the decision and send the rest of the packets to it (the cache key was configurable and could include src address, amongst other things).

Now you just get round robin between all the advertised paths - no good for tcp load balancing.

vbernat · on June 26, 2017

This behaviour has been fixed in 4.4. It has been mostly unnoticed until then! See https://www.reddit.com/r/networking/comments/4q3wmq/ipv4_flo...

comice · on June 26, 2017

oh super! thanks!

xiconfjs · on June 26, 2017

How does this 50ns compare to commercial hardware routers like Brocade, which have dedicated memory (TCAM) for their routing table, which should be extra fast memory - as e.x. Brocade claims?!

vbernat · on June 26, 2017

Hardware usually scales at line rate without any issue, something Linux cannot.

From a pure latency point of view, a medium-range router (using the latest Broadcom Tomahawk chipset) takes around 500ns to route a packet (note that such a chipset cannot handle a full view). It doesn't seem unlikely that Linux can do the same job in certain configuration in about 100ns (notably, no netfilter rules). However, this takes a whole core to do that. If you have 24 of them, you get 12Gbps of transfer rate. With an hardware platform, packets are pipelined, so with some relatively small buffers, you can still achieve several hundreds of Gbps of traffic (per chip), something Linux cannot do in software.

I can't say about TCAM performance alone. Are there super fast or are there able to perform parallel lookup?

yusyusyus · on June 26, 2017

> I can't say about TCAM performance alone. Are there super fast or are there able to perform parallel lookup?

I think this doc[0] helps to understand the parallelism/performance from a TCAM perspective.

[0] https://www.renesas.com/pt-br/doc/products/memory/r10an0013e...

ladzoppelin · on June 26, 2017

Why is this image https://d1g3mdmxf8zbo9.cloudfront.net/images/linux/lpc-trie-... jumbled in Firefox beta?

vbernat · on June 26, 2017

Author here. I am also using Firefox (on Linux) and the image is displayed fine. I don't know exactly what you mean by jumbled, but I don't hard-code fonts in the SVG: it uses the default sans-serif font of your system. May this explain what you get? I am usually careful with the anchor points and leave some additional space to ensure it is displayed correctly with whatever font is used.

A solution would be to convert the text to paths, but I didn't find any tool that would do that efficiently. With Inkscape, there is a serious size bump when doing that.

dom0 · on June 26, 2017

> With Inkscape, there is a serious size bump when doing that.

Yes, you can't really avoid that when converting text to path objects, there is no way to do something similar to PDF's subset embedding AFAIK.

I've tried to get SVG diagrams with text to work, but they just always fall apart somewhere. High-resolution PNG or inflated SVG seem the only viable methods. svgz (svg+gz) helps with the size inflation, but is not supported by Firefox when accessing local files.

It's kinda a shame that web browsers can't display single-page PDF documents in-line, as images.

http://i.imgur.com/x0TuCMC.png

eriknstr · on June 26, 2017

> Yes, you can't really avoid that when converting text to path objects, there is no way to do something similar to PDF's subset embedding AFAIK.

If it's not done already one could probably save a little bit of space by extending Inkscape to define an SVG symbol for each letter used.

https://developer.mozilla.org/en-US/docs/Web/SVG/Element/sym...

vbernat · on June 26, 2017

SVG can be served gzipped transparently. Ideally, when converting text to path, each letter (or group of letters) should be put in a group which can be referred with <use>.

Thanks for the example. Is the text in other figures OK?

dom0 · on June 26, 2017

No, they're all affected like that which is also my general observation. I'm not sure what causes it (possibly differing default font sizes, zooms or font selection). Chrome seems to generally use less defaults from the platform than Firefox, as is the case here as well — Chrome renders the figures fine. It usually does, though it can't help with missing fonts.

> Ideally, when converting text to path, each letter (or group of letters) should be put in a group which can be referred with <use>.

Oho, that's a very interesting idea! Might be possible as a post-processing step, depending on how exactly Inkscape converts runs of glyphs (I never looked inside one of those 1.5 MB SVGs...).

> SVG can be served gzipped transparently.

Yes; in my case the figures are mostly in docs, so checked-in size matters; colleagues have to pull changes and pushing xx MB for figures (and frequently updating them as well — something that git does not handle well) is not nice. For open source, it ends up in the source distribution. I think in one of my projects it was inflated by xxx % due to that :D

Some hard numbers: A not-that-complex figure showing a workflow. The .vsd is 145 kB, the PNG is—after optipng—about 1 MB and the SVG was iirc around 2.5 MB (Visio->SVG->Inkscape on the same machine otherwise results are bad->Convert To Paths->SVG), not gzipped since that kinda breaks local editing (with Firefox). While the high-resolution PNG looks good in print, scaling it to displays ain't that great and text doesn't look sharp. SVG needs a separate conversion for print (SVG[->PS]->PDF).

vbernat · on June 26, 2017

> Oho, that's a very interesting idea! Might be possible as a post-processing step, depending on how exactly Inkscape converts runs of glyphs (I never looked inside one of those 1.5 MB SVGs...).

I would hope that something like svgo could do that. BTW, due to an error on my side, the SVG images didn't get processed by SVGO. I just fixed that, maybe this would help.

dom0 · on June 26, 2017

> I just fixed that, maybe this would help.

No luck

> SVGO

Interesting tool, didn't know about it. It certainly works and reduces the size of a SVG with path-based text by ~50 %. Further, gzip'ability is increased as well. I tested this with a 1.4 MB SVG, SVGO shrunk it to 765 kB, and gzip compresses it further to 100 kB. The original SVG is only compressed to about 260 kB, thus gzip compression is enhanced by ~20 % by SVGO.

On the other hand, neither Gwenview nor Karbon display the SVGO'd figure as it was. Firefox and Chrome seem to render it fine. rsvg-convert results in a large and slow PDF.

At this point the "technically best" option to me seems to be

(1) create an SVG, convert text to paths on the same machine, use SVGO and gzip (and just use python -m http.server or sphinx-livehtml or similar for local editing) for HTML and

(2) create a PDF completely separate from the SVG processing chain for print, since no SVG to PDF conversion is satisfactory (file size, preservation of contents, rendering / printing time—stuff that is slowly rendered is slow to print as well!).

... and then tell the rest of the content pipeline that figures are now in two formats. Ugh.

Anyway, my illustration troubles are a bit off-topic to your great article :)

vbernat · on June 26, 2017

This problem was bothering me. After digging more, I have found this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=935056.

eriknstr · on June 26, 2017

Submit a bug report to Mozilla.

https://bugzilla.mozilla.org/enter_bug.cgi