What a great post, about something I've been working on almost as long as Avery. I'm sad I missed it the first time it came around.
I was going to write a long response here, but I think I'll save that for a blog post (short summary: I disagree vehemently with what I believe the premise here to be, think that people shouldn't be waiting for the IETF to give them permission to build new network layers, am fairly certain there's no such thing as a "layering violation", and think overlay networks are making, or will ultimately make, IPv6 irrelevant). So on this thread I'll just pick some nits.
Ethernet networking is not as gross as it's made out here. ARP isn't entirely pointless! For instance, at the ISP I ran tech for in the 1990s, I was able to pretty seamlessly move our "data center" and corporate offices across Chicago without renumbering just by exploiting ARP (I wrote a dumb proxy ARP policy router). We did similar things to route traffic to the particular terminal servers customers were dialing into, or to the ISDN router whose PRI happened to service a particular customer. An IP purist would object that we weren't using OSPF the way God intended us to, but it worked and was probably more reliable than the bona fide routing protocols we eventually replaced it with.
This narrative also, I think, gives short shrift to DHCP, which does a lot more than pick out IP addresses for new endpoints: it pretty much fully configures their IP connection. If you've had to do tech support for 10,000 random customers in the era before DNS servers were transparently assigned at connection time, you're not pining for the simple elegance of RARP.
Also: nobody should care about "IGMP-snooping bridges", since IP multicast is and always was hopeless.
I was going to say the same thing, so I won't :-). That said, I was in the Sun Systems group when Bob Hinden ("Boss Bob"; there were three Bobs in the group) of the network group was proposing SIPP as the "next generation IP." It has been illustrative (but, alas, I don't think educational) to see how much more easily that protocol could have been implemented and deployed.
That said, as Thomas points out (indirectly) in the parent to this comment, the Internet was deployed across a pre-existing network (the telephone switching network) without any co-operation from the people who defined or wrote or deployed the protocols that implement telephone switching. As long as the connection from point A to point B worked, the packets could figure out how to get from A to B. There is absolutely nothing preventing a suitably motivated group from creating their own elegant "network" that they layer on top of the existing broadband networks of today, without having to either consult, or get permission from, any standards organization.
> There is absolutely nothing preventing a suitably motivated group from creating their own elegant "network" that they layer on top of the existing broadband networks of today, without having to either consult, or get permission from, any standards organization.
That is essentially what most SD-WAN devices do: treat the Internet as an 'underlay' network. Most of them use proprietary code to create their own network infrastructure that isn't standards-based.
It generally is standards-based. Their customers demand it to be so. IPSec tunnel overlays, usually if not always full mesh. The non-standard part is tiny, insignificant tweaks to IPSec that render it unacceptable to standards-speaking endpoints, so you can't coordinate with your open source IPSec device. Stupid myopia, because these systems depend on proprietary orchestration anyway.
+1 for VeloCloud. SD-WAN mesh between all your devices, and they provide a cloud gateway that lets you connect to any compatible IPsec device without having to backhaul all the data to one specific endpoint.
ARP is also nice and abstract and well-defined; it can bridge from any multi-endpoint subnet's layer-2 address to an IP address. Not sure if anyone actually uses it for non-802, but the generality has forced a clean design.
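To see that generality in the packet format itself, here's a rough sketch in Python (the addresses are placeholders) of building an ARP request per RFC 826. The htype/ptype fields plus the explicit hardware/protocol address lengths are what let the same layout map any link-layer address family to any network-layer one:

    import struct

    def build_arp_request(sender_mac, sender_ip, target_ip):
        # RFC 826 layout; only the field values below are Ethernet/IPv4 specific.
        htype, ptype = 1, 0x0800        # hardware type (Ethernet), protocol type (IPv4)
        hlen, plen = 6, 4               # hardware / protocol address lengths
        oper = 1                        # 1 = request, 2 = reply
        return (struct.pack("!HHBBH", htype, ptype, hlen, plen, oper)
                + sender_mac + sender_ip      # sender hardware + protocol address
                + b"\x00" * hlen              # target hardware address: unknown
                + target_ip)                  # target protocol address

    pkt = build_arp_request(bytes.fromhex("001122334455"),
                            bytes([192, 168, 1, 10]),
                            bytes([192, 168, 1, 1]))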
To add to your praises of DHCP - it can also configure routers, and is in fact the standard solution for that in IPv6. Instead of giving you one or several addresses for NAT through DHCP, it gives the router an address for itself, and also a prefix to assign to clients on its internal network. Super neat stuff, and a boon to administrators.
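A tiny sketch of what that buys the router, in Python (the delegated /56 and interface names here are made-up examples, not any particular implementation): one prefix comes down via DHCPv6 prefix delegation, and the router just carves per-LAN /64s out of it, no NAT involved.

    import ipaddress

    # Hypothetical prefix handed to the router by the ISP via DHCPv6-PD.
    delegated = ipaddress.ip_network("2001:db8:abcd:ee00::/56")

    # Carve one /64 out of the delegation for each internal network.
    lans = ["lan0", "guest0", "iot0"]
    subnets = delegated.subnets(new_prefix=64)
    assignments = {lan: next(subnets) for lan in lans}

    for lan, prefix in assignments.items():
        print(f"{lan}: advertise {prefix} to clients")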
Also to nitpick your summary, because nitpicking is what I do - layering violations are a thing, but only in the same way that violating software abstraction barriers are a thing. Not a hard-and-fast rule, and sometimes if you're doing weird enough stuff you just have to do it.
Got to disagree. The level of abstraction is very useful as a means of swapping out one layer without changing the other technologies - e.g. running IP over point-to-point fiber, or AlohaNet, or 802, or carrier pigeon. Or running Ethernet over a phy with whatever ridiculous number of Mbps is the latest thing. (802.11, of course, has effectively zero phy/link distinction, but anything that has to deal with such high packet drop rates and negotiation of physical layer between endpoints is going to be a mess.)
There's an issue with the specific OSI layering, but that's higher in the stack: it has waaay too many layers at the top. Everything up to maybe the transport layer (TCP/UDP/SCTP) is very well delinked in most implementations, but the session/presentation/application layer distinctions are total BS.
They're a useful tool for understanding the mindset of the original developers, but as you go "up" in the layers, the division of responsibilities becomes more and more arbitrary, with a very sharp uptick after "layer 3".
But more importantly, the notion that routing and forwarding "belongs" in IP, because that's the layer 3 protocol --- that's just false. There's no validity to it, and lots of systems have built overlays with layer 3 function on top of UDP (which in the "layering" model is a "layer 4" protocol, but is really best thought of as an escape hatch with which to build any new system you want on top of IP).
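As a toy illustration of that escape hatch (the 4-byte overlay header, the port number, and the addresses below are all made up, not any real system's wire format): define whatever new "layer 3" addressing you like, prepend it to the payload, and let plain UDP/IP carry it between overlay nodes.

    import socket
    import struct

    OVERLAY_PORT = 4242                 # arbitrary port for this toy overlay

    def encapsulate(overlay_dst_id, inner_packet):
        # Made-up 4-byte overlay header: a 32-bit destination id in the
        # overlay's own address space, followed by whatever we're carrying.
        return struct.pack("!I", overlay_dst_id) + inner_packet

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    inner = b"anything: an IP packet, a frame, a brand-new protocol"
    sock.sendto(encapsulate(7, inner), ("198.51.100.5", OVERLAY_PORT))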
1. layers are a thing (and while any given piece of hardware or software can be serving as an amalgam of any contiguous sequence of layers, you can still analyze the behavior of such a component as if it were N separate abstract components, one for each layer it embodies);
2. layering and layering violations are a thing, in the particular sense of code that intermingles and entangles the concerns of different network layers being automatically a design smell (e.g. OpenVPN smells because, rather than building a clean layer-1 circuit abstraction on top of a layer-4/5/7 stream, and then running a regular substrate-oblivious layer-2 on top, OpenVPN runs a "dirty" layer-2 implementation directly on top of a layer-7 protocol (HTTP), where the layer-2 implementation knows things about HTTP and uses HTTP features to signal layer-2 data, such that it can no longer freely interoperate with other layer-2 implementations);
3. but just going down the layer stack, repeating layers, is not a layering violation. You can build all the way up to a circuit-switching abstraction like TCP, and then put PPP on that to go down to layer 2, and come back up again, and that's not even bad engineering.
"1. layers are a thing (and while any given piece of hardware or software can be serving as an amalgam of any contiguous sequence of layers, you can still analyze the behavior of such a component as if it were N separate abstract components, one for each layer it embodies);"
* Path MTU discovery: For proper operation, TCP needs to know a link-layer property for each of the links between a source and destination.
This bypasses the IP layer, because IP fragmentation does not play well with TCP. On the other hand, TCP does not even see the concept of a "path" between the source and destination; IP may route each segment uniquely.
* TCP over wireless links: TCP makes the assumption that segment loss implies congestion; wireless links have the propensity to drop packets for a plethora of reasons that have nothing to do with congestion. Hey, it's a bad assumption, and there's work on congestion controls that don't make that assumption, but maybe we ought to ask Van Jacobson if life mightn't be easier if the link could tell the transport protocol, "My bad! That was me, I did that?"
* Path MTU discovery: that's part of the IP contract. IP provides an unreliable datagram service with an MTU that varies based on destination endpoint but will never be below a guaranteed floor (1280 bytes in IPv6; IPv4 only guaranteed 576). IPv6 also wisely doesn't let routers fragment packets in flight; sizing your packets correctly is the job of layer 4 (a rough sketch of how layer 4 learns the path MTU follows after these bullets).
* TCP over wireless links: TCP's congestion control mechanism is a heuristic based on ever-evolving understanding of the characteristics of links in the wild. There are things that layer 3 can do that unambiguously get in layer 4's way (bufferbloat makes low-latency response infeasible), but it's layer 4's job to deal with reliability and congestion control. (By the way, unlike LFNs, WiFi is actually not a pathological case for TCP congestion control and buffering. A good mental model for those periodic WiFi drops is of an Ethernet cable being disconnected and reconnected with a different one picked at random from a supply closet. In a lot of very common cases, when traffic gets passed again it will not be at the same throughput as before, so the endpoints need to rediscover the available throughput.)
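On the PMTU point above, here's a rough Linux-only sketch in Python (the destination is a placeholder, and the numeric option values are the usual Linux ones in case this build of the socket module doesn't export them) of how a layer-4 endpoint learns the path MTU: set don't-fragment, try to send something big, and read back what the kernel has cached from any ICMP too-big feedback.

    import socket

    IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
    IP_PMTUDISC_DO  = getattr(socket, "IP_PMTUDISC_DO", 2)
    IP_MTU          = getattr(socket, "IP_MTU", 14)

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("192.0.2.1", 9))        # placeholder destination
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # DF on, no local fragmentation

    try:
        s.send(b"\x00" * 4000)         # bigger than most link MTUs
    except OSError:
        pass                           # EMSGSIZE once a smaller MTU is known

    print("path MTU estimate:", s.getsockopt(socket.IPPROTO_IP, IP_MTU))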
To your more general suggestions about alternative designs: generally, schemes that have the link layer communicate with the endpoints using them scale BADLY to large internetworks, and the global internet is the largest.
What does "on UDP" mean? UDP is just a means of running an arbitrary datagram protocol that rides on top of IP; it's how you'd build a system that treats IP the way IP treats Ethernet.
Sure, but you mentioned protocols that have "built overlays with layer 3 function on top of UDP". What are the examples you're referring to?
EDIT: My comment in reply to the sibling comment, which mentioned vxlan:
That's more of a recursive version of the lower layers; using layers 1-4 of one instance of the OSI model as layer 2 of another instance. If anything, this demonstrates just how useful the clear abstraction barrier between layer 2 and layer 3 is; you can have a very complicated software package (like a VPN) as a layer 2 instead of a physical network and all the code from layer 3 up doesn't even need to know.
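For the vxlan case specifically, the recursion is visible right in the wire format. A minimal sketch (per RFC 7348; the VNI, frame, and peer address are placeholders): take an entire Ethernet frame, prepend an 8-byte VXLAN header, and hand it to an ordinary UDP socket on port 4789.

    import socket
    import struct

    VXLAN_PORT = 4789

    def vxlan_encap(vni, ethernet_frame):
        # 8-byte VXLAN header: flags (0x08 = VNI present) + 24 reserved bits,
        # then the 24-bit VNI + 8 reserved bits; the whole inner L2 frame follows.
        header = struct.pack("!II", 0x08 << 24, vni << 8)
        return header + ethernet_frame

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    frame = b"..."                     # a real Ethernet frame would go here
    sock.sendto(vxlan_encap(42, frame), ("203.0.113.7", VXLAN_PORT))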
There are other models of modularity that make it easy to separate transport, routing, link, and physical protocols without starting from the assumption that "layer X can only interact with the minimum common denominator interface for layers X-1 and X+1". That assumption leads to everything from the PMTU discovery silliness to the pain of getting TCP to work correctly over links like wireless where packet loss does not imply congestion.
I've heard some folks talk about TLS as a "session" layer, and it is fortunate that we no longer have to translate between ASCII and EBCDIC underneath the application, so the "presentation" layer now seems like it is mis-named. Ah how times change.
In the early to mid 90s, "layer 3 switching" was becoming a thing, and each switch vendor had their own method of implementation. Cabletron was a large switch vendor then, and their method of layer 3 switching depended upon ARP. Each host would be assigned a /32 IP address, and their default gateway would be their own IP address. There was a registry setting available on Windows NT Server that would cause the DHCP server to provide hosts with DHCP address and router assignments that met these requirements.
Ports that had routers connected to them were designated as router ports and needed to have proxy arp enabled.
Whenever a host wanted to talk to any IP address which was not already in its ARP cache, it would send an ARP request. The management system of the switch, which in this case was software running on a server outside the switch, would look up in its tables whether it knew the IP address from another switch port. If so, and if all policies allowed the host sending the request to speak to the port the destination was associated with, the manager would respond to the ARP request with the MAC of the destination. If the requested IP address didn't exist in its tables, the request would be flooded out all router ports.
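Roughly the decision that out-of-band manager was making per ARP request, as a toy sketch (the table contents, port names, and policy hook are purely illustrative, not Cabletron's actual logic):

    # Toy model of the external switch manager answering ARP requests.
    known_hosts = {                       # IP -> (switch port, MAC) learned elsewhere
        "10.0.0.5": ("port7", "00:11:22:33:44:55"),
    }
    router_ports = ["port1", "port2"]     # ports with real routers behind them

    def handle_arp_request(src_port, requested_ip, policy_allows):
        entry = known_hosts.get(requested_ip)
        if entry and policy_allows(src_port, entry[0]):
            return f"proxy-reply on {src_port} with {entry[1]}"
        return f"flood request out {router_ports}"   # unknown: let the routers sort it out

    print(handle_arp_request("port3", "10.0.0.5", lambda src, dst: True))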
One issue, though, with "nobody should care about IGMP-snooping bridges": I so wish this were true, but (first-hand knowledge) tons of infrastructure these days utilizes IP multicast, including building lighting, HVAC, intercom, VoIP, etc.
Out of curiosity, what do you mean by this? Are you referring to all multicast solutions? Can I just be specific -- what do you think of Dante, Audio/Video-over-IP, or other time-sensitive and synced services that use multicast?
Isn't plain old UDP already an unstoppable DDoS tool? Multicast doesn't make it that much harder to stop. In fact, using it as a DDoS tool seems a bit problematic, since the victim would need to join the groups to receive the traffic. Yes, a piece of malware on the victim's computer could go and attempt to join every single multicast source on the internet, but it's a self-correcting problem, since they wouldn't be able to maintain their subscriptions with their link totally saturated. Much easier to stop than normal DDoS attacks.
The problem is that we have never figured out a multicast routing solution that would work at Internet scale. Especially one that can be implemented in hardware on routers.
> we have never figured out a multicast routing solution that would work at Internet scale
Sure we did, it's called bittorrent. Ok, it isn't really multicast and you probably have to sacrifice ordered delivery, but for many of the use-cases where multiple-delivery would have been a good idea, bittorrent has proven to be a very successful "minimum viable multicast".
Bittorrent succeeded while decades of "multicast" research/experiments failed because bittorrent realized the multi-delivery problem was really about managing peers, which isn't solvable at layer-3.
edit: by which I mean: previous attempts at multicasting assumed it was a packet routing problem, when peer management is actually a question for the application layer.
Bittorrent is the opposite of multicast. Instead of aggregating the data into a single channel to save bandwidth, it splits the data up across every single recipient in a huge NxN graph.
This also illustrates the other problem with multicast on the Internet: It's mostly saving bandwidth on the backbone and at the server. The backbone has plenty of bandwidth to spare, and servers are often in data centers these days where bandwidth is not a huge concern.
The use case where someone does video production in their basement and broadcasts it out to millions of people across the internet over their home cable modem connection is just not compelling enough for ISPs and the backbone providers to make Multicast happen. Just put it on Youtube and let Google sort it out.
hmm. Multicast is often used for, like, IPTV. That's a very different task from BitTorrent. Torrents are indeed about managing peers. IPTV is centralized, not p2p, the benefit of multicast for IPTV is that the routers in between the source (ISP) and your client only carry one copy of the stream instead of one stream per client.
At internet scale.. well, it would be nice to have this efficiency for Twitch and YouTube Live. Which are also pretty centralized (CDN) so I don't see how this is about managing peers.
Bittorrent has a P2P streaming protocol called Bittorrent Live which was used to operate a TV service for several years but I have no idea how efficient it is compared to IPTV multicasting or central servers+CDN.
How exactly? Sources have to pass an RPF check following the unicast path, and receivers have to follow the path either to the RP or to the source, or the packets don't get there.
It's also, effectively, a promise to maintain Internet-wide routing table entries for every page on the web rather than every host (which is something we also can't really do today).
Multicast for everything is difficult. But would it be all that difficult to have 100k or 1M entries?
Something that would definitely be doable today is an IP header that stores 25 or 50 extra destination addresses. But it seems like nobody really cares. Just make streaming services send out a thousand packets with identical data.
Well, it could be done based on microtransactions. To set up your mcast tree you need to pay. The slots are auctioned off every X minutes on a DAG-chain-block-thing.