1. layers are a thing (and while any given piece of hardware or software can be serving as an amalgam of any contiguous sequence of layers, you can still analyze the behavior of such a component as if it were N separate abstract components, one for each layer it embodies);
2. layering and layering violations are a thing, in the particular sense of code that intermingles and entangles the concerns of different network layers being automatically a design smell (e.g. OpenVPN smells because, rather than building a clean layer-1 circuit abstraction on top of a layer-4/5/7 stream, and then running a regular substrate-oblivious layer-2 on top, OpenVPN runs a "dirty" layer-2 implementation directly on top of a layer-7 protocol (HTTP), where the layer-2 implementation knows things about HTTP and uses HTTP features to signal layer-2 data, such that it can no longer freely interoperate with other layer-2 implementations);
3. but just going down the layer stack, repeating layers, is not a layering violation. You can build all the way up to a circuit-switching abstraction like TCP, and then put PPP on that to go down to layer 2, and come back up again, and that's not even bad engineering.
"1. layers are a thing (and while any given piece of hardware or software can be serving as an amalgam of any contiguous sequence of layers, you can still analyze the behavior of such a component as if it were N separate abstract components, one for each layer it embodies);"
* Path MTU discovery: for proper operation, TCP needs to know a link-layer property (the MTU) of every link between a source and destination. This bypasses the IP layer, because IP fragmentation does not play well with TCP. On the other hand, TCP does not even have the concept of a "path" between the source and destination; IP is free to route each segment differently.
* TCP over wireless links: TCP assumes that segment loss implies congestion; wireless links are prone to dropping packets for plenty of reasons that have nothing to do with congestion. Sure, it's a known-bad assumption, and there's work on congestion controls that don't make it, but maybe we ought to ask Van Jacobson whether life mightn't be easier if the link could just tell the transport protocol, "My bad! That was me"?
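The path-MTU point above can be sketched as a toy simulation (no real ICMP machinery here, and the per-link MTU values are invented): the sender probes with don't-fragment packets sized to its current estimate, and any link too small to carry the probe reports its own MTU, the way ICMP Fragmentation Needed / Packet Too Big messages do.

```python
# Toy model of Path MTU discovery (not the real ICMP machinery).
# A sender probes with don't-fragment packets sized to its current
# estimate; any link with a smaller MTU "rejects" the probe and
# reports its own MTU, as ICMP Packet Too Big messages do.

def discover_path_mtu(link_mtus, initial_estimate=9000):
    """Return the path MTU: the smallest link MTU along the path."""
    estimate = initial_estimate
    while True:
        # Find the first link that cannot carry a packet of this size.
        bottleneck = next((mtu for mtu in link_mtus if mtu < estimate), None)
        if bottleneck is None:
            return estimate   # the probe traversed every link: done
        estimate = bottleneck  # lower the estimate and re-probe

path = [1500, 1492, 1500, 1280, 1500]  # made-up per-link MTUs
print(discover_path_mtu(path))  # -> 1280
```

Note that the sender converges without ever seeing the path as a whole; it only ever learns "the packet you just sent was too big for some link," which is exactly the link-layer information leaking up to layer 4.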
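The cost of the loss-implies-congestion assumption shows up in a toy AIMD model (all parameters invented for illustration): losses that have nothing to do with queue overflow still halve the window, so a lossy radio link drags average throughput down even when there is capacity to spare.

```python
# Toy AIMD sender: cwnd grows by 1 per round, halves on any loss.
# Treating every loss as congestion means a "random" radio loss
# (nothing to do with queue overflow) still halves the window.

def average_cwnd(rounds, loss_every):
    """Mean congestion window when every `loss_every`-th round sees a loss."""
    cwnd, total = 1.0, 0.0
    for r in range(1, rounds + 1):
        if r % loss_every == 0:   # a non-congestion wireless loss
            cwnd = max(1.0, cwnd / 2)
        else:
            cwnd += 1.0           # additive increase
        total += cwnd
    return total / rounds

print(average_cwnd(1000, loss_every=1000))  # nearly loss-free: large window
print(average_cwnd(1000, loss_every=4))     # frequent radio loss: tiny window
```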
* Path MTU discovery: that's part of the IP contract. IP provides an unreliable datagram service with an MTU that varies based on destination endpoint but will never fall below 1280 bytes (in IPv6; IPv4's guaranteed minimum was 576 bytes). IPv6 also wisely drops in-network fragmentation (routers never fragment); sizing your packets correctly is the job of layer 4.
* TCP over wireless links: TCP's congestion control mechanism is a heuristic based on an ever-evolving understanding of the characteristics of links in the wild. There are things that layer 3 can do that unambiguously get in layer 4's way (bufferbloat makes low-latency response infeasible), but it's layer 4's job to deal with reliability and congestion control. (By the way: unlike LFNs, WiFi is actually not a pathological case for TCP congestion control and buffering. A good mental model for those periodic WiFi drops is an Ethernet cable being disconnected and reconnected with a different one picked at random from a supply closet. In a lot of very common cases, when traffic passes again it will not be at the same throughput as before, so the endpoints need to rediscover the available throughput.)
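The "sizing your packets is layer 4's job" point is just arithmetic on the guaranteed minimum: with the fixed 40-byte IPv6 header and a 20-byte option-less TCP header (standard header sizes, not taken from the comment above), a sender can derive a segment size that fits any conforming IPv6 path without ever asking the links.

```python
IPV6_MIN_MTU = 1280   # every IPv6 link must carry this (RFC 8200)
IPV6_HEADER = 40      # fixed IPv6 header size
TCP_HEADER = 20       # TCP header without options

# Layer 4 sizes its own segments: the largest TCP payload that is
# guaranteed to fit in a single packet on any conforming IPv6 path.
safe_mss = IPV6_MIN_MTU - IPV6_HEADER - TCP_HEADER
print(safe_mss)  # -> 1220
```

A stack that wants bigger segments than this conservative floor can then use Path MTU discovery to probe upward, but it never has to send anything smaller.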
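The cable-swap mental model can be sketched with another toy AIMD loop (all numbers invented): when the link's capacity changes mid-stream, the sender's window has to re-converge on the new value, which is exactly the "rediscover the available throughput" behavior described above.

```python
# Toy model of the "cable swap": a link's capacity changes mid-stream
# and an AIMD sender has to rediscover the available throughput.

def run(capacities, rounds_each):
    """AIMD: +1 per round, halve whenever cwnd exceeds current capacity."""
    cwnd, trace = 1.0, []
    for cap in capacities:          # each entry is one "cable"
        for _ in range(rounds_each):
            if cwnd > cap:          # "loss": we overran the link
                cwnd = max(1.0, cwnd / 2)
            else:
                cwnd += 1.0         # additive increase
            trace.append(cwnd)
    return trace

# Fast cable for 200 rounds, then a randomly-picked slower one.
trace = run(capacities=[100, 30], rounds_each=200)
# The window settles near whichever capacity is current, so after the
# swap it oscillates just under the new, lower limit.
print(round(max(trace[:200]), 1), round(max(trace[210:]), 1))
```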
To your more general suggestions about alternative designs: generally, schemes that have the link layer communicate with the endpoints using them scale BADLY to large internetworks, and the global internet is the largest.