Really? Seems like scaling is pretty tolerant of latency, but very bandwidth intensive. Thus the move from IB to various flavors of ethernet (for AMD's GPUs, Tenstorrent, various others). Not to mention Broadcom pushing 50 and 100 Tbit ethernet switching chips for AI.
Even 25 Gbit is pretty affordable for home these days; if it scaled 5x better than 5 Gbit, that might be enough to make larger models MUCH more practical.
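Rough napkin math (all numbers assumed for illustration, not measured): if every layer has to exchange activations between two nodes, both link speed and per-message latency feed into the per-token cost, something like:

```python
# Per-token network cost if every layer exchanges activations between two
# nodes (tensor-parallel-style split). All figures are assumptions.

hidden_size   = 8192      # assumed model width
num_layers    = 80        # assumed layer count
bytes_per_val = 2         # fp16 activations
latency_s     = 20e-6     # assumed ~20 us per message over ethernet

activation_bytes = hidden_size * bytes_per_val   # ~16 KB per layer
total_bytes = activation_bytes * num_layers      # ~1.3 MB per token

for gbit in (5, 25):
    bandwidth_Bps = gbit * 1e9 / 8               # bytes per second
    transfer_s = total_bytes / bandwidth_Bps
    latency_total_s = latency_s * num_layers     # one hop per layer
    per_token_ms = (transfer_s + latency_total_s) * 1e3
    print(f"{gbit:>2} Gbit/s: ~{per_token_ms:.1f} ms of network time per token")
```

With these made-up numbers the 5 Gbit link spends a few ms per token just moving activations, and the faster link cuts the transfer part roughly 5x while the fixed per-hop latency stays put.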
It heavily depends on the workload. If one node frequently has to touch memory on another node, e.g. computing the output of weights that live on the other node for the LLM, it's going to be dog slow because a remote access can take 100x as long as a local one. If you can batch the work into chunks that are mostly processed on one node and then handed off to the next, it parallelizes easily.
E.g. if the individual layers of your model fit on one node and the outputs can be pipelined, work keeps cascading through the nodes and it does well. But because each token an LLM generates depends on the one before it, a single sequence can't keep that pipeline full. You can see it in this [0] image from the attached blog post where he was testing llama.cpp: each node processes a batch of work, passes it off to the next node, then sits idle.
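To make the idle pattern concrete, here's a toy timeline simulation (not the author's llama.cpp setup, all timings invented) of single-sequence decoding with layers pipelined across four nodes:

```python
# Toy timeline: layers split across NUM_NODES nodes, decoding one sequence.
# Token t+1 can't start until token t has left the last stage, so only one
# node is busy at a time, mirroring the idle gaps in the blog post's trace.

NUM_NODES = 4
TOKENS = 3
STAGE_MS = 10      # assumed compute time per node per token
HOP_MS = 1         # assumed hand-off time between nodes

busy = [0.0] * NUM_NODES
clock = 0.0
for tok in range(TOKENS):
    # next token depends on the previous one, so stages run strictly in order
    for node in range(NUM_NODES):
        clock += STAGE_MS
        busy[node] += STAGE_MS
        if node < NUM_NODES - 1:
            clock += HOP_MS   # pass activations to the next node

for node, b in enumerate(busy):
    print(f"node {node}: busy {b / clock:.0%} of the time")
```

With four nodes each one ends up busy roughly a quarter of the time; batching several independent sequences is what fills those gaps back in.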