
It's easy to distribute across many computers which communicate with high latency


LLMs already run distributed across swarms of computers. A swarm of swarms is just a bigger swarm.

So again, what is the actual difference you are imagining?

Or is it just that distributed X is fashionable?


Rather large parts of your brain are generalized, but particular regions are more specialized. Looking at it, you would most likely consider it all one brain, but from a systems-thinking view a specialized region is effectively a small separate brain with a slightly different task than the rest.

If 80% of the processors in a cluster are running a 'general LLM' and 20% are running a 'math LLM', are they the same cluster? Could you host that 20% in a different data center? What if you want to test out different math LLM modules with the general intelligence?


I think I would consider them split once the different modules are interchangeable, so that there is de facto an interface between them (a rough sketch follows below).

In the case of the brain, while certain functional regions are highly specialized I would not consider them "a small separate brain". Functional regions are not sub-organs.
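
To make the "interchangeable modules imply an interface" point concrete, here is a minimal sketch in Python. All of the names (MathModule, LocalHeuristicMath, RemoteMathLLM, answer) are hypothetical, not from any real system; the point is only that once you can swap the math part out, an interface exists whether you wrote it down or not.

    # Hypothetical sketch: a 'general' component delegating to a swappable 'math' module.
    from typing import Protocol


    class MathModule(Protocol):
        """Anything swapped in for the 'math' role must satisfy this interface."""
        def solve(self, problem: str) -> str: ...


    class LocalHeuristicMath:
        def solve(self, problem: str) -> str:
            return f"[local heuristic answer to: {problem}]"


    class RemoteMathLLM:
        """Could live in another data center; the caller doesn't care."""
        def __init__(self, endpoint: str) -> None:
            self.endpoint = endpoint

        def solve(self, problem: str) -> str:
            return f"[answer fetched from {self.endpoint} for: {problem}]"


    def answer(query: str, math: MathModule) -> str:
        # The 'general' part decides whether to delegate to the math module.
        if any(ch.isdigit() for ch in query):
            return math.solve(query)
        return f"[general answer to: {query}]"


    if __name__ == "__main__":
        # Swapping modules is a one-line change -- that is the de facto interface.
        print(answer("What is 17 * 23?", LocalHeuristicMath()))
        print(answer("What is 17 * 23?", RemoteMathLLM("https://math.example.net")))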


Significantly higher latency than you have within a single datacenter. Think "my GPU working with your GPU".


There are already LLMs hosted across the internet (Folding@Home style) instead of in a single data center.

Just because the swarm infrastructure hosting an LLM has higher latency across certain paths does not make it a swarm of LLMs.


> There are already LLMs hosted across the internet (Folding@Home style)

Interesting, I haven't heard of that. Can you name examples?


I read about Petals (1) some time ago here on HN. There are surely others too, but I don't remember the names.

1. https://github.com/bigscience-workshop/petals
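
For what it's worth, Petals exposes a transformers-style API while the model's transformer blocks are executed by volunteer machines across the internet. A rough sketch of inference, loosely based on the project's README; the class name and model ID are from memory and may have drifted between releases:

    # Sketch of Petals inference; names assumed from the README, not verified here.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "petals-team/StableBeluga2"  # assumed model ID; check the repo
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # The heavy transformer layers are served remotely by the swarm;
    # only the embeddings and LM head run on the local machine.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))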



