Context window is a limitation, but have we actually hit the ceiling on scaling it? For GPT-style transformers, naive attention needs O(N^2) compute and VRAM as the context length grows, but that is ultimately an "I need more hardware" problem; as I understand it, the reason they don't go higher is economic viability, not that it couldn't be done in principle. And there are plenty of interesting hardware developments in the pipeline now that engineers know exactly what kind of compute to narrowly optimize for.
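To put rough numbers on that O(N^2) point, here is a minimal back-of-envelope sketch; the head count and fp16 precision are arbitrary assumptions of mine, not any particular model's figures:

```python
# Back-of-envelope sketch (illustrative assumptions, not a specific model):
# naive attention materializes an N x N score matrix per head, so VRAM for
# those scores alone grows quadratically with context length N.

def attention_scores_gib(n_tokens: int, n_heads: int = 32, bytes_per_elem: int = 2) -> float:
    """GiB needed just for the N x N attention scores across all heads (fp16)."""
    return n_tokens * n_tokens * n_heads * bytes_per_elem / 2**30

for n in (4_096, 32_768, 131_072):
    print(f"{n:>7} tokens -> {attention_scores_gib(n):8.1f} GiB of scores")

# Roughly: 4k ~ 1 GiB, 32k ~ 64 GiB, 128k ~ 1 TiB under these assumptions,
# which is why it reads as a "more hardware / more money" problem (and why
# tricks like FlashAttention avoid materializing the full matrix at all).
```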
So, perhaps, there aren't swarms yet just because there are easier ways to scale for now?
Rather large parts of your brain are fairly generalized, but in particular places we have more specialized areas. Looking at it, you would most likely consider it all one brain, but from a systems-thinking view a specialized region is a small separate brain with a slightly different task than the rest of the brain.
If 80% of the processors in a cluster are running a "general LLM" and 20% are running a "math LLM", are they the same cluster? Could you host the cluster in a different data center? What if you want to test different math LLM modules against the general intelligence?
I think I would consider them split once the different modules are interchangeable, so that there is de facto an interface.
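To make "interchangeable implies a de facto interface" concrete, here is a hypothetical sketch; all class and method names are invented for illustration:

```python
# Hypothetical illustration: if you can swap math modules under a general
# model, the swap point *is* the interface, whether or not anyone wrote it
# down. Names here are invented, not any real system's API.
from typing import Protocol

class MathModule(Protocol):
    def solve(self, problem: str) -> str: ...

class SymbolicSolver:
    def solve(self, problem: str) -> str:
        return f"[symbolic] {problem}"

class FineTunedMathLLM:
    def solve(self, problem: str) -> str:
        return f"[math-llm] {problem}"

class GeneralModel:
    def __init__(self, math: MathModule) -> None:
        self.math = math  # the de facto interface: anything with .solve()

    def answer(self, query: str) -> str:
        if "integral" in query or "prove" in query:
            return self.math.solve(query)
        return f"[general] {query}"

# Swapping the math module is a one-line change; that boundary is the interface.
print(GeneralModel(SymbolicSolver()).answer("compute the integral of x^2"))
print(GeneralModel(FineTunedMathLLM()).answer("compute the integral of x^2"))
```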
In the case of the brain, while certain functional regions are highly specialized, I would not consider them "a small separate brain". Functional regions are not sub-organs.