> Like there must be a size too small to hold "all the information" in.
We're already there. If you run Mistral-Large-2411 and Mistral-Small-2409 locally, you'll find the larger model is able to recall more specific details about works of fiction. And Deepseek-R1 is aware of a lot more.
Then you ask one of the Qwen2.5 coding models, and it won't even be aware of those works, because they're:
> small, specialized models.
> But maybe training compute will just get to the point where we can run a full-featured model on our desktop (or phone)?
Training-time compute won't allow the model to do anything out of distribution. You can test this yourself if you run one of the "R1 Distill" models. E.g., if you run the Qwen R1 distill and ask it about niche fiction, no matter how long you let it <think>, it can't tell you something the original Qwen didn't know.
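If you want to try this yourself, here's roughly what I mean -- a minimal sketch using the ollama Python client, assuming you've pulled a base Qwen tag and the Qwen-based R1 distill locally (the tags and the question are just placeholders):

    # Ask the same niche-fiction question to the base model and the R1 distill.
    # Model tags are examples -- "deepseek-r1:7b" is the Qwen-based distill on ollama.
    import ollama

    QUESTION = "Describe the plot of <some obscure novel you know well>."

    for model in ("qwen2.5:7b", "deepseek-r1:7b"):
        reply = ollama.chat(model=model, messages=[{"role": "user", "content": QUESTION}])
        print(f"--- {model} ---")
        print(reply["message"]["content"])

The distill will happily <think> for a while, but the final answer won't contain facts the base Qwen didn't already have.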
I suppose we could eventually get to a super-MoE architecture: each expert model limited to 4-16GB in size, but with hundreds of them covering various topics, loaded from storage into RAM and unloaded as needed. Loading any 4-16GB model should only take a few seconds. Maybe pair that with a 4GB "Resident LLM" that is always loaded and figures out which expert to load, roughly like the sketch below.
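Roughly the dispatch loop I'm picturing, sketched on top of llama-cpp-python -- the file names, topic list, and routing prompt are all made up, it's just to show the resident-router idea:

    # Hypothetical "resident router + on-demand experts" loop.
    from llama_cpp import Llama

    EXPERT_FILES = {
        "coding":  "experts/qwen2.5-coder-14b-q4.gguf",
        "fiction": "experts/fiction-expert-8b-q4.gguf",
        "general": "experts/general-expert-8b-q4.gguf",
    }

    # Small model that stays resident and only routes prompts to an expert.
    router = Llama(model_path="resident-4b-q4.gguf", n_ctx=2048)
    current_topic, expert = None, None

    def answer(prompt: str) -> str:
        global current_topic, expert
        # Ask the resident model to pick a topic; crude, but it shows the idea.
        choice = router(
            f"Pick one of {sorted(EXPERT_FILES)} for this request, reply with one word:\n{prompt}",
            max_tokens=8,
        )["choices"][0]["text"].strip().lower()
        topic = choice if choice in EXPERT_FILES else "general"
        if topic != current_topic:
            expert = None  # drop the old expert so its few GB of RAM are freed
            expert = Llama(model_path=EXPERT_FILES[topic], n_ctx=4096)
            current_topic = topic
        return expert(prompt, max_tokens=512)["choices"][0]["text"]

The few seconds of load time would only be paid when the topic actually changes.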
> We're already there. If you run Mistral-Large-2411 and Mistral-Small-2409 locally, you'll find the larger model is able to recall more specific details about works of fiction.
Oh, for sure. I guess what I'm wondering is whether we know the Small model (in this case) is too small -- or whether we just haven't figured out how to train it well enough?
Like, have we hit the limit already -- or, in (say) a year, will the Small model be able to recall everything the Big model does (say, as of today)?