I find the distinction you draw between weights and a program interesting - partially the idea that one is a “static file” and the other isn’t.
What makes a file non-static (dynamic?) other than +x?
Both are instructions about how to perform a computation. Both require other software/hardware/microcode to run. In general, the stack is tall!
Even so, I do agree that “a bunch of matrices” feels different to “a bunch of instructions” - although arguably the former may be closer in architecture to the greatest computing machine we know (the brain) than the latter.
Arguably the distinction between a .guff file and a .guff file with a llama.cpp runner slapped in front of it is negligible. But it does raise an interesting point the article glosses over:
There is a lot happening between a model file sitting on a disk and serving it in an API with attached playground, billing, abuse handling, etc, handling the load of thousands or millions of users calling these incredibly demanding programs. A lot of clever software, good hardware, even down to acquiring buildings and dealing with the order backlog for backup diesel generators.
Improvements in that layer were a large part of what OpenAI to go from the relative obscurity of GPT3.5 to generating massive hype with a ChatGPT anyone could try at a whim. As a more recent example x.ai seems to be struggling with that layer a lot right now. Grok3 is pretty good, but has almost daily partial outages. The 1M context model is promised but never rolls out, instead on some days the served context size is even less than the usual 64k. And they haven't even started making it available on the API.
All of this will be easy when we reach the point where everyone can run powerful LLMs on their own device, but for now just having a 400B parameter model sitting on your hard drive doesn't get your business very far
Yeah, "static" may not be the correct term, and sure, everything is a file. Yet +x makes a big difference. You can't chmod a list of weights and have it "do" anything.
What makes a file non-static (dynamic?) other than +x?
Both are instructions about how to perform a computation. Both require other software/hardware/microcode to run. In general, the stack is tall!
Even so, I do agree that “a bunch of matrices” feels different to “a bunch of instructions” - although arguably the former may be closer in architecture to the greatest computing machine we know (the brain) than the latter.
</armchair>