I'm confused by the language here; it seems "model" means different things.

To me a "model" is a static file containing numbers. In front of that file is an inference engine that receives input from a user, runs it through the "model" and outputs the result. That inference engine is a program (not a static file) that can be generic (can run any number of models of the same format, like llama.cpp) or specific/proprietary. This program usually offers an API. "Wrappers" talk to those APIs and therefore, don't do much (they're neither an inference engine, nor a model) -- their specialty is UI.

But in this post it seems the term "model" covers a kind of full package that goes from LLM to UI, including a specific, dedicated inference engine?

If so, the point of the article would be that, because inference is in the process of being commoditized, the industry is moving to vertical integration so as to protect itself and create unique value propositions.

Is this interpretation correct?




I find the distinction you draw between weights and a program interesting - particularly the idea that one is a “static file” and the other isn’t.

What makes a file non-static (dynamic?) other than +x?

Both are instructions about how to perform a computation. Both require other software/hardware/microcode to run. In general, the stack is tall!

Even so, I do agree that “a bunch of matrices” feels different to “a bunch of instructions” - although arguably the former may be closer in architecture to the greatest computing machine we know (the brain) than the latter.

</armchair>


Arguably the distinction between a .gguf file and a .gguf file with a llama.cpp runner slapped in front of it is negligible. But it does raise an interesting point the article glosses over:

There is a lot happening between a model file sitting on a disk and serving it as an API with an attached playground, billing, abuse handling, etc., while carrying the load of thousands or millions of users calling these incredibly demanding programs. It takes a lot of clever software and good hardware, even down to acquiring buildings and dealing with the order backlog for backup diesel generators.
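
For scale, the thinnest conceivable version of that serving layer is only a few lines (a sketch assuming llama-cpp-python and FastAPI; the path and route are hypothetical), and it has none of what actually makes it hard:

  from fastapi import FastAPI
  from llama_cpp import Llama

  app = FastAPI()
  llm = Llama(model_path="model.gguf")  # the big file sitting on disk

  @app.post("/v1/completions")
  def complete(body: dict):
      # one request at a time, no auth, no billing, no rate limits,
      # no batching -- everything that makes this a business is missing
      out = llm(body["prompt"], max_tokens=body.get("max_tokens", 128))
      return {"text": out["choices"][0]["text"]}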

Improvements in that layer were a large part of what allowed OpenAI to go from the relative obscurity of GPT-3.5 to generating massive hype with a ChatGPT anyone could try on a whim. As a more recent example, x.ai seems to be struggling with that layer a lot right now. Grok 3 is pretty good, but has almost daily partial outages. The promised 1M-context model never rolls out; instead, on some days the served context size is even less than the usual 64k. And they haven't even started making it available on the API.

All of this will be easy once everyone can run powerful LLMs on their own device, but for now, just having a 400B-parameter model sitting on your hard drive doesn't get your business very far.


Yeah, "static" may not be the correct term, and sure, everything is a file. Yet +x makes a big difference. You can't chmod a list of weights and have it "do" anything.


So to clarify: the important product that people will ultimately want is the model. Obviously you need to design infrastructure and a UI around it, but that's not the core product.

The really important distinction is between workflows (what everyone uses in applied LLM work right now) and actual agents. LLM agents can make their own decisions, browse online, use tools, etc. without direct supervision, because they are directly trained for the task. They internalize all the features of LLM orchestration.
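
The difference is visible in code: a workflow hard-codes the control flow, while an agent leaves the loop's decisions to the model. A minimal sketch (call_llm and the tool set are hypothetical stand-ins):

  TOOLS = {"search": lambda q: "...search results for " + q}

  def run_agent(task, call_llm):
      history = [{"role": "user", "content": task}]
      while True:
          reply = call_llm(history)            # parsed model output
          if "tool" in reply:                  # the model chose to act
              result = TOOLS[reply["tool"]](reply["args"])
              history.append({"role": "tool", "content": result})
          else:
              return reply["content"]          # the model decided it's done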

The expression ultimately comes from a 2023 OpenAI slide https://pbs.twimg.com/media/Gly1v0zXIAAGJFz?format=jpg&name=... — so in a way it's a long-held vision in the big labs, just getting more acute now.


I wouldn't say it is correct. A model is not just a static file containing numbers. Those weights (numbers) you are talking about are absolutely meaningless without the architecture of the model.
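
In PyTorch terms, for example, a weights file only loads into code that defines the matching architecture (MyNet here is a hypothetical stand-in):

  import torch
  import torch.nn as nn

  class MyNet(nn.Module):        # the architecture: code, not numbers
      def __init__(self):
          super().__init__()
          self.fc = nn.Linear(16, 4)

  net = MyNet()
  net.load_state_dict(torch.load("weights.pt"))  # the "static file"
  # without a class whose shapes match, weights.pt is just a dict of tensors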

The model is the inference engine; a model which can't do inference isn't a model.



