I'm confused by the language here; it seems "model" means different things.

To me a "model" is a static file containing numbers. In front of that file is an inference engine that receives input from a user, runs it through the "model" and outputs the result. That inference engine is a program (not a static file) that can be generic (can run any number of models of the same format, like llama.cpp) or specific/proprietary. This program usually offers an API. "Wrappers" talk to those APIs and therefore, don't do much (they're neither an inference engine, nor a model) -- their specialty is UI.

But in this post it seems the term "model" covers a kind of full package that goes from LLM to UI, including a specific, dedicated inference engine?

If so, the point of the article would be that, because inference is in the process of being commoditized, the industry is moving to vertical integration so as to protect itself and create unique value propositions.

Is this interpretation correct?




I find the distinction you draw between weights and a program interesting - particularly the idea that one is a “static file” and the other isn’t.

What makes a file non-static (dynamic?) other than +x?

Both are instructions about how to perform a computation. Both require other software/hardware/microcode to run. In general, the stack is tall!

Even so, I do agree that “a bunch of matrices” feels different to “a bunch of instructions” - although arguably the former may be closer in architecture to the greatest computing machine we know (the brain) than the latter.

</armchair>


Arguably the distinction between a .gguf file and a .gguf file with a llama.cpp runner slapped in front of it is negligible. But it does raise an interesting point the article glosses over:

There is a lot happening between a model file sitting on a disk and serving it as an API with an attached playground, billing, abuse handling, etc., while carrying the load of thousands or millions of users calling these incredibly demanding programs. It takes a lot of clever software and good hardware, even down to acquiring buildings and dealing with the order backlog for backup diesel generators.
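
For scale, the thinnest conceivable version of that serving layer is only a few lines (a sketch assuming llama-cpp-python and FastAPI; the path and route are hypothetical), and it has none of what actually makes it hard:

  from fastapi import FastAPI
  from llama_cpp import Llama

  app = FastAPI()
  llm = Llama(model_path="model.gguf")  # the big file sitting on disk

  @app.post("/v1/completions")
  def complete(body: dict):
      # one request at a time, no auth, no billing, no rate limits,
      # no batching -- everything that makes this a business is missing
      out = llm(body["prompt"], max_tokens=body.get("max_tokens", 128))
      return {"text": out["choices"][0]["text"]}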

Improvements in that layer were a large part of what allowed OpenAI to go from the relative obscurity of GPT-3.5 to generating massive hype with a ChatGPT anyone could try on a whim. As a more recent example, x.ai seems to be struggling with that layer a lot right now. Grok 3 is pretty good, but has almost daily partial outages. The promised 1M-context model never rolls out; instead, on some days the served context size is even less than the usual 64k. And they haven't even started making it available on the API.

All of this will be easy once everyone can run powerful LLMs on their own device, but for now, just having a 400B-parameter model sitting on your hard drive doesn't get your business very far.


Yeah, "static" may not be the correct term, and sure, everything is a file. Yet +x makes a big difference. You can't chmod a list of weights and have it "do" anything.


So to clarify: the important product that people will ultimately want is the model. Obviously you need to design infrastructure and a UI around it, but that's not the core product.

The really important distinction is between workflows (what everyone uses in applied LLM work right now) and actual agents. LLM agents can make their own decisions, browse online, use tools, etc. without direct supervision, because they are directly trained for the task. They internalize all the features of LLM orchestration.
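
The difference is visible in code: a workflow hard-codes the control flow, while an agent leaves the loop's decisions to the model. A minimal sketch (call_llm and the tool set are hypothetical stand-ins):

  TOOLS = {"search": lambda q: "...search results for " + q}

  def run_agent(task, call_llm):
      history = [{"role": "user", "content": task}]
      while True:
          reply = call_llm(history)            # parsed model output
          if "tool" in reply:                  # the model chose to act
              result = TOOLS[reply["tool"]](reply["args"])
              history.append({"role": "tool", "content": result})
          else:
              return reply["content"]          # the model decided it's done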

The expression ultimately comes from a 2023 OpenAI slide https://pbs.twimg.com/media/Gly1v0zXIAAGJFz?format=jpg&name=... — so in a way it's a long-held vision in the big labs, just getting more acute now.


I wouldn't say it is correct. A model is not just a static file containing numbers. Those weights (numbers) you are talking about are absolutely meaningless without the architecture of the model.
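
In PyTorch terms, for example, a weights file only loads into code that defines the matching architecture (MyNet here is a hypothetical stand-in):

  import torch
  import torch.nn as nn

  class MyNet(nn.Module):        # the architecture: code, not numbers
      def __init__(self):
          super().__init__()
          self.fc = nn.Linear(16, 4)

  net = MyNet()
  net.load_state_dict(torch.load("weights.pt"))  # the "static file"
  # without a class whose shapes match, weights.pt is just a dict of tensors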

The model is the inference engine; a model which can't do inference isn't a model.



