
THAT. This is what I don't get. Instead of fixing a complex system, let's build an even more complex system on top of it, knowing that it might not always work.

When you have a complex system that does not always work correctly, you disassemble it into simpler and simpler components until you find the one - or maybe several - that are not working as designed. You fix whatever you found wrong with them, put the complex system back together, test it to make sure your fix worked, and you're done. That's how I debug complex cloud-based/microservices-infected software systems, and that's how software/hardware systems in aircraft, rockets, and whatever else get tested. That's such a fundamental principle to me.

If an LLM is a black box by definition and there's no way to make it consistently work correctly, what is it good for?



> If an LLM is a black box by definition and there's no way to make it consistently work correctly, what is it good for?

Many things are unpredictable in the real world. Most of the machines we make are built on layers of redundancy to make imperfect systems stable and predictable. This is no different.


It is different. Most systems aren't designed to be slot machines.


Yet RAG systems can perform quite well, so they're proof that you can build something that is reliable most of the time out of components that aren't reliable in the first place.


Only if you lower your standards as to what "quite well" means. The biggest con by the AI industry so far has been convincing people that 90% is somehow "quite well".

90% is only enough for uninformed people to buy into it and fuel the hype train. 90% is low enough to be pretty much unusable in most production environments.

This is like coding FizzBuzz but only ever emitting Fizz, Buzz, or a number, skipping FizzBuzz, then claiming that your system is 93.3% accurate because only every 15th output is wrong. 93.3% is utter crap.
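To make that arithmetic concrete, here is a throwaway sketch of that broken FizzBuzz and its measured "accuracy" (my own toy script, not any real benchmark):

    def broken_fizzbuzz(n: int) -> str:
        # Bug: multiples of 15 hit the Fizz branch first, so "FizzBuzz" is never emitted.
        if n % 3 == 0:
            return "Fizz"
        if n % 5 == 0:
            return "Buzz"
        return str(n)

    def correct_fizzbuzz(n: int) -> str:
        if n % 15 == 0:
            return "FizzBuzz"
        if n % 3 == 0:
            return "Fizz"
        if n % 5 == 0:
            return "Buzz"
        return str(n)

    N = 15_000
    matches = sum(broken_fizzbuzz(i) == correct_fizzbuzz(i) for i in range(1, N + 1))
    print(f"accuracy: {matches / N:.1%}")  # 93.3%: exactly every 15th output is wrong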


Honestly? Spam and upselling executives on features that don't work. It's a pretty good autocomplete, too.


Ok, so take the example of RF communications:

You need to send data, but only 50% of it gets through, at random. You could move things closer together, send at higher power, or improve the signal modulation to give you a higher SNR.

All of these things are done; this is improving the base system. But at the end of the day RF is in the real world, and the real world sucks. Random shit happens to make your signal not come through.

So what do you do? You design fault tolerant systems.

You add error correction to detect and automatically fix errors on receive.

You add TCP to automatically retransmit missed packets.

You add validation to make sure that the received data is sane.

You use ECC RAM to catch when an ion from the sun has bit-flipped your data.

These are extremely complex hierarchies of systems, and fundamentally they are unreliable. So you design the whole thing with error margins and fallback handling; a minimal sketch of the detect-and-retransmit part is below.
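Something like this toy version of the pattern, with an invented lossy channel standing in for the radio link (the function names and the loss model are my assumptions, purely to illustrate checksum-then-retransmit):

    import random
    import zlib

    def lossy_channel(frame: bytes, corruption_prob: float = 0.5) -> bytes:
        """Stand-in for a noisy RF link: half the time one byte gets mangled."""
        if random.random() < corruption_prob:
            corrupted = bytearray(frame)
            corrupted[random.randrange(len(frame))] ^= 0xFF  # flip one byte's bits
            return bytes(corrupted)
        return frame

    def send_with_retry(payload: bytes, max_attempts: int = 16) -> bytes:
        """Detect corruption with a CRC32 checksum and retransmit until it checks out."""
        frame = zlib.crc32(payload).to_bytes(4, "big") + payload
        for _ in range(max_attempts):
            received = lossy_channel(frame)
            checksum, data = received[:4], received[4:]
            if zlib.crc32(data) == int.from_bytes(checksum, "big"):
                return data  # checksum matches: accept and hand up the stack
            # checksum mismatch: act as if a NAK came back, and resend
        raise RuntimeError("link too unreliable, gave up")

    print(send_with_retry(b"hello over a lossy link"))

Each individual transmission is still a coin flip; the loop around it is what makes the end-to-end behaviour dependable.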

Yes, a better understanding of the underlying system does allow you to make more efficient error correction/validation, but it does not change the fact that you need error correction.

And in the case of RF signals, for example, the optimal design is not zero errors. In fact, the way you design it is that for a given SNR you expect a given probability of error. To get the maximum throughput, you design your error-correction coding to handle that. And because even then there can be edge cases, you design higher-level MAC-layer logic to resend.
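To put rough numbers on that trade-off (illustrative figures I'm assuming, not measurements of any real link):

    # A heavier code spends more bits on redundancy but leaves fewer packets
    # needing a MAC-layer resend. With retransmission, goodput is roughly
    # code_rate * (1 - packet_error_rate).
    options = {
        "light coding, rate 0.9, 5% packet errors": (0.9, 0.05),
        "heavy coding, rate 0.5, 0.1% packet errors": (0.5, 0.001),
    }
    for name, (code_rate, p_err) in options.items():
        goodput = code_rate * (1 - p_err)
        avg_tx = 1 / (1 - p_err)  # expected transmissions per packet
        print(f"{name}: goodput ~{goodput:.3f}, avg transmissions {avg_tx:.2f}")
    # The lighter code wins on throughput despite more errors, which is why
    # the optimal operating point is not zero errors.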

Yes, a better understanding of the error cases will make the error correction (agent loops) in LLMs better over time, but it will not remove the need for error correction.

There may be situations, similar to RF, where a less accurate but faster model with more validation is preferred for a variety of engineering reasons: throughput, cost, creativity, etc.
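As a sketch of what "validation around the model" can look like in code (the call_model hook and the JSON validator below are hypothetical placeholders I'm assuming, not any particular API):

    import json
    from typing import Callable

    def ask_with_validation(
        prompt: str,
        call_model: Callable[[str], str],   # placeholder for whatever LLM client you use
        is_valid: Callable[[str], bool],    # domain check: does it parse, pass tests, fit a schema...
        max_attempts: int = 3,
    ) -> str:
        """Treat the model like an unreliable channel: validate the output, retry on failure."""
        for _ in range(max_attempts):
            answer = call_model(prompt)
            if is_valid(answer):
                return answer
            # Feed the failure back, the way a NAK triggers a retransmit.
            prompt = f"{prompt}\n\nYour previous answer failed validation; try again."
        raise ValueError(f"no valid answer after {max_attempts} attempts")

    def parses_as_json(text: str) -> bool:
        """One possible validator: insist the answer is well-formed JSON."""
        try:
            json.loads(text)
            return True
        except json.JSONDecodeError:
            return False

The validator is where the domain knowledge lives; the loop around it is just the error-correction layer.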

You need to look at LLMs as black boxes of the physical world that we question and get an answer from. At that level of complexity they are more akin to physics than software: there is no reliability inherent to them, only what we design around them.

And there are tons of people doing the fundamental physical research into how they work, and how to make them work better.

But that is a completely different avenue of research from how to make useful systems out of unreliable components.

You do not need to fully model the propagation of EM waves in order to make a reliable communication system. It might help, but the system will be fragile if you rely solely on that model.

And if you engineer a reliable system architecture, it does not get fully invalidated when the scientists build better models of the underlying systems. You may modify the error correction codes but they do not go away.

For example, the original Morse-code telegraphy had hand-made validation checks, handshakes, etc. The architecture did not get wholly replaced with the advent of Shannon's information theory, even if some of the specific methods did.

And WW1/WW2 telegraphs and communications were still useful, despite their designers not understanding the underlying information theory and despite being unreliable in many situations.


Except that a corrupt packet can easily be detected when compared to a valid one (is the checksum valid?). There is an algorithm you can run that tells you, with high confidence, whether a given packet is corrupt or not.

In an LLM a token is a token. There are no semantics to anything in there. In order to answer the question "is this a good answer or not?" you would need a model that somehow doesn't hallucinate, because the tokens themselves don't have any mathematical properties you can check. A "hallucinated" token cannot, in any mathematical way, be distinguished from one that "wasn't hallucinated". That's a big difference.

All of the techniques you mentioned are mathematically proven to improve the desired performance target in a controlled, well-understood way. We know their limitations, we know their strengths. They are backed by solid foundations and can be relied upon.

This is not comparable to an LLM, where the best you can do is "pull more heuristics out of someone's ass and hope for the best".


But my point is that in the 1800s it was not understood either, yet we still had telegraphs. People came up with inefficient, hodge-podge engineered solutions that mostly worked and sometimes did not.

We are in the equivalent time period for LLMs and ML in general: we can hack things together that kinda work.

We only understand a sliver of the fundamentals of how these things behave as complex systems. But that does not mean we should, or need to, wait 20 years for the research and science to catch up.

These are useful today, hallucinations and all, and it is possible to build things that get the right hallucinations for your use case, even if they are not 100% reliable.



