Non-deterministic doesn't mean random or unpredictable. That's like saying the weather forecast is useless because it's not deterministic or always 100% accurate.
Last time I used GPT-4.5 to analyze blood results, it gave different output depending on whether I uploaded them as 2 or 3 separate CSV files. It was both an amazing experience: clear, easy-to-understand statements and a list of the most common causes. And terrifying: "What about X?", "You are absolutely right, there were X results included, disregard everything I wrote above, here is the new analysis".
So for me non-deterministic means unpredictable. Yes, there was nothing random or non-deterministic in that case; I could repeat both scenarios multiple times and get the same results again. But the result is affected by something I didn't expect to matter. That damages the trust in the tool, no matter what we call it.
LLMs seem best at creative brainstorming, coming up with ideas you hadn't thought of. Their weakness becomes a non-issue because the ideas are just things for you to check the viability of; they could be completely unworkable.
> Non-deterministic doesn't mean random or unpredictable. That's like saying the weather forecast is useless because it's not deterministic or always 100% accurate.
I don't know where you got 'useless' from. LLMs are great, sometimes. They're not, other times. Which, remarkably, is just like weather forecasts. The weather forecast is sometimes completely accurate. The weather forecast is sometimes completely inaccurate.
LLMs, like weather forecasting, have gotten better as more time and money have been invested in them.
Neither are perfect. Both are sometimes very good. Both are sometimes not.
Non-deterministic means random - that's the definition of the word. The weather forecast is also random - in fact, a weather forecast is (to oversimplify) an average of several predictive (generative) models.
> Non-deterministic means random - that's the definition of the word.
That's not really the definition. Non-determinism just means the outcome is not a pure function of the inputs. A PRNG doesn't become truly random just because we don't know the state and seed when calling the function, and the same holds for LLMs. The non-determinism in LLMs comes from accepted race conditions in GPU floating-point math and from the PRNG in the sampler.
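To make that concrete, here's a toy sketch (mine, not anything from the thread): a sampler is just softmax over logits plus a pseudo-random draw, and once you fix the PRNG seed the whole thing is a pure function of its inputs.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    # Softmax with temperature, then a pseudo-random draw:
    # essentially what an LLM sampler does for each token.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy next-token scores

# Two runs with the same seed produce identical "random" sequences:
# the PRNG is a pure function of its state, so given the seed the
# whole sampling process is deterministic.
rng_a = np.random.default_rng(42)
seq_a = [sample_token(logits, 0.8, rng_a) for _ in range(10)]

rng_b = np.random.default_rng(42)
seq_b = [sample_token(logits, 0.8, rng_b) for _ in range(10)]

assert seq_a == seq_b  # pseudo-random, not random
```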
That's beside the point, but we could have perfectly deterministic LLMs.
If you ask it what a star is, it’s never going to tell you it’s a giant piece of cheese floating in the sky.
If you don’t believe me, try it: write a for loop which asks ChatGPT "what is a star (astronomy), exactly?" Ask it 1000 times and then tell me how random it is versus how consistent it is.
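Roughly like this, assuming the official OpenAI Python client; the model name, the smaller N, and the crude keyword check are my choices, not the commenter's:

```python
from collections import Counter
from openai import OpenAI  # official OpenAI Python client

client = OpenAI()  # expects OPENAI_API_KEY in the environment
N = 20  # 1000 as suggested works too, just slower and pricier

answers = []
for _ in range(N):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # my choice; any chat model will do
        messages=[{
            "role": "user",
            "content": "What is a star (astronomy), exactly? One sentence.",
        }],
    )
    answers.append(resp.choices[0].message.content.strip())

# The wording varies between runs, but the substance shouldn't.
# A crude keyword check is enough to see the consistency:
on_topic = sum(("plasma" in a.lower()) or ("gas" in a.lower()) for a in answers)
print(f"{on_topic}/{N} answers describe a ball of plasma/gas")
print(Counter(answers).most_common(3))  # distinct phrasings, by count
```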
The idea that non-deterministic === random is totally deluded. It just means you cannot predict the exact tokens which will be produced, but it doesn’t mean it’s random like a random number generator and could be anything.
If you ask what Michael Jackson the entertainer is famous for, it’s going to tell you he’s famous for music and dancing. 1000/1000 times. Is that random?
> If you ask it what a star is, it’s never going to tell you it’s a giant piece of cheese floating in the sky.
Turn the Top-P and the temperature up. Turning up the Top-P enables the LLM to actually produce such nonsense; turning up the temperature increases the chance that such nonsense is actually selected as the output.
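Here's a minimal nucleus-sampling sketch on a toy distribution (all numbers invented): at low temperature with a Top-P cutoff, the "cheese" token gets exactly zero probability, while a high temperature with Top-P = 1 leaves it a small but real chance.

```python
import numpy as np

def top_p_filter(probs, top_p):
    # Keep the smallest set of highest-probability tokens whose
    # cumulative mass reaches top_p; renormalize over that nucleus.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    nucleus = np.zeros_like(probs)
    nucleus[order[:cutoff]] = probs[order[:cutoff]]
    return nucleus / nucleus.sum()

# Toy next-token logits: "plasma", "gas", "object", ..., "cheese"
logits = np.array([8.0, 4.0, 2.0, -3.0])

for temp, top_p in [(0.7, 0.9), (2.0, 1.0)]:
    probs = np.exp(logits / temp)
    probs /= probs.sum()
    probs = top_p_filter(probs, top_p)
    print(f"temp={temp}, top_p={top_p}: {probs.round(5)}")
# At (0.7, 0.9) the "cheese" token's probability is exactly zero;
# at (2.0, 1.0) it gets a small but nonzero chance of being sampled.
```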
I'm talking about the standard settings, and in fact GPT-5 doesn't let you change the temperature anymore.
Also, that's not really the point. Humans can also produce nonsense if you torture them until they're talking nonsense, but that doesn't mean humans are "random."
LLMs are not random, they are non-deterministic, but the two words have different meanings.
Random means you cannot tell what is going to be produced at all, e.g. a random number generator.
But if you ask an LLM "is an apple a fruit, answer yes or no only," the LLM is going to answer yes 100% of the time. That isn't random.
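You can check this claim directly without a loop: the chat completions API can return per-token logprobs, and for a prompt like this virtually all the probability mass should sit on "Yes". A sketch, not the commenter's test; the model choice is mine:

```python
from math import exp
from openai import OpenAI  # official OpenAI Python client

client = OpenAI()  # expects OPENAI_API_KEY in the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # my choice of model
    messages=[{"role": "user",
               "content": "Is an apple a fruit? Answer yes or no only."}],
    logprobs=True,
    top_logprobs=5,
    max_tokens=1,
)

# The model's distribution over the first output token: nearly all of
# the probability mass sits on "Yes", so even non-greedy sampling
# lands there essentially every time.
for cand in resp.choices[0].logprobs.content[0].top_logprobs:
    print(f"{cand.token!r}: p = {exp(cand.logprob):.6f}")
```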
Most things that are generally helpful and beneficial are not 100% helpful and beneficial 100% of the time.
I used GPT-4 as a second opinion on my medical tests and doctor's advice, and it suggested an alternate diagnosis and treatment plan that turned out to be correct. That was incredibly helpful and beneficial.
You're replying to a person who had a similar and even more helpful and beneficial experience: they're alive today because of it.
Pedantically pointing out that a helpful and beneficial thing isn't 100% helpful and beneficial 100% of the time doesn't add anything useful to the conversation, since everyone here already knows it's not 100%.
No, they can be. Stating that they are, as an absolute, based on your sample size of one is fallacious, especially with regard to other instances where ChatGPT has failed the user with serious physical consequences.
I am glad that you are OK, but as another user suggested, it's nowhere near consistently accurate enough to be an adequate substitute for a call to a GP or 911.