Language models are not designed to know things, they are designed to say things - that's why they are called language models and not knowledge models.
Given the words that have already been generated, it always adds the next word based on how likely that continuation is.
The reason you get different answers each time is the effect of the pseudo-random number generator on picking the next word. The model produces a probability distribution over likely next words, and when the configuration parameter called "temperature" is 0 (and it is actually not possible to set it to 0 in the GUI), there is no random influence: strictly the most likely next word (top-1, i.e. greedy decoding) is always chosen. This leads to output that we would classify as "very boring".
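A minimal sketch of that sampling step, with made-up tokens and probabilities (this is not any real model's API, just the idea):

    import math
    import random

    def pick_next(token_probs, temperature):
        # token_probs: dict of candidate next tokens -> model probability
        if temperature == 0:
            # greedy decoding: always the single most likely token (top-1)
            return max(token_probs, key=token_probs.get)
        # rescale log-probabilities by 1/temperature, then sample
        scaled = {t: math.exp(math.log(p) / temperature)
                  for t, p in token_probs.items()}
        total = sum(scaled.values())
        r = random.random() * total
        for token, weight in scaled.items():
            r -= weight
            if r <= 0:
                return token
        return token  # floating-point edge case fallback

    probs = {"486": 0.5, "386": 0.3, "286": 0.2}  # invented distribution
    print(pick_next(probs, 0))    # always "486"
    print(pick_next(probs, 1.0))  # samples; runs can differ

At temperature 1 the model's distribution is used as-is; below 1 it sharpens toward the top choice, above 1 it flattens, and at exactly 0 the randomness disappears entirely.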
So the model knows nothing about IBM, PS/2, 80286 versus 80486, CPUs, 280, or any models per se. One of the answers seems to suggest that there is no Model 280; I wonder whether that one was generated through another process (there is a way to incorporate user feedback via "reinforcement learning"), or whether it was a consequence of the same randomized next-word picking, just a luckier attempt.
> This leads to output that we would classify as "very boring".
Not really. I set temperature to 0 for my local models, and it works fine.
The reason the cloud UIs don't allow a temperature of 0 is that models then sometimes fall into infinite loops of tokens, and that would break the suspension of disbelief if the public saw it.
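A toy illustration of that looping failure mode, using a hand-written bigram table instead of a real model (the table, tokens, and probabilities are all invented):

    import random

    # made-up bigram "model": the most likely successors form a cycle
    bigram = {
        "the":  {"same": 0.6, "end": 0.4},
        "same": {"the": 0.7, "point": 0.3},
    }

    def generate(start, temperature, steps=8):
        out = [start]
        for _ in range(steps):
            succ = bigram.get(out[-1])
            if succ is None:
                break  # terminal token, nothing follows it
            if temperature == 0:
                nxt = max(succ, key=succ.get)  # greedy: deterministic
            else:
                nxt = random.choices(list(succ),
                                     weights=list(succ.values()))[0]
            out.append(nxt)
        return " ".join(out)

    print(generate("the", 0))    # "the same the same the same ..." forever
    print(generate("the", 1.0))  # sampling can escape via "end" or "point"

With greedy decoding the top-1 choices chase each other in a cycle, so generation never terminates; a little sampling noise is enough to break out, which is one reason hosted UIs keep the temperature strictly above 0.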
You must be using more recent (or just different) models than those I tried. Mine easily returned garbage at temperature 0. (But unfortunately, I can no longer try it and report back.)
This (LLM behaviour and benchmarking at low or zero temperature) would be a topic worth investigating.
> Language models are not designed to know things, they are designed to say things - that's why they are called language models and not knowledge models.
This is true. But you go to Google not to 'have a chat' but ostensibly to learn something grounded in knowledge.
You'd think Google are making an error in swapping the provision of 'knowledge' for 'words', but then again perhaps it makes no difference when it comes to advertising dollars, which is their actual business.