
Yes, although it's also possible that the most likely token is incorrect and perhaps the next 4 most likely tokens would lead to a correct answer.

For example, if you ask a model what 0^0 is, the highest-probability output may be "1", which is incorrect. The next most probable outputs may be tokens like "although", "because", "due to", or "unfortunately", as the model prepares to explain to the user that the value of the expression is undefined. Because there are many more ways to express and explain the undefined answer than there are to express a naively incorrect one, the correct answer's probability mass is split across many tokens. So even if, e.g., the softmax value of "1" is 0.1 while "although" + "because" + "due to" + "unfortunately" together exceed 0.3, at a temperature of 0 "1" gets chosen, since no single explanation token outranks it. At slightly higher temperatures, sampling across all outputs would increase the probability of a correct answer.

So it's true that increasing the temperature increases the probability that the model outputs tokens other than the single most likely one, but that may be exactly what you want. Temperature controls the distribution over tokens, not over "answers".
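
Here's a minimal sketch of that sampling step in Python, assuming access to the raw next-token logits; the distribution below is made up to mirror the example:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_token(logits, temperature):
        """Sample a token index from temperature-scaled logits."""
        logits = np.asarray(logits, dtype=float)
        if temperature == 0:
            return int(np.argmax(logits))      # greedy decoding: top token only
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    # Hypothetical next-token distribution for the 0^0 prompt: "1" is the
    # single most likely token, but the explanation openers together carry
    # more probability mass.
    tokens = ['1', 'although', 'because', 'due to', 'unfortunately']
    probs = [0.10, 0.09, 0.09, 0.08, 0.08]  # remaining mass omitted for brevity
    logits = np.log(probs)

    print(tokens[sample_token(logits, temperature=0)])    # always '1'
    print(tokens[sample_token(logits, temperature=1.0)])  # usually not '1'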



Not sure if you were making a joke, but 0^0 is often defined as 1.

https://en.wikipedia.org/wiki/Zero_to_the_power_of_zero
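
One common justification, paraphrasing the linked article: setting x = 0 in the binomial theorem forces the convention, since the left side is 1 and every k >= 1 term vanishes (sketch in LaTeX):

    (1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k
    \quad\Rightarrow\quad
    1 = (1 + 0)^n = \binom{n}{0}\, 0^0 = 0^0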


I honestly had forgotten that, if I ever knew it. But I think the point stands that in many contexts you'd rather have the nuances of this kind of thing explained to you, in an answer representable by many different sequences of tokens, each individually low probability, instead of simply taking the single highest-probability token, "1".


I'd rather it recognize that it should enter a calculator mode to evaluate the expression, and then give context with the normal GPT behavior.
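
A hedged sketch of that idea: intercept plain arithmetic, hand it to a deterministic evaluator, and let the model supply the surrounding explanation. Everything here is hypothetical scaffolding, not any particular product's tool-calling API:

    import ast
    import operator

    # Map AST operator nodes to safe arithmetic functions.
    OPS = {ast.Pow: operator.pow, ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def calculator(expr):
        """Evaluate a plain arithmetic expression like '0**0' without eval()."""
        def walk(node):
            if isinstance(node, ast.Constant):
                return node.value
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            raise ValueError("not plain arithmetic")
        return walk(ast.parse(expr, mode="eval").body)

    print(calculator("0**0"))  # 1 -- Python itself uses the 0^0 = 1 convention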


perhaps a hallucination



