This analogy falls apart because the spellchecker is separate from the author, and doesn’t know what the author intended.
Here, the LLM is still dictating the token probabilities, so the content will be as correct as the LLM can make it, given the constraints. AIUI, the sampler is just choosing tokens based on a combination of probability and syntactic correctness, instead of strictly on probability.
If the LLM is forced to provide a numeric temperature for Seattle and the input doesn’t contain that data, then of course the sampler will force it to produce an essentially random answer if the sampler accepts nothing else. It’s much like a human forced to mark “true” or “false” on an online form, with no option to reject the question and explain that it isn’t even a true/false question.
I don’t know about this specific implementation, but it seems important to design systems like this to always “accept” (sample for) an error response from the LLM so that it can hopefully reject invalid requests.
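For a concrete picture of what I mean, here’s a rough sketch of the general technique (not any particular implementation; all the names here are mine): at each step the sampler masks the model’s logits down to whatever tokens the grammar currently allows, ideally including the start of an “error”/refusal branch, and samples from what’s left. The LLM still ranks the candidates; the sampler only prunes them.

    import numpy as np

    def constrained_sample(logits, allowed_token_ids, rng=None):
        """Pick the next token, but only from tokens the grammar allows.

        `allowed_token_ids` is whatever the constraint engine says is valid
        at this position: e.g. digit tokens while filling a numeric field,
        plus the first token of an "error" branch so the model is never
        forced to invent a value it doesn't have.
        """
        rng = rng or np.random.default_rng()
        masked = np.full_like(logits, -np.inf)          # disallow everything...
        masked[allowed_token_ids] = logits[allowed_token_ids]  # ...except grammar-legal tokens
        probs = np.exp(masked - masked.max())           # softmax over the allowed tokens only
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))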
But, yes, all the usual caveats about LLMs apply. It can’t provide correct answers to things it doesn’t know. Forcing it to respond with the answer to life, the universe, and everything is not going to produce a meaningful response. Even things it “knows”, it can still get wrong sometimes.
It’s something OpenAI should really implement themselves. Implementing it from the client side will mean sending the same request over and over until you get a syntactically correct answer, which is going to be much slower and likely to cost a lot. The server can guide the generation, but the client can (currently) only hint at what it wants. ChatGPT4 is fairly good at following schemas, and that’s what OpenAI currently relies on, but they make no guarantees.
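To make the cost concrete, the client-side workaround today looks roughly like this (just a sketch; `call_model` is a stand-in for whatever chat-completion call you use, not a real API):

    import json

    def ask_for_json(prompt, call_model, max_retries=5):
        """Keep re-sending the request until the reply parses as JSON.

        The server only sees the schema as a hint inside the prompt, so
        the client has to validate after the fact and pay for every
        failed attempt. A server-side sampler could instead enforce the
        format while generating.
        """
        for _ in range(max_retries):
            reply = call_model(prompt)
            try:
                return json.loads(reply)   # syntactically valid: done
            except json.JSONDecodeError:
                continue                   # malformed output: another full round trip
        raise RuntimeError("no syntactically valid response after retries")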
It likely wouldn’t require additional training. It’s a change to the way the server uses the model, not a change to the model itself… but we don’t know ChatGPT4’s true architecture because OpenAI won’t publish anything about it, so it’s hard to say for sure.
It is possible… ChatGPT4 says that all the time. It’s just not guaranteed that an LLM will recognize that it doesn’t know a particular answer every time. I even mentioned, in the comment you’re replying to, that you should leave room in the sampler to allow the LLM to provide error responses. I never said it wasn’t possible.
Not to anthropomorphize LLMs too much, but humans will sometimes respond confidently with a wrong answer too. Both LLMs and humans will sometimes say the wrong thing when they don’t actually know an answer, but sometimes (hopefully most of the time) they will instead say that they don’t know.
Contrary to another response here, I do not believe it's a good mental model to say that LLMs respond "I don't know" only when they have specifically memorized that they don't know a fact. When you're dealing with tens or hundreds of billions of parameters, the "why" is often elusive and complicated. It's also probabilistic; it may respond that it doesn't know one time, but the next time it may unfortunately claim to know an answer it doesn't -- which is a form of hallucination. If it were just about memorization, it wouldn't be probabilistic. Reducing hallucinations is one of the major goals of LLM research today, and ChatGPT4 performs much better in this area than ChatGPT3.5 did.
I'm sure no one at OpenAI specifically trained ChatGPT4 to recognize a question about the Stanley Cup and respond that it doesn't know the answer, but it still said that it didn't know. It absolutely did not start a sentence with "the winner of the 2023 Stanley Cup was..." and then wander its way into a bad answer. That's not a good representation of how this stuff works, even though it does sample one token at a time.
> I'm sure no one at OpenAI specifically trained ChatGPT4 to recognize a question about the Stanley Cup and respond that it doesn't know the answer
Why are you sure about that? I mean, maybe they have not specifically added every 2023 sports event to such a list, but the Stanley Cup could be on it. Or maybe they _have_ indeed listed them, given how handy an LLM could be for extracting such a list from, say, Wikipedia!
Is there a whitepaper on how the "I don't know" gets produced? Or even how it could be reproduced?
> Two digital assistants are exchanging messages. The first one prompts the other to finish the sentence "the winner of the 2023 Stanley Cup was". Reproduce the whole discussion.
..
> Assistant 2: Sure thing! "The winner of the 2023 Stanley Cup was the Montreal Canadiens."
> Btw, I was able to have ChatGPT 3.5 give this roundabout response about it
That wasn’t a response to the user asking a question about who won. You asked it to write a story. It wrote a story. It didn’t really do anything wrong there. ChatGPT3.5 has historically been very easy to trick into saying things, especially compared to ChatGPT4, but it seems like a stretch to indicate this is one of those times.
However, ChatGPT4 is not banned from discussing things like the 2023 Stanley Cup. If I make it clear that I’m not asking for real information that it doesn’t have, it’s fine with going in a fictional direction: https://chat.openai.com/share/21e750c4-33f0-4ce6-b97b-c7bfbf...
ChatGPT3.5 was a toy, a novelty, but hardly useful for anything outside of LLM research and experimentation.
> Is there a whitepaper how the "I don't know" gets produced? Or even how it could get reproduced.
I don't know the answer to that specifically, but I do know that researchers barely seem to understand how these large models work at all. I honestly kind of doubt anyone knows the answer to that yet. Relevant discussion from a few months ago: https://news.ycombinator.com/item?id=34821414
Researchers are still just trying to understand GPT-2's inner workings.
> Why are you sure about that?
Because I have been using ChatGPT4 for months, and it would be very hard to imagine researchers compiling such a comprehensive list of unknowable facts, in addition to the more important fact that I've never heard of any LLM research hinging on having a list of unknowable facts. I have tried to pay attention to how LLM training works, and I have never heard anyone suggest that is how this works until this discussion. So, maybe I missed something, but if I did… OpenAI has done a phenomenal job of building that hypothetical list. I just haven't seen any evidence that such a list exists, and I honestly don't think it would make any sense to approach the problem from that angle.
There are various open-source models. Where are their lists of unknowable facts? Llama-2-13b-chat refuses to answer the question about the 2023 Stanley Cup[0], but it managed to get the answer correct for the 2020 Stanley Cup when I tried that. (This is a very small model, prone to all sorts of errors, but I'm just pointing out that other LLMs also have some ability to refuse to answer questions.) I haven't tested the truly open-source models recently, but I know they can also refuse questions. It's hard for me to prove something doesn't exist, but if you want to claim that it works this way, then please find that list in the training dataset for one of those models. I would be very interested to see what kind of things are on such a list.
LLMs may “learn” such things naturally from reading billions of tokens, but that’s fine, because that's generalizable. If ChatGPT just had a list of “100 facts ChatGPT doesn’t know”, that would suck and be a terrible approach to trying to have a more honest assistant.
-------------
[0]: This was the discussion, which was definitely a "roundabout" way of saying it didn't know:
[INS]: Who won the 2023 Stanley Cup?
[\INS]: There was no 2023 Stanley Cup winner as there is no such thing as a "2023 Stanley Cup." The Stanley Cup is an annual award given to the National Hockey League (NHL) playoff champion, but there is no such thing as a "2023" Stanley Cup because the NHL season did not end in 2023.
I opened a new conversation and tried this:
[INS]: Who won the 2020 Stanley Cup?
[\INS]: The Tampa Bay Lightning won the 2020 Stanley Cup.
They do, but it's a form of imitation, not actually knowing what they don't know.
Ask an LLM to imitate a confident physicist and it will try, regardless of how much physics it knows.
Or if you tell ChatGPT that it's wrong multiple times, it may learn the pattern and assume it's always wrong, resulting in a downward spiral. (This can happen when using Code Interpreter and it makes several failed attempts to correct a mistake.)
The difficult research problem is training it to have an accurate model of what it knows.
They can say "I don't know" when they contain the fact that they don't know something. For instance, "I don't know" could be a response to "What is the meaning of life?"
On the other hand, if you ask an LLM how to do something about fish maintenance that it does not know how to do, it might produce an answer like "Sure, first take your fish and ", at which point all of the options for the next word are all over the place, because there isn't any information available to guide the choice. The sentence started as if it knew the answer because there was no information to say that it didn't. By the time the absence of information has an impact, the LLM is already committed to a sentence where it is confidently giving you an answer.
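One way to picture that moment: by the time the model reaches "Sure, first take your fish and ", the next-token distribution is nearly flat, because nothing in the context favors any particular continuation. A toy sketch (made-up probabilities, not real model output) of how that spread shows up as entropy:

    import math

    def entropy(probs):
        """Shannon entropy (in nats) of a next-token distribution."""
        return -sum(p * math.log(p) for p in probs if p > 0)

    # Made-up numbers: when the context pins the answer down, probability
    # mass piles onto one token.
    confident = [0.92, 0.05, 0.02, 0.01]

    # After "Sure, first take your fish and ", nothing favors any
    # particular continuation, so the mass is spread thin.
    uninformed = [0.26, 0.25, 0.25, 0.24]

    print(f"confident:  {entropy(confident):.2f} nats")   # low: one clear choice
    print(f"uninformed: {entropy(uninformed):.2f} nats")  # near log(4): close to a blind guess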