
That's because any expectation of GPT being subjectively or logically correct is ill-founded.

GPT does not model subjects. GPT does not even model words! It models tokens.
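A quick illustration, assuming the tiktoken package is installed (the encoding name below is the one OpenAI publishes for its newer models; nothing else here is specific to ChatGPT): the model is only ever handed integer token IDs, and word boundaries don't necessarily survive the split.

    import tiktoken  # assumes the tiktoken package is installed

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("The lion, the goat and the cabbage")
    print(ids)                             # integer IDs -- this is all the model sees
    print([enc.decode([i]) for i in ids])  # the pieces; common words map to one ID, rarer ones to several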

The structure of GPT's model is semantic, not logical. It's a model of how each token in GPT's training corpus relates to the rest of the tokens in that text.

The correct answer to a familiar logic problem just happens to be the text that is already present in the corpus. The answer GPT gives is the text from GPT's model that is semantically closest to the text in your prompt.

Knowing that, it is no longer a mystery how GPT "gets confused": the text in your "misleading prompt" was still semantically closest to the familiar answer.
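A deliberately crude analogy, using plain string overlap to stand in for whatever notion of "closeness" the learned representation actually encodes (so don't read it as GPT's mechanism): a slightly altered riddle is still nearest to the familiar riddle, so the familiar answer comes back even though the premise has changed.

    import difflib

    # A toy "corpus" of familiar questions paired with their familiar answers.
    corpus = {
        "A farmer must ferry a lion, a goat and a cabbage across a river ...":
            "Take the goat over first, return alone, take the lion over, bring the goat back ...",
        "Two trains leave opposite stations at the same time ...":
            "They meet when the sum of the distances covered equals the gap ...",
    }

    # A "misleading prompt": the premise has changed, so the familiar answer
    # no longer applies -- but it is still the closest match on the surface.
    prompt = "A farmer must ferry a vegetarian lion, a goat and a cabbage across a river ..."

    best = max(corpus, key=lambda q: difflib.SequenceMatcher(None, prompt, q).ratio())
    print(corpus[best])   # the familiar (and here wrong) answer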

The result is subjectively and logically wrong, because subjects and logic were never involved in the process!

In order to resolve this, ChatGPT's training corpus needs to contain a "correct answer" next to every unique permutation of every question. We can't expect that to be the case, so we should instead expect GPT to generate false, yet familiar, responses.



> In order to resolve this, ChatGPT's training corpus needs to contain a "correct answer" next to every unique permutation of every question.

This is not quite the right understanding of how ChatGPT works. It's not necessary to show ChatGPT an example of every possible permutation of an animal-crossing puzzle in order for it to solve one it has never seen before. That's because the neural network is not a database of recorded word probabilities. It can instead represent the underlying logic of the puzzle and the relationships between the different animals, and, using this abstract, pared-down information, extrapolate the correct answer to the puzzle.
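For contrast, here is what "the underlying logic of the puzzle" looks like when you write it down explicitly: a brute-force search over river-crossing states. The animal names and the "who eats whom" pairs are just the ones from this thread; nothing here is a claim about ChatGPT's internals.

    from collections import deque

    ITEMS = {"goat", "lion", "cabbage"}
    UNSAFE = [{"lion", "goat"}, {"goat", "cabbage"}]   # pairs that can't be left alone

    def safe(bank):
        return not any(pair <= bank for pair in UNSAFE)

    def solve():
        start = (frozenset(ITEMS), "left")   # everything (and the farmer) starts on the left
        goal = (frozenset(), "right")
        queue, seen = deque([(start, [])]), {start}
        while queue:
            (left, farmer), path = queue.popleft()
            if (left, farmer) == goal:
                return path
            here = left if farmer == "left" else ITEMS - left
            for cargo in list(here) + [None]:            # take one item across, or row back empty
                new_left = set(left)
                if cargo:
                    (new_left.discard if farmer == "left" else new_left.add)(cargo)
                new_farmer = "right" if farmer == "left" else "left"
                unattended = new_left if new_farmer == "right" else ITEMS - new_left
                state = (frozenset(new_left), new_farmer)
                if safe(unattended) and state not in seen:
                    seen.add(state)
                    queue.append((state, path + [(cargo, new_farmer)]))

    print(solve())   # e.g. goat over, back empty, lion over, goat back, cabbage over, back empty, goat over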

I see the failure in the example with the goat, the lion and the cabbage as simply a matter of overfitting.

Edit: I see a lot of people saying "it doesn't understand logic; it's just predicting the next word."

I'm basing my understanding on this video:

https://youtu.be/viJt_DXTfwA

The claim is that it would be impossible to feed enough input into a system such that it could produce anything as useful as ChatGPT unless it was able to abstract the underlying logic from the information provided. If you consider the number of permutations of the animal-crossing puzzle, this quickly becomes clear. In fact it would be impossible for ChatGPT to produce anything brand new without this capability.


I think what they mean by "resolve this" is "make it error-free". Your claim that "it isn't necessary to show every permutation for it to solve one it hasn't seen before" doesn't really contradict their point.

For puzzles whose entire permutation space is semantically similar enough, your claim is likely true. But for puzzles whose permutations can involve more "human" semantic manipulations, there is likely a much higher risk of failure.


Yes, I think it depends on how you define permutations for this puzzle. For example, if you limit your goal to training GPT to solve puzzles of the form where there are only ever 3 distinct real animals, then my claim is that you wouldn't need to feed it examples of this puzzle with every single permutation of 3 different animals (assuming 10,000 different animals, that is already over 100bn permutations) before the neural network developed an internal logical model that can solve the puzzle as well as a human. It would only need a few descriptions of each animal plus a few examples of the puzzle to understand the logic.
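(For what it's worth, the "over 100bn" figure checks out under the hypothetical 10,000-animal pool and casts of 3:)

    from math import comb, perm

    print(comb(10_000, 3))   # 166,616,670,000 unordered casts of 3 animals
    print(perm(10_000, 3))   # 999,700,020,000 if the three roles are distinguished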

If you mean to say that the permutations of the puzzle extend to changing the rules, such as "if it's the Sabbath then reptiles can't travel", then sure, it would require more representative examples and may never meet your standard of "error free", but I would also argue the same applies to humans when you present them with a logic puzzle that is new to them.


> you wouldn't need to feed it examples of this puzzle with every single permutation

No, but you would need "enough"; whatever that number happens to be.

> It would only need a few descriptions of each animal plus a few examples of the puzzle to understand the logic.

That's the mistake.

GPT itself can't combine those two things. That work has to be done by the content of the already-written training corpus.

And the result is not the same as "understanding logic". It doesn't model the meaning of the puzzle: it models the structure of examples.

GPT can't distinguish the meaning of rules. It can only follow examples. It can't invent new strategies, it can only construct new collections of strategy parts; and it can only pick the parts that seem closest, and put those parts into a familiar order.

GPT doesn't play games, it plays plays.


> GPT does not model subjects. GPT does not even model words! It models tokens.

The first and last layers of a transformer decoder model tokens. The hidden layers don't have this restriction. There was a paper recently showing that the hidden layers actually perform mesa-optimization via something like backprop. There's absolutely no reason to believe they are not capable of world modeling. In fact, all evidence suggests they do world modeling.
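A toy PyTorch sketch of that layering (dimensions and layer counts are made up, and nn.TransformerEncoderLayer with a causal mask stands in for a real GPT block): token IDs only appear at the embedding and unembedding ends, and everything in between operates on continuous vectors.

    import torch
    import torch.nn as nn

    class TinyDecoder(nn.Module):
        def __init__(self, vocab=50257, d_model=128, n_layers=4, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)               # tokens -> vectors
            block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(block, n_layers)    # vectors -> vectors; no token IDs in here
            self.unembed = nn.Linear(d_model, vocab)                # vectors -> next-token logits

        def forward(self, token_ids):
            mask = nn.Transformer.generate_square_subsequent_mask(token_ids.shape[1])
            h = self.embed(token_ids)
            h = self.blocks(h, mask=mask)                           # hidden layers see only continuous states
            return self.unembed(h)

    logits = TinyDecoder()(torch.randint(0, 50257, (1, 8)))
    print(logits.shape)   # torch.Size([1, 8, 50257])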


The model is implicit, not explicit.

GPT draws boundaries around words because that is the pattern it is looking at.

If I feel the bumps in the fabric of my blanket, I will probably think the pattern of bumps at a certain scale is significant, but I won't have magically learned about threads or stitching!

Words are the most obvious pattern in written text. GPT models that pattern, but it does not recognize it as "words". It's just a pattern of tokens.

GPT models every pattern it can find. Most of these patterns are destined to fit the same boundaries as grammar rules: the example text was originally organized with grammar rules!

GPT can even recognize complex patterns like "it" substitution and question-answer dialogues, but it can never categorize them as such. It only knows "what" the pattern is: never "why".

The patterns that people use when writing have symbolic meaning. The subjective importance of each pattern is already known by the person writing.

Those patterns don't go anywhere. GPT's model is bound to find and replicate them.

Here's the problem: some patterns have ambiguous meaning. There is no semantic difference between a truth and a lie. Without interpreting the symbolic meaning and applying logic, there is no way to distinguish between the two: they are the same pattern.


This POV ignores a lot of the emergent theory-of-mind and world-model-building research that suggests LLMs may possess a form of rudimentary reasoning ability.

https://www.lesswrong.com/posts/sbaQv8zmRncpmLNKv/the-idea-t...


> GPT does not model subjects. GPT does not even model words! It models tokens.

Someone hasn't read the Othello GPT work out of Harvard a few months back...


"Emergent World Representations"

The weasel word here is "emergent". That means they are implicit representations.

The representations of the Othello board that exist in that model are not explicitly constructed. They just happen to align with the model that a person playing Othello would likely represent the game with.
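That is roughly what the probing methodology in that work gets at: freeze the sequence model, capture its hidden activations, and check whether a simple classifier can read the board state back out of them. A minimal sketch with stand-in random arrays (the real probes are trained on activations from the actual Othello model, where they score well above chance; the array shapes here are invented):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-in data: rows are hidden states captured at each move of many games,
    # labels are the true contents of one board square (0 empty, 1 black, 2 white).
    rng = np.random.default_rng(0)
    hidden = rng.standard_normal((5000, 512))        # placeholder activations
    square = rng.integers(0, 3, size=5000)           # placeholder labels

    probe = LogisticRegression(max_iter=1000).fit(hidden[:4000], square[:4000])
    print(probe.score(hidden[4000:], square[4000:])) # ~chance on random data; far better on real activations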

That work showed that, given an example sequence of valid Othello game states (as training corpus) and a valid "fresh" Othello game state (as a prompt), the system can hallucinate a sequence of valid Othello game states.

The system does not know what Othello is, what a turn is, or what playing is. It only has a model of game states progressing chronologically.

When we look objectively at that model, we can see that it aligns closely to the game rules. Of course it does! It was trained on literally nothing else. A valid Othello game progression follows those rules, and that is what was provided.

But the alignment is imperfect: some prompts hallucinate invalid game progressions. The model is not a perfect match for the explicit rules.

In order for all prompts to result in valid progressions, the training corpus must have enough examples to disambiguate. It doesn't need every example: plenty of prompts will stumble into a valid progression.

The next thing to recognize: a "valid" progression isn't a "strategic" progression. These are being constructed from what is known, not what is chosen. Given a constrained set of Othello strategies in the example corpus, the system will not diverge from those strategies. It won't even diverge from the example strategies when the rules of Othello demand it.

GPT doesn't play the game. It plays the plays.



