I think a lot of the confusion about whether LLMs can think stems from the fact that LLMs are purely models of language and solve intelligence only as a kind of accidental side effect.
The real problem an LLM is trying to solve is to build a model that can enumerate all meaningful sequences of words. On the face of it, that is just an insane way of approaching the problem of intelligence. There's a huge difference between a model of language and an intelligent agent that uses language to communicate.
What LLMs show is that the hardest problem - of how to get emergent capabilities at scale from huge quantities of data - is solved. To get more human-like thinking, all that is needed is to find the right pre-training task that more closely aligns with agentic behavior. This is still a huge problem but it's an engineering problem and not one of linguistic theory or philosophy.
What we feed these huge LLMs is not just language but text, and an enormous amount of it. The transformer is an arbitrary sequence-to-sequence modeller.
Think about what is contained (explicitly and implicitly) in all the text we can feed a model. It's not just language, but a projection of the world as humans see it.
GPT-3.5 Turbo Instruct can play valid chess at about 1800 Elo, no doubt because of the chess games recorded as PGN in the training set. Does chess suddenly become a language ability because it was expressed as text? No.
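For anyone who wants to poke at this themselves, a minimal sketch of the usual way people elicit it: give the completion model a PGN prefix and let it continue the game. This assumes the openai Python SDK (v1), an API key in the environment, and that the gpt-3.5-turbo-instruct completions endpoint is still available to you; the header values and opening moves are purely illustrative.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # A PGN prefix of the kind found in the training data; headers are illustrative.
    pgn_prefix = (
        '[White "Garry Kasparov"]\n'
        '[Black "Magnus Carlsen"]\n'
        '[Result "1-0"]\n'
        '\n'
        '1. e4 e5 2. Nf3 '
    )

    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=pgn_prefix,
        max_tokens=5,
        temperature=0,
    )
    print(resp.choices[0].text)  # typically a plausible continuation like "Nc6 3. Bb5"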
Chess is a great example because it highlights the subtle difference between LLMs and agents. What GPT-3.5 does is not quite playing chess but generating realistic chess moves that a human might make.
An LLM could play chess, though: all it needs is grounding (by feeding it the current board state) and agency (RL to reward the model for winning games).
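A minimal sketch of what that loop could look like, assuming the python-chess library for the board state and a hypothetical model_move() that prompts the LLM with the game so far; the reward mapping at the end just marks where an RL signal would attach.

    import chess

    def model_move(moves_so_far: list[str]) -> str:
        """Hypothetical stand-in: prompt the LLM with the game so far, return a SAN move."""
        raise NotImplementedError

    board = chess.Board()
    moves: list[str] = []
    while not board.is_game_over():
        san = model_move(moves)
        try:
            board.push_san(san)   # grounding: a real board validates and tracks state
        except ValueError:
            continue              # illegal suggestion -> ask again (a real loop would cap retries)
        moves.append(san)

    # agency: an RL fine-tune would turn the outcome into a reward signal
    # (here from White's point of view) instead of relying on next-token loss alone
    reward = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}.get(board.result(), 0.0)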
3.5 Instruct (a different model from regular 3.5, which can't play) can play chess. There's no trick; any other framing seems like a meaningless distinction.
The goal is to model the chess games and there's no better way to do that than to learn to play the game.
>all it needs is grounding (by feeding it the current board state)
The model is already constructing an internal board state in order to play the game.
>agency (RL to reward the model for winning games)
The next-token prediction loss is already rewarding the model for winning when the side it's predicting wins.
And when the preceding text says side x wins and the model is playing as side x, the loss rewards it for doing everything it can to win.
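To make that concrete, here's a toy sketch of the standard causal LM loss over a PGN string (PyTorch, with a byte-level stand-in tokenizer and random logits standing in for a real model, both purely illustrative): because the [Result "1-0"] header precedes the moves, every move token is predicted with "White wins" already in the context, so minimizing the loss on such games pushes the model toward winning-side moves in that context.

    import torch
    import torch.nn.functional as F

    # The result header precedes the moves, so it is in the context for every later prediction.
    pgn = '[Result "1-0"]\n1. e4 e5 2. Nf3 Nc6 3. Bb5 a6'

    tokens = torch.tensor([list(pgn.encode("utf-8"))])   # (1, T) byte "tokens"
    vocab = 256
    logits = torch.randn(1, tokens.shape[1], vocab)      # what a real LM would output

    # Standard causal LM objective: predict token t+1 from tokens <= t,
    # which includes the "White wins" header when predicting White's moves.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab),
        tokens[:, 1:].reshape(-1),
    )
    print(loss.item())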
I agree that different goals and primary rewards led to this ability to play, and with it some slight differences in how it manifests (GPT can probably modulate its level of play better than any other machine or human), but it is nonetheless playing.