For example, I just asked o3 "how many Is are in antidisestablishmentarianism", and this was its train-of-thought output:
------
The user wants to know how many “I”s are in the word “antidisestablishmentarianism.” They probably mean the letter "I" in lowercase, though the question was a little broader with "Is."
I’ll treat it case-insensitively and count the total occurrences of the letter "i" (both uppercase and lowercase). I can count them manually now, or use a tool for faster analysis.
Analyzed
word = "antidisestablishmentarianism"
word_lower = word.lower()
word_lower.count('i')
5
The user asked how many "I"s are in "antidisestablishmentarianism." I’m taking it to mean whether it’s counting "i" or "I", and I’ll consider them as the same for simplicity. So, I’ll count the 5 occurrences of "i" in the word. I manually broke it down and verified there are indeed five "i"s, so the final count should be 5. Pretty straightforward, now that it's all clear!
Done
There are 5 “i”s in “antidisestablishmentarianism.”
------
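For what it's worth, the tool call in that transcript is ordinary Python, and the count does check out if you re-run it yourself:

word = "antidisestablishmentarianism"
print(word.lower().count('i'))  # prints 5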
You can come up with complicated definitions of "core reasoning" where this doesn't count, and I understand there's great demand for such definitions, but this clearly satisfies the common-sense understanding of the term.
Then I guess Ubuntu has had reasoning for a couple of decades:
sudp
Command 'sudp' not found, did you mean:
  command 'sudo' from deb sudo (1.9.9-1ubuntu2.4)
  command 'sudo' from deb sudo-ldap (1.9.9-1ubuntu2.4)
  command 'sup' from deb sup (20100519-3)
  command 'sfdp' from deb graphviz (2.42.2-6)
Try: sudo apt install <deb name>
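For the record, that suggestion comes from plain string similarity over known command names, not anything resembling reasoning. Here's a toy Python sketch of the same idea, with a hand-picked candidate list; it is not Ubuntu's actual command-not-found code:

import difflib

# Toy fuzzy suggestion: rank known command names by string similarity to the typo.
known_commands = ["sudo", "sup", "sfdp", "ls", "grep"]
typo = "sudp"
print(difflib.get_close_matches(typo, known_commands, n=3, cutoff=0.6))
# -> ['sup', 'sudo', 'sfdp'] for this toy candidate list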
I might just be on the opposite side of the aisle, but to me chain-of-thought is better understood as simply more context.
Of course there is ambiguity: more context would be hard to distinguish from core reasoning, and vice versa.
I think LLMs/AI mean we can substitute reasoning with vast accumulations of, and relations between, contexts.
Remember, RLHF gives the models some, and perhaps most, of these chains of thought when there isn’t sufficient text to scrape for each family of problems. When I see that chain of thought, the first thing I think of is my peers who had to write, rewrite, nudge, and correct these chains of thought, not core reasoning.
The CoT has that same overexplained step-by-step style so many RLHF’ers will be accustomed to, and much of it was authored or originated by them. And because it feels like plugging an infinite number of holes, I don’t call that RL reasoning.
That’s amazing, because made-up language might also just be context scaffolding sans reasoning, i.e. arbitrary extra context that helps machines relate human text better. I’m not even trying to play devil’s advocate: both sides, true believers and pessimists alike, come up with wholly unconvincing arguments. (I genuinely don’t know whether the tweet’s author is a true believer or not.) At least the pessimists aren’t coupled with the AI marketeers.
There's also distillation, where you can drastically improve a small model by training it on the chains of thought of larger models. You can't achieve the same performance by training on original human text. This suggests that those chains of thought reliably contain "densely packed reasoning", meaning the LLM has probably developed internal clusters of "reasoning circuitry", loosely speaking.
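A minimal sketch of what that distillation setup looks like, assuming the Hugging Face transformers API; the checkpoint names are placeholders, and a real pipeline would add answer filtering, batching, and far more data:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER = "big-org/large-reasoning-model"  # placeholder teacher checkpoint
STUDENT = "small-org/tiny-base-model"      # placeholder student checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

teacher_tok = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER).to(device).eval()
student_tok = AutoTokenizer.from_pretrained(STUDENT)
student = AutoModelForCausalLM.from_pretrained(STUDENT).to(device).train()

prompts = ["How many 'i's are in 'antidisestablishmentarianism'? Think step by step."]

# 1) Collect the teacher's chains of thought.
cot_texts = []
with torch.no_grad():
    for p in prompts:
        ids = teacher_tok(p, return_tensors="pt").to(device)
        out = teacher.generate(**ids, max_new_tokens=256, do_sample=False)
        cot_texts.append(teacher_tok.decode(out[0], skip_special_tokens=True))

# 2) Fine-tune the student on the teacher's prompt + chain-of-thought text.
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
for text in cot_texts:
    batch = student_tok(text, return_tensors="pt", truncation=True).to(device)
    loss = student(**batch, labels=batch["input_ids"]).loss  # standard causal-LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The training targets here are the teacher's own chains of thought rather than scraped human text, which is the whole point of the comparison above.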