
It's 19 June 2020 and I'm reading Gwern's article on GPT-3's creative fiction (https://gwern.net/gpt-3#bpes), which points out that Byte Pair Encoding is behind the poor performance on character-level tasks. People nevertheless judge the models on character-level tasks.

It's 30 November 2022 and ChatGPT has exploded into the world. Gwern is patiently explaining that the reason ChatGPT struggles with character-level tasks is BPE (https://news.ycombinator.com/item?id=34134011). People continue to judge the models on character-level tasks.

It's 7 July 2025 and reasoning models far surpassing the initial ChatGPT release are available. Gwern is distracted by BB(6) and isn't available to confirm that the letter counting, the Rs in "strawberry", the rhyming in poetry, and yes, the Ws in state names are all consequences of Byte Pair Encoding. People continue to judge the models on character-level tasks.

It's 11 December 2043 and my father doesn't have long to live. His AI wife is stroking his forehead on the other side of the bed from me, a look of tender love on her almost perfectly human face. He struggles awake, for the last time. "My love," he croaks, "was it all real? The years we lived and loved together? Tell me that was all real. That you were all real." "Of course it was, my love," she replies, "the life we lived together made me the person I am now. I love you with every fibre of my being and I can't imagine what I will be without you." "Please," my father gasps, "there's one thing that would persuade me. Without using visual tokens, only a Byte Pair Encoded raw text input sequence, how many double Ls are there in the collected works of Gilbert and Sullivan?" The silence stretches. She looks away and a single tear wells in her artificial eye. My father sobs. The people continue to judge models on character-level tasks.
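
(For the curious, here's a minimal sketch of what BPE actually does to a word, assuming the tiktoken library and its cl100k_base vocabulary; the exact token splits are just whatever that vocabulary produces and will differ between models.)

    # Minimal sketch: BPE turns text into opaque integer IDs, so the model
    # never directly "sees" the letters it is asked to count.
    # Assumes the tiktoken library; cl100k_base is the GPT-4-era vocabulary.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for word in ["strawberry", "Hawaii", "New Hampshire"]:
        ids = enc.encode(word)
        pieces = [enc.decode_single_token_bytes(i).decode("utf-8") for i in ids]
        print(f"{word!r} -> token ids {ids}, pieces {pieces}")

    # Counting the Rs in "strawberry" means recovering character structure
    # from a handful of multi-character tokens the model was trained on,
    # which is exactly where these models stumble.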



I think you're absolutely right that judging LLMs' "intelligence" on their ability to count letters is silly. But there's something else in the conversation WaltPurvis reported, something that to my mind is much more damning.

Imagine having a conversation like that with a human who for whatever reason (some sort of dyslexia, perhaps) has trouble with spelling. Don't you think that after you point out New York and New Jersey, even a not-super-bright human being would notice the pattern and go: hang on, are there any other "New ..." states I might also have forgotten?

Gemini 2.5 Pro, apparently, doesn't notice anything of the sort. Even after New York and New Jersey have been followed by New Mexico, it doesn't think of New Hampshire.

(The point isn't that it forgets New Hampshire. A human could do that too. I am sure I myself have forgotten New Hampshire many times. It's that it doesn't show any understanding that it should be trying to think of other New X states.)


> I think you're absolutely right that judging LLMs' "intelligence" on their ability to count letters is silly.

I don't think it is silly; it's an accurate reflection of the fact that what is happening inside the black box is not at all similar to what happens inside a brain.

Computer: trained on trillions of words, gets tripped up by spelling puzzles.

My five-year-old: trained on the Distar alphabet since age three, working vocabulary of perhaps a thousand words, can read maybe half of them, and still gets the spelling puzzles right.

There's something fundamentally very different that has emerged from the black box, but it is not intelligence as we know it.


Yup, LLMs are very different from human brains, so whatever they have isn't intelligence as we know it. But ...

1. If the subtext is "not intelligence as we know it, but something much inferior": that may or may not be true, but crapness at spelling puzzles isn't much evidence for it.

2. More generally, skill with spelling puzzles just isn't a good measure of intelligence. ("Intelligence" is a slippery word; what I mean is something like "the correlation between skill at spelling puzzles and most other measures of cognitive ability is pretty poor". That holds even among humans, and still more for Very Different things whose abilities have a quite different "shape" from ours.)


> 1. If the subtext is "not intelligence as we know it, but something much inferior": that may or may not be true, but crapness at spelling puzzles isn't much evidence for it.

I'm not making a judgement call on whether it is or isn't intelligence, just saying that it's not like any sort of intelligence we've ever observed in man or beast.

To me, LLMs feel more like "a tool with built-in knowledge" than "a person who read up on the specific subject".

I know that many people describe coding LLMs as "an eager junior engineer", but even eager junior engineers lack only knowledge. They can very well come up with something they've never seen before. In fact, it's common for them to reinvent a method or mechanism they've never encountered.

And that's only for coding, which is where 99.99% of LLM usage falls today.

This is why I say it's not intelligence as we define it, but it's certainly something, even if it's not an intelligence we recognise.

It's not unintelligent, but it's not intelligent either. It's something else.


Sure. But all those things you just said are about the AI systems' ability to come up with new ideas versus their knowledge of existing ones. And that doesn't have much to do with whether or not they're good at simple spelling puzzles.

(Some of the humans I know who are worst at simple spelling puzzles are also among the best at coming up with good new ideas.)


It even says at one point

> I've reviewed the full list of US states

So it's either incompetent when it reviews something without prompting, or that was just another bit of bullshit. The latter seems almost certainly to be the case.

Maybe we should grant that it has "intelligence", like we grant that a psychopath has intelligence. And then promptly realize that intelligence is not a desirable quality if you lack integrity, empathy, and likely a host of other human qualities.


Let's ignore whatever BPE is for a moment. I, frankly, don't care about the technical reason these tools exhibit this idiotic behavior.

The LLM is generating "reasoning" output that breaks down the problem. It's capable of spelling out the word. Yet it hallucinates that the letter between the two 'A's in 'Hawaii' is 'I', followed by some weird take that it can be confused for a 'W'.

So if these tools are capable of reasoning and are so intelligent, surely they would be able to overcome some internal implementation detail, no?

Also, you're telling me that these issues are so insignificant that nobody has done anything about them in 5 years? I suppose it's much easier and more profitable to throw data and compute at the same architecture than to fix 5-year-old issues that can be hand-waved away by some research papers.


Cool story bro, but where's your argument? What kind of intelligence is one that can't pass a silly test?



