
The “surprising gaps” are precisely because they’re not reasoning—or, at least, not “reasoning” about the things a human would be reasoning about to solve the problems, but about some often-correlated but different set of facts about relationships between tokens in writing.

It’s the failure modes that make the distinction clearest.

LLM output is only meaningful, in the way we usually mean that, at the point we assign external, human meaning to it, after the fact. The LLM wouldn’t stop operating or become “confused” if fed gibberish, because the meaning it’s extracting doesn’t depend on the meaning humans assign things, except by coincidence—a coincidence we foster by feeding them things we do not regard as gibberish, but that’s beside the point so far as how they “really work” goes.




But you also conveniently ignore the success modes where the answer is too novel to be anything other than reasoning.

The OP clearly said LLMs reason, so your opinion is directly opposed to that of every author of that academic paper.

Why aren’t you condemning this paper?


> or, at least, not “reasoning” about the things a human would be reasoning about to solve the problems

You can't actually infer that either. Humans have considerable context that LLMs lack. You have no basis to infer how a human would reason given the same context as an LLM, or vice versa.


I don't think a human could effectively "reason" after being trained on nonsense (I don't think the training would even take). I think believing generative AI is operating on the same kind of meaning we are is a good way to be surprised when they go from writing like a learned professor for paragraphs to suddenly writing in the same tone and confidence but entirely wrong and with a bunch of made-up crap. It's all made-up crap from their perspective (if you will); we've just guided them into often making up crap that correlates with things we, separately from them, regard (they don't "regard", not in this sense) as non-crap.

[EDIT] To put it another way: if these things were trained to, I dunno, generate strings of onomatopoeia animal and environmental noises, I don't think anybody would be confusing what they're doing with anything terribly similar to human cognition, even if the output were structured and followed on from prompts reasonably sensibly and we were able to often find something like meaning or mood or movement or locality in the output—but they'd be doing exactly the same thing they're doing now. I think the form of the output and the training sets we've chosen are what're making people believe they're doing stuff significantly like thinking, but it's all the same to an LLM.


> I don't think a human could effectively "reason" after being trained on nonsense (I don't think the training would even take)

Go talk to a flat Earther or other religious zealot.

> I think the form of the output and the training sets we've chosen are what're making people believe they're doing stuff significantly like thinking, but it's all the same to an LLM.

Yes, but your mistake is asserting that thinking is not just following a set of patterns that lead from premises to conclusions, when that's literally what deductive logic is.

Let me put it this way: your argument is basically saying that computers can't reproduce human thinking because they can be programmed to output all kinds of nonsense that clearly isn't thinking (or at least, this is what it seems like you're saying). Well, sure, but if they're programmed to actually think like humans, which should theoretically be possible given the Bekenstein Bound and the Church-Turing thesis, then clearly they are reproducing human thinking, despite the fact that they can also be programmed to produce nonsense.

So the core question is whether artifacts of human thinking, like textbooks, poetry, etc., are sufficient for a learning system that can theoretically learn to reproduce arbitrary functions to learn to reproduce human thinking. So rather than the training sets being some kind of red herring that confuses people into concluding that LLMs are thinking, they are in fact central to the whole argument for why LLMs might actually be thinking!

We probably agree that LLMs don't have the same understanding of meaning that humans do, but I'm closer to the position that this is because they haven't been exposed to the same datasets we have, and not necessarily because their fundamental operation is so different. I think fundamental operation is probably a red herring because of Turing equivalence.


> We probably agree that LLMs don't have the same understanding of meaning that humans do

I think this is absolutely key, because to the extent LLM output has meaning we care about, I think it's effectively all supplied by us after the fact. This doesn't mean LLMs can't do interesting and useful things, but I consider some current maximalist takes on the state and immediate likely future of AI as doing something a couple steps up the complexity-chain from believing a pocket calculator that solves "1 + 5" must understand what "6" means to a human. That "6" is just some glowing dots until we assign it meaning and context that the pocket calculator doesn't and cannot have, even though it's really good at solving and displaying the results of calculations.

This model explains, I think, the actual experience of using an LLM better than the model I gather some have, which is that they're doing something pretty close to thinking but just get stuff wrong sometimes, as a human gets stuff wrong sometimes (I think they get things wrong differently from how a human does, and it's because they aren't working with the same kind of meaning we are). I think it's the familiar form of the output that's leading people down this way of thinking about what they're doing, and I think it's wrong in ways that matter, both for policy purposes (pleading that they're just doing what humans do to learn and then produce output from what they learned, when it comes to building them with copyright-protected data, falls rather flat with me, for example—I'm not quite to the point of entirely dismissing the argument, but I'm pretty close to sticking that in the "not even wrong" bucket, in part because of my perspective on how they work) and for actually working productively with generative AI.

When these programs fail, it's usually not the way a human does, and using heuristics for recognizing places you need to be cautious when dealing with human output will result in mistakes. "Lies" often looks a lot like "truth" with them and can come out of nowhere, because that's not quite what they deal in, not the way humans do. They don't really lie but, crucially, they also don't tell the truth. But they may produce output that contains information that's wrong or correct, and takes a form that's very useful, or not very useful.

> but I'm closer to the position that this is because they haven't been exposed to the same datasets we have, and not necessarily because their fundamental operation is so different.

I'm not super far from agreeing with this, I think, but also think there's probably some approach (or, I'd expect, set of approaches) we need to layer on top of generative AI to make it do something that I'd consider notably close to human-type thinking, in addition to just being able to poke it and make text come out. I think what we've got now are, in human terms, something like severely afflicted schizophrenics with eidetic memories, high levels of suggestibility, and an absence of ego or self-motivation, which turn out to be pretty damn useful things to have, but aren't necessarily something we'll get broadly human-level cognition out of if we just do more of it (or better-than-human cognition—I mean, they're already better than a lot of people at some tasks, let's face it; zero people who've ever lived could write bad satirical poetry as fast as an LLM can, much as nobody can compute square roots as fast as a pocket calculator). I doubt that the basic components needed to bridge that gap are present in the current systems at all.

I expect we'll see them fail less as we feed them more energy and data, but for their failures to continue to look alien and surprising, always due to that mismatch between the meaning we're assigning to what they're doing and their internal sense of "meaning", which are correlated (because we've forced them to be) but not dependent on one another in some necessary way. But yes, giving them more sources of "sensory" input and a kind of will, with associated tools, to seek out more input, is likely the direction we'll need to go to make them capable of more things, rather than just somewhat better at what they do now.

[EDIT] As for why I think our ways of discussing how these work matters, aside from the aforementioned reasons, it's that lay-people are taking our lead on this to some extent, and when we come out acting like these are thinking agents in some serious sense (or cynically promoting them as super dangerous and close to becoming real conscious entities on the verge of being insanely "smart" because, gee would you look at that, we're selling the things—ahem, paging Altman) it's a recipe for cooking up harmful misuse of these tools and of laws and policy that may be at-odds with reality.


But how can you be sure? You talk with confidence as if evidence exists to prove what you say, but no such evidence exists. It’s almost as if you’re an LLM yourself, making up a claim with zero evidence. Sure, you have examples that correlate with your point, but nothing that proves it.

Additionally, there exists LLM output that runs counter to your point. Explain LLM output that is correct and novel. There exists correct LLM output on queries that are so novel and unique they don’t exist in any form in the training data. You can easily, and I mean really easily, make an LLM produce such output.

Again, you’re making up your answer here without proof or evidence, which is identical to the extrapolation the LLM does. And your answer runs counter to every academic author on that paper. So what I don’t understand from people like you is the level of unhinged confidence that borders on religion.

Like you were talking about how the wrongness of certain LLM output makes the distinction clearest, while obviously ignoring the output that makes it unclear.

It’s utterly trivial to get LLMs to output things that disprove your point. But what’s more insane is that you can get LLMs to explain all of what’s being debated in this thread to you.

https://chatgpt.com/share/674dd1fa-4934-8001-bbda-40fe369074...


I ignored your other response to me because I didn't see anything in the abstract that contradicted my posts, but maybe there's something deeper in the paper that does. I'll read more of it later.

I think, though, the disconnect between us is that I don't see this:

> Explain LLM output that is correct and novel.

As something I need to do for my position to be strong. It would be if I'd made different claims, but I haven't made those claims. I can see parts of the paper's abstract that would also be relevant and tough to deal with if I'd made those other claims, so I'm guessing those are the parts you think I need to focus on, but I'm not disputing stuff like (paraphrasing) "LLMs may produce output that follows the pattern of a form of reasoned argument they were trained on, not just the particulars" from the abstract. Sure, maybe they do, OK.

I don't subscribe to (and don't really understand the motivation for) claims that generative AI can't produce output that's not explicitly in its training set, which is a claim I have seen and (I think?) one you're taking me as promoting. Friggin' Markov chain text generators can; why couldn't LLMs? Another formulation is that everything they output is a result of what they were trained on, which is stronger but only because it's, like, tautologically true and not very interesting.
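To make the Markov chain point concrete, here's a minimal, hypothetical sketch in Python (not from the paper or anyone's linked chat, just an illustration): a word-level Markov chain trained on three sentences can emit word sequences that never appear verbatim in its training text, purely by recombining observed word-to-word transitions.

    import random
    from collections import defaultdict

    # Toy word-level Markov chain: record which words follow which in a tiny
    # corpus, then sample random walks over those transitions.
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "the cat chased the dog",
    ]

    transitions = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            transitions[prev].append(nxt)

    def generate(start="the", max_words=8):
        out = [start]
        for _ in range(max_words - 1):
            options = transitions.get(out[-1])
            if not options:
                break
            out.append(random.choice(options))
        return " ".join(out)

    for _ in range(5):
        print(generate())

    # A walk like "the dog sat on the mat" is a possible output even though that
    # exact sentence never appears in the corpus: novel-looking text from nothing
    # but token-level statistics, with no claim of understanding or reasoning.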


It’s ok if you ignore it. I don’t mind.

Your claim is that LLMs can’t reason.

And you say you don’t have to explain why LLMs output novel and correct answers. Well, I asked for this because the fact that LLMs output correct and novel answers disproves your point. I disproved your point. Think about it. You think I introduced some orthogonal topic, but I didn’t. I stated a condition that the LLM meets that disproves your claim.

So if there exists a prompt-and-answer pair such that the answer is so novel that the probability of it being a coincidental correlation or random chance is extraordinarily low, then your point is trivially wrong, right?

Because if the answer wasn’t arrived at by some correlative coincidence, which you seem to be claiming, then the only other possible way is reasoning.

Again, such question-and-answer pairs actually exist for LLMs. They can be trivially generated, like my shared link above, which talks about the entire topic of this subthread without training data to support it.

Humans fail at reasoning all the time, yet we don’t say humans can’t reason. So, to keep consistency with the criterion we use for humans: if the LLM reasoned, it can reason, even if it clearly gives the wrong answer sometimes.

Additionally, you likely claim all humans can reason. What’s your criterion for that? When you look at a human, it sometimes outputs correct and novel answers that are not part of its training data (experiences).

It’s literally the same logic, but you subconsciously move the goalposts much, much higher for an LLM. In fact, under this higher criterion, severely cognitively impaired humans and babies can’t reason at all.



