Not sure why everyone rates this. It’s full of very confidently made statements ...

grepLeigh · 2025-02-09T22:46:41 1739141201

LLMs that use Chain of Thought sequences have been demonstrated to misrepresent their own reasoning [1]. The CoT sequence is another dimension for hallucination.

So, I would say that an LLM capable of explaining its reasoning doesn't guarantee that the reasoning is grounded in logic or some absolute ground truth.

I do think it's interesting that LLMs demonstrate the same fallibility of low quality human experts (i.e. confident bullshitting), which is the whole point of the OP course.

I love the goal of the course: get the audience thinking more critically, both about the output of LLMs and the content of the course. It's a humanities course, not a technical one.

(Good) Humanities courses invite the students to question/argue the value and validity of course content itself. The point isn't to impart some absolute truth on the student - it's to set the student up to practice defining truth and communicating/arguing their definition to other people.

[1] https://arxiv.org/abs/2305.04388

ctbergstrom · 2025-02-09T22:52:34 1739141554

Yes!

First, thank you for the link about CoT misrepresentation. I've written a fair bit about this on Bluesky etc but I don't think much if any of that made it into the course yet. We should add this to lesson 6, "They're Not Doing That!"

Your point about humanities courses is just right and encapsulates what we are trying to do. If someone takes the course and engages in the dialectical process and decides we are much too skeptical, great! If they decide we aren't skeptical enough, also great. As we say in the instructor guide:

"We view this as a course in the humanities, because it is a course about what it means to be human in a world where LLMs are becoming ubiquitous, and it is a course about how to live and thrive in such a world. This is not a how-to course for using generative AI. It's a when-to course, and perhaps more importantly a why-not-to course.

"We think that the way to teach these lessons is through a dialectical approach.

"Students have a first-hand appreciation for the power of AI chatbots; they use them daily.

"Students also carry a lot of anxiety. Many students feel conflicted about using AI in their schoolwork. Their teachers have probably scolded them about doing so, or prohibited it entirely. Some students have an intuition that these machines don't have the integrity of human writers.

"Our aim is to provide a framework in which students can explore the benefits and the harms of ChatGPT and other LLM assistants. We want to help them grapple with the contradictions inherent in this new technology, and allow them to forge their own understanding of what it means to be a student, a thinker, and a scholar in a generative AI world."

globalnode · 2025-02-10T00:59:29 1739149169

I'll give it a read. I must admit, the more I learn about the inner workings of LLM's the more I see them as simply the sum of their parts and nothing more. The rest is just anthropomorphism and marketing.

maccaw · 2025-02-10T01:37:30 1739151450

Funny, I feel the same way about humans.

rockskon · 2025-02-10T04:56:29 1739163389

Whenever I see someone confidently making a comparison between LLMs and people, I assume they are unserious individuals more interested in maintaining hype around technology than they are in actually discussing what it does.

williamcotton · 2025-02-10T07:13:51 1739171631

Someone saying "they feel" something is not a confident remark.

Also, there's plenty of neuroscience that is produced by very serious researchers that have no problems making comparisons between human brain function and statistical models.

https://en.wikipedia.org/wiki/Bayesian_approaches_to_brain_f...

https://en.wikipedia.org/wiki/Predictive_coding

rockskon · 2025-02-10T07:50:59 1739173859

Theories and approaches to study are not rational bases for making comparisons between LLMs and the human brain.

They're bases for studying the human brain - something which we are very much in our infancy of understanding.

mr_toad · 2025-02-09T23:28:36 1739143716

Current LLMs are not the end-all of LLMs, and chain of thought frontier models are not the end-all of AI.

I’d be wary of confidently claiming what AI can and can’t do, at the risk of looking foolish in a decade, or a year, or at the pace things are moving, even a month.

ctbergstrom · 2025-02-09T23:40:44 1739144444

That's entirely true. We've tried hard to stick with general principles that we don't think will readily be overturned. But doubtless we've been too assertive for some people's taste and doubtless we'll be wrong in places. Hence the choice to develop not a static book but rather living document that will evolve with time. The field is developing too fast for anything else.

With respect to what the future brings, we do try to address a bit of that in Lesson 16: https://thebullshitmachines.com/lesson-16-the-first-step-fal...

mr_toad · 2025-02-10T00:32:44 1739147564

> we don't think will readily be overturned

I think that’s entirely the problem. You’re making linear predictions of the capabilities of non-linear processes. Eventually the predictions and the reality will diverge.

habinero · 2025-02-10T01:09:35 1739149775

There's no evidence to support that's the case.

pama · 2025-02-10T01:42:06 1739151726

Every time someone claimed “emerging” behavior in LLMs it was exactly that. I can probably count more than 100 of these cases, many unpublished, but surely it is easy to find evidence by now.

interstice · 2025-02-10T01:28:19 1739150899

Said the turkey to the farmer

falcor84 · 2025-02-10T11:42:59 1739187779

I don't think that's how that metaphor works.

interstice · 2025-02-13T23:40:07 1739490007

Not quite, but it was the closest pithy quote I could think of to convey the point that things can be false for a long time before they are suddenly true without warning.

habinero · 2025-02-16T07:09:48 1739689788

How about "Yes, they laughed at Galileo, but they also laughed at Bozo the Clown?"

We heard alllllll the same hype about how revolutionary the blockchain was going to be and look how that turned out.

It's a virtue to point out the emperor has no clothes. It's not a virtue to insist clothes tech is close to being revolutionary and if you just understand it harder, you'd see the space where the clothes go.

kykeonaut · 2025-02-09T23:50:44 1739145044

The post seems to be talking about the current capabilities of large language models. We can certainly talk about what they can or cannot do as of today, as that is pretty much evidence based.

dullcrisp · 2025-02-09T23:39:58 1739144398

They saw you coming in part 16.

beezlewax · 2025-02-10T08:23:39 1739175819

That shouldn't give them any more merit that their current iteration deserves.

You could say the same thing about spaceships or self diving cars.

onemoresoop · 2025-02-09T23:50:12 1739145012

The ground truth is chopped off into tokens and statistically evaluated. It is of course just a soup of ground truth that can freely be used in more or less twisted ways that have nothing to do or are tangent to the ground truth. While I enjoy playing with LLMs I don't believe they have any intrinsic intelligence to them and they're quite far from being intelligent in the same sense that autonomous agents such as us humans are.

whattheheckheck · 2025-02-10T00:06:57 1739146017

Any all of the tricks getting tacked on are overfitting to the test sets. It's all the tactics we have right now and they do provide assistance in a wide variety of economically valuable tasks with the only signs of stopping or slowing down is data curation efforts

pjs_ · 2025-02-10T16:26:13 1739204773

I've read that paper. The strong claim, confidently made in the OP is (verbatim) "they don’t engage in logical reasoning.".

Does this paper show that LLMs "don't engage in logical reasoning"?

To me the paper seems to mostly show that LLMs with CoT prompts (multiple generations out of date) are vulnerable to sycophancy and suggestion -- if you tell the LLM "I think the answer is X" it will try too hard to rationalize for X even if X is false -- but that's a much weaker claim than "they don't engage in logical reasoning". Humans (sycophants) do that sort of thing also, it doesn't mean they "don't engage in logical reasoning".

Try running some of the examples from the paper on a more up-to-date model (e.g. o1 with reasoning turned on) it will happily overcome the biasing features.

Lerc · 2025-02-10T12:04:04 1739189044

I think you'll find that humans have also demonstrated that they will misrepresent their own reasoning.

That does not mean that they cannot reason.

In fact, to come up with a reasonable explanation of behaviour, accurate or not, requires reasoning as I understand it to be. LLMs seem to be quite good at rationalising which is essentially a logic puzzle trying to manufacture the missing piece between facts that have been established and the conclusion that they want.

fmbb · 2025-02-10T00:11:21 1739146281

Training on all papers does not mean the model believes or knows the truth.

It is just a machine that spits out words.

joenot443 · 2025-02-10T05:30:17 1739165417

It's 1994. Larry Llyod Mayer has read the entire internet, hundreds of thousands of studies across every field, and can answer queries word for word the same as modern LLMs do. He speaks every major language. He's not perfect, he does occasionally make mistakes, but the sheer breadth of his knowledge makes him among the most employable individuals in America. The Pentagon, IBM, and Deloitte are begging to hire him. Instead, he works for you, for free.

Most laud him for his generosity, but his skeptics describe him as just a machine that spits out words. A stochastic parrot, useless for any real work.

yathaid · 2025-02-10T08:52:55 1739177575

Does his accuracy take a sudden precipitous fall when going from multiplying two three-digit numbers to two four-digit numbers?

falcor84 · 2025-02-10T11:44:55 1739187895

I don't know about you, but when I do math without a calculator, my accuracy also drops precipitously whenever they add a digit.

throwaway290 · 2025-02-10T12:02:02 1739188922

Do you have self awareness to anticipate the drop in your accuracy and refuse to perform the operation?

falcor84 · 2025-02-10T12:24:27 1739190267

I do anticipate it, but in the situations I'm asked to do such calculations, I don't usually have the option of refusing, nor would I want to. For most real would situations, it's generally better to arrive at a ballpark solution than to refuse to engage with the problem.

throwaway290 · 2025-02-10T13:27:59 1739194079

Ballpark solution is in a way refusing...

joenot443 · 2025-02-12T03:40:46 1739331646

In the very unserious hypothetical I'm describing, I'd say Lloyd's capabilities match that of GPT-4. In this case, he's not a calculator, but he is a decent programmer, so like GPT-4 he quickly runs the operation through a script, rather than trying to figure it out in his head.

https://chatgpt.com/share/67ac17df-fd9c-800d-9d3d-03c66b3e86...

"The result of 720947×263647 is 190,075,513,709."

pjs_ · 2025-02-10T21:35:03 1739223303

This is a solved problem, ChatGPT uses a python prompt to do arithmetic now. Just like you would… all good. You Can Just Check Your Own Claims

hnthrow90348765 · 2025-02-10T12:07:53 1739189273

It has some pieces of the puzzle to intelligence. That's a deal breaker for some people, and useful/promising to others.

lifthrasiir · 2025-02-10T01:12:54 1739149974

I would be very careful to claim exactly that as emergent properties seem kinda crucial for artificial and human intelligences. (Not to say that they are equally functioning nor useful.)

pjs_ · 2025-02-10T14:22:35 1739197355

What experiment or measurement could I do to distinguish between a machine that “knows” the truth and a machine that merely “spits it out”? I’m trying to understand your terminology here

Lerc · 2025-02-10T12:14:38 1739189678

Um... what truth?

My truth, your truth or some defined objective truth?

criley2 · 2025-02-10T02:37:11 1739155031

>Training on all papers does not mean the model believes or knows the truth. It is just a machine that spits out words.

Sounds like humans at school. Cram the material. Take the test. Eject the data.

nullc · 2025-02-10T00:39:45 1739147985

> I mean just try it yourself with o1, go as deep as you like asking how it arrived at a conclusion

I don't mean to disagree overall, but on this point the LLM can post-facto rationalize its output but it has no introspection and has absolutely no idea why it made a given bit of output (except in so far as it was a result of COT which it could reiterate to you). The set of weights being activated could be nearly disjoint when answering and explaining the answer.

One can also make the same argument about humans -- that they can't introspect their own minds and are just posthoc rationalizing their explanations unless their thinking was a product of an internal monolog that they can recount. But humans have a lifetime of self-interaction that gives a good reason to hope that their explanations actually relate to their reasoning. LLM's do not.

And LLMs frequently give inconsistent results, it's easy to demonstrate the posthoc nature of LLM's rationalizations too: Edit the transcript to make the LLM say something it didn't say and wouldn't have said (very low probability), and then have it explain why it said that.

(Though again, split brain studies show humans unknowingly rationalizing actions in a similar way)

lanstin · 2025-02-10T00:58:49 1739149129

I doubt people are very accurate at knowing why they made the choices they did. If you want them to recite a chain of reasoning they can but that is kind of far from most decision making most people do.

nullc · 2025-02-10T03:01:12 1739156472

I agree people aren't great at this either and my post said as much.

However we're familiar with the human limits of this and LLMs are currently much worse.

This is particularly relevant because someone suffering from the mistaken belief that LLM's could explain their reasoning might go on to attempt to use that to justify the misapplication of an LLM.

E.g. fine tune some LLM using resume examples so that it almost always rejects Green-skinned people, but approve the LLMs use in hiring decisions because it is insistent that it would never base a decision on someone's skin color. Humans can lie about their biases of course, but a human at least has some experience with themselves while a LLM usually has no experience observing themself except for the output visible in their current window.

nullc · 2025-02-10T03:04:52 1739156692

I also should have added that the ability to self explain when COT was in use only goes as deep as the COT, as soon as you probe deeper such that the content of the COT requires explanation the LLM is back in the realm of purely making stuff up again.

A non-hallucinated answer could only recount the COT and beyond that it would only be able to answer "Instinct."-- sure the LLM's response has reasoning hidden inside it, but that reasoning is completely inaccessible to the LLM.

radioactivist · 2025-02-10T05:14:13 1739164453

I've had frontier reasoning models (or at least what I can access in ChatGPT+ at any given moment) give wildly inconsistent answers when asked to provide the underlying reasoning (and the CoT weren't always given). Inventing sources and then later denying them mentioned them. Backtracking on statements it claimed to be true. Hiding weasel words in the middle of a long complicated argument to arrive at whatever it decided the answer was. So I'm inclined to believe the reasoning steps here are also susceptible to all the issues discussed in the posted article.

MichaelZuo · 2025-02-10T05:54:25 1739166865

This sounds similar to a median human with little scruples?

randomNumber7 · 2025-02-10T08:50:38 1739177438

> “can’t explain how they arrived at conclusions”

Imagine I would tell my wife, that whenever we have a discussion, her opinion would only be valid when she can explain how she arrived at her conclusion.

oblio · 2025-02-10T10:43:03 1739184183

Your wife is one of the end products of cutthroat competition across several billion years so let's just say her general intelligence has a fair bit more validation than 20 years of research.

falcor84 · 2025-02-10T11:50:15 1739188215

Sexual selection applies an evolutionary pressure against men who challenge women too much about the validity of their reasoning.

oblio · 2025-02-10T20:42:07 1739220127

I was really, really trying to ignore the casual misogyny in OP's comment but you're really making this hard.

falcor84 · 2025-02-10T21:16:51 1739222211

Well, for what it's worth, I believe that this evolutionary pressure works as strongly, or even more so, against women who challenge men about the validity of their reasoning.

poulpy123 · 2025-02-10T11:54:00 1739188440

But we know how the LLM works, and that's exactely how the authors explain it. And that explain also the weird mistakes they do, that nothing with the ability of reason or having a ground truth would do.

I really do not understand how technical people can think they are sentient

bwfan123 · 2025-02-09T22:30:49 1739140249

the machine is fooling you with a mimicry of reasoning. and you are falling for it.

Grimblewald · 2025-02-09T22:40:18 1739140818

If it's mimicry of reason is indistinguishable from real reasoning, how is it not reasoning?

Ultimately, an LLM models language and the process behind it's creation to some degree of accuracy or another. If that model includes a way to approximate the act of reasoning, then it is reasoning to some extent. The extent I am happy to agree is open for discussion, but that reasoning is taking place at all is a little harder to attack.

onemoresoop · 2025-02-09T23:52:53 1739145173

No, it is distinguishable from real reasoning. Real reasoning, while flawed in various ways, goes through personal experience of the evaluator. LLMs don't have that capability at all. They're just sifting though tokens and associate statistical parameters to it with no skin in the game so to speak.

Grimblewald · 2025-02-10T03:31:05 1739158265

LLM's have personal option by virtue of the fact they make statements of things they understand to the extent their training data allows. Their training data is not perfect, and in addition, through random chance the LLM will latch onto specific topics as a function of weight initialization and training data order.

This would form a filter not unlike, yet distinct from, our understanding of personal experience.

you could make the exact same argument against humans, we just learn to make sounds that elicit favourable responses. Besides, they have plenty "skin in the game", about the same as you or I.

danenania · 2025-02-10T00:07:59 1739146079

It seems like an arbitrary distinction. If an LLM can accomplish a task that we’d all agree requires reasoning for a human to do, we can’t call that reasoning just because the mechanics are a bit different?

Barrin92 · 2025-02-10T01:34:39 1739151279

Yes because it isn't an arbitrary distinction. My good old TI-83 can do calculations that I can't even do in my head but unlike me it isn't reasoning about them, that's actually why it's able to do them so fast, and it has some pretty big implications about what it can't do.

If you want to understand where a systems limitations are you need to understand not just what it does but how it does it, I feel like we need to start teaching classes on Behaviorism again.

danenania · 2025-02-10T02:36:39 1739154999

An LLM’s mechanics are algorithmically much closer to the human brain (which the LLM is modeled on) than a TI-83, a CPU, or any other Turing machine. Which is why, like the brain, it can solve problems that no individual Turing machine can.

Are you sure you aren’t just defining reasoning as something only a human can do?

meroes · 2025-02-10T03:06:12 1739156772

My prior is reasoning is a conscious activity. There is a first person perspective. LLMs are so far removed mechanically from brains the idea they reason is not even remotely worth considering. Modeling neurons can be done with a series of pipes and flowing water, and that is not expected to give arise to consciousness either. Nor are nuerons and synapses likely to be sufficient for consciousness.

You know how we insert ourselves into the process of coming up with a delicious recipe? That first person perspective might be also necessary for reasoning. No computer knows the taste of mint, it must be given parameters about it. So if a computer comes up with a recipe with mint, we know it wasn’t via tasting anything ever.

A calculator doesn’t reason. A facsimile of something we have no idea about its role in consciousness has the same outlook as the calculator.

williamcotton · 2025-02-10T07:20:32 1739172032

LLMs are so far removed mechanically from brains the idea they reason is not even remotely worth considering.

Jet planes are so far removed mechanically from a bird that the idea they fly is not even remotely worth considering.

meroes · 2025-02-10T19:21:20 1739215280

You’re right that my argument depends upon there being a great physical distinction between brains and H100s or enough water flowing through troughs.

But since we knew properties of wings were major comments to flight dating back to beyond the myths of Pegasus or Icarus, we rightly connected the similarities in the flight case.

Yet while we have studied neurons and know the brain is apart of consciousness, we don’t know their role in consciousness like the wing’s for flight.

If you got a bunch if daisy chained brains and that started doing what LLMs do, I’d change my tune—because the physical substrates are now similar enough. Focusing on neurons, and their facsimilized abstractions, may be like thinking flight depending upon the local cellular structure of a wing, rather than the overall capability to generate lift, or any other false correlation.

Just because an LLM and a brain get to the same answer, doesn’t mean they got there the same way.

williamcotton · 2025-02-10T21:25:14 1739222714

Motte? Consciousness.

Bailey? Reason.

How reasonable are the outputs of ANNs considering the inputs? This is a valid question and it has a useful response.

From ImageNet to LLMs we are finding these tools to give some scale of a reasonable response.

Recommended reading: Philosophical Investigations by Wittgenstein.

danenania · 2025-02-10T03:24:06 1739157846

Are we then conferring some kind of supernatural or religious properties to the brain’s particular implementation of neurons?

If not, then why shouldn’t differently constructed but algorithmically similar systems be able to produce similar phenomena?

oblio · 2025-02-10T10:56:40 1739185000

Because we know practically nothing about brains so comparing them to LLMs is useless and nature is so complex that we're constantly discovering signs of hubris in human research.

See C-sections versus natural birth. Formula versus mother's milk. Etc.

Grimblewald · 2025-02-10T03:56:06 1739159766

I think you'd benefit from reading Helen Keller's autobigoraphy "the world i live in", you might reach the same conclusions I did, this being that perhaps conciousness is flavoured by our unique way of experiencing our world, but not strictly neccesary for conciousness of some kind or another to form. I beleive conciousness to be a tool a sufficently complex neural network will develop in order for it to achieve whatever objective it has been given to optimize for.

Lerc · 2025-02-10T12:35:01 1739190901

Taking a different tack from others in this thread. I don't think you can say that a TI-83 is not reasoning if it is doing calculations. Certainly it is not aware of any concepts of numbers and has no meaningful sense of the operation, but those are attributes of sentience, not reasoning. The reasoning ability of a calculator is extremely limited but what make those capabilities that it does have, non reasoning.

What non-sentience based property do you think something should have to be considered reasoning. Do you consider sentience and reasoning to be one and the same? If not then you should be able to indicate what distinguishes one from the other.

I doubt anyone here is arguing that chatGPT is sentient, yet plenty accept that it can reason to some extent.

Barrin92 · 2025-02-10T23:54:24 1739231664

>Do you consider sentience and reasoning to be one and the same?

No, but I think they share some similarities. You can be sentient without doing any reasoning, just through experience, there's probably a lot of simple life forms in that category. Where they overlap I think, is in that they require a degree of reflection. Reasoning I'd say is the capacity to distinguish between truth and falsehoods, to have mental content of the object you're reasoning about and as a consequence have a notion of understanding and an interior or subjective view.

The distinction I'd make is that calculation or memorization is not reasoning at all. My TI-83 or Stockfish can calculate math or chess but they have no notion of math or chess, they're basically Chinese rooms, they just perform mechanical operations. They can appear as if they reason, even a chess engine purely looking up results in a table base and with very simplistic brute force can play very strong chess but it doesn't know anything about chess. And with the LLMs you need to be careful because the "large" part does a lot of work. They often can sound like they reason but when they have to explain their reasoning they'll start to make up obvious falsehoods or contradictions. A good benchmark if something can reason is probably if it can.. reason about its reasoning coherently.

I do think the very new chain-of-thought models are more of a step into that direction, the further you get away from relying on data the more likely you're building something that reasons but we're probably very early into systems like that.

pjs_ · 2025-02-10T15:57:02 1739203022

You say they are distinguishable. How would you experimentally distinguish two systems, one of which "goes through personal experience" and therefore is doing "real reasoning", vs one which is "sifting through tokens and associating statistical parameters"? Can you define a way to discriminate between these two situations?

llm_trw · 2025-02-10T01:07:43 1739149663

>goes through personal experience of the evaluator

Real reasoning is being able to manipulate symbolic expressions in a consistent manner while preserving some invariants.

Personal experience as logic is how you end up with the Holocaust.

lanstin · 2025-02-10T01:05:58 1739149558

I am getting two contradictory but plausible seeming replies when I ask about a certain set being the same when adding 1 to every value in the set, asked on how I ask the question.

Correct answer: https://chatgpt.com/share/67a9500b-2360-8007-b70e-0bc2b84bc1...

Incorrect answer (I think): https://chatgpt.com/share/67a950df-d4e0-8007-8105-95a9e5be19...

Grimblewald · 2025-02-12T02:31:51 1739327511

What led you to beleive that mathematics is a good tool for evaluating an LLM? It is a thing they currently dont do well, since it is wildly out of domain of their training corpus - down the very way we structure information for an LLM to ingest. If we start doing the same for humans, most humans are in deep trouble.

lanstin · 2025-02-12T23:30:08 1739403008

Well I am studying mathematics, and I use the LLM to help me learn.

They aren't terrible, and they have all of arXiv to train on. Terrence Tao is doing some cool stuff with it - the idea will be an LLM to generate Lean proofs.

https://mathstodon.xyz/@tao/113132502735585408

https://terrytao.wordpress.com/2024/12/05/ai-for-math-fund/

(Professor Tao is probably the best or at least most productive in the most fields current mathematician).

lanstin · 2025-02-12T23:32:06 1739403126

And I can assure you when I start to talk about these topics with the average human person that doesn't know the material, they just laugh at me. Even my wife who has a PhD in physics.

Here's some cool math I learned from a regular book, not an LLM:

https://en.wikipedia.org/wiki/Khinchin%27s_constant

mrshadowgoose · 2025-02-10T01:24:11 1739150651

I don't give a rat's ass about whether or not AI reasoning is "real" or a "mimicry". I care if machines are going to displace my economic value as a human-based general intelligence.

If a synthetic "mimicry" can displace human thinking, we've got serious problems, regardless of whether or not you believe that it's "real".

ZephyrBlu · 2025-02-09T22:35:44 1739140544

What is reasoning if not a chain of logically consistent thoughts?

bwfan123 · 2025-02-09T22:39:17 1739140757

fair, but "logically consistent thoughts" is a subject of deep investigation starting from the early euclidean geometry to the modern godel's theorems.

ie, that logically consistent thinking starts from symbolization, axioms, proof procedures, world models. otherwise, you end up with persuasive words.

ZephyrBlu · 2025-02-09T22:53:23 1739141603

You just ruled out 99% of humans from having reasoning capabilities.

The beautiful thing about reasoning models is that there is no need to overcomplicate it with all the things you've mentioned, you can literally read the model's reasoning and decide for yourself if it's bullshit or not.

quantified · 2025-02-10T00:13:09 1739146389

That's sort of arrogant, Most of that 99 (if that many) % could learn if inspired to and provided resources. And does use reasoning and instinct in day-to-day life even if it's as simple as "I'll take go shopping before I take my car to the shop so I have the groceries" or "hide this money in a new place so my husband doesn't drink it away". Models will get better over time, and yes humans only use models too.

Humans rely in cues to tell when each other is fabricating or lying. Machines don't have those cues, and fabricate their reasoning too. So we have a complicatedly difficult time trusting them.

llm_trw · 2025-02-10T01:22:03 1739150523

>You just ruled out 99% of humans from having reasoning capabilities.

After a conversation with humans I think you'd agree 1% of them being able to reason deeply is a vast overestimation.

A good example to see how little people can reason is the following classic:

> Given the following premises derive a conclusion about your poems:

> 1) No interesting poems are unpopular among people of real taste.

> 2) No modern poetry is free from affectation.

> 3) All your poems are on the subject of soap bubbles.

> 4) No affected poetry is popular among people of taste.

> 5) Only a modern poem would be on the subject of soap bubbles.

The average person on the street won't even know where to start, the average philosophy student will fuck up the translation to first order logic, and a logic professor would need a proof assistant to get it right consistently.

Meanwhile o3-mini in 10 seconds:

We can derive a conclusion about your poems by following the logical implications of the given premises. Let’s rephrase each premise into a more formal form:

Premise 1: No interesting poems are unpopular among people of real taste. This can be reworded as: If a poem is interesting, then it is popular among people of real taste.

Premise 2: No modern poetry is free from affectation. This tells us: If a poem is modern, then it is affected (i.e., it shows affectation).

Premise 3: All your poems are on the subject of soap bubbles. In other words: Every one of your poems is about soap bubbles.

Premise 4: No affected poetry is popular among people of taste. This implies: If a poem is affected, then it is not popular among people of taste.

Premise 5: Only a modern poem would be on the subject of soap bubbles. This means: If a poem is about soap bubbles, then it is modern.

Now, let’s connect the dots step by step:

From Premise 3 and Premise 5:

All your poems are on the subject of soap bubbles.

Only modern poems can be about soap bubbles.

Conclusion: All your poems are modern.

From the conclusion above and Premise 2:

Since your poems are modern, and all modern poems are affected,

Conclusion: All your poems are affected.

From the conclusion above and Premise 4:

Since your poems are affected, and no affected poem is popular among people of taste,

Conclusion: Your poems are not popular among people of taste.

From Premise 1:

If a poem is interesting, it must be popular among people of taste.

Since your poems are not popular among people of taste (from step 3), it follows that:

Conclusion: Your poems cannot be interesting.

Final Conclusion: Your poems are not interesting.

Thus, by logically combining the premises, we conclude that your poems are not interesting.

radioactivist · 2025-02-10T05:00:44 1739163644

I could trace through that example quite quickly and I'm not an expert in logic, so I think you might be exaggerating some statements about difficulty here.

olalonde · 2025-02-09T22:55:59 1739141759

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

CodeMage · 2025-02-09T23:29:49 1739143789

Counterpoint by Diogenes: "Behold, a man!"

butterNaN · 2025-02-09T23:37:08 1739144228

It could also be a cheap imitation of a duck that might be passable for someone dull

pembrook · 2025-02-10T12:57:48 1739192268

So are all the humans in this thread.

Except, human mimicry of "reasoning" is usually applied in service of justifying an emotional feeling, arguably even less reliable than the non-feeling machine.

mrbungie · 2025-02-10T13:15:16 1739193316

It has served us relatively fine for thousands of years.

LLMs? I'm waiting for one that knows how not to say something that is clearly wrong with extreme confidence, reasoning or not.

pembrook · 2025-02-10T16:44:04 1739205844

Again, same can be said for humans.

mrbungie · 2025-02-10T22:58:38 1739228318

Unless dealing with a psychopath you can deal with the lies using other subsystems.

pjs_ · 2025-02-10T14:28:07 1739197687

The website that these comments are discussing (“Bullshit Machines”) says things that are probably wrong with extreme confidence

maxdoop · 2025-02-09T23:14:33 1739142873

What is reasoning? What is understanding? Do humans do either? How do you know?

bwfan123 · 2025-02-11T16:35:13 1739291713

0) causal model == symbolic representation with associated rules for generating statements

1) understanding anything == building a causal model of it

2) intelligence == ability to build causal models

3) reasoning == proving or disproving statements

4) math == causal models of abstract worlds

5) science == causal models of real world with associated real world actions to test hypothesis

bwfan123 · 2025-02-10T00:45:04 1739148304

this is the question that the greeks wrestled with over 2000 years ago. at the time there were the sophists (modern llm equivalents) that could speak persuasively like a politician.

over time this question has been debated by philosophers, scientists, and anyone who wanted to have better cognition in general.

williamcotton · 2025-02-10T07:26:34 1739172394

at the time there were the sophists (modern llm equivalents) that could speak persuasively like a politician.

You might want to brush up on your Greek history.

maxdoop · 2025-02-10T01:11:33 1739149893

So how can you claim what an LLM is doing if we cannot define it regardless?

dontseethefnord · 2025-02-10T03:46:14 1739159174

Because we know what LLM's do. We know how they produce output. It's just good enough at mimicking human text/speech that people are mystified and stupified by it. But I disagree that "reasoning" is so poorly defined that we're unable to say an LLM doesn't do it. It doesn't need to be a perfect or complete definition. Where there is fuzziness and uncertainty is with humans. We still don't really know how the human brain works, how human consciousness and cognition works. But we can pretty confidently say that an LLM does not reason or think.

Now if it quacks like a duck in 95% of cases, who cares if it's not really a duck? But Google still claims that water isn't frozen at 32 degrees Fahrenheit, so I don't think we're there yet.

krainboltgreene · 2025-02-10T01:58:09 1739152689

I think the third worst part of the GenAI hype era is that every other CS grad now thinks not only is a humanities/liberal arts degree meaningless but now also they're pretty sure they have a handle on the human condition and neurology enough to make judgment calls on what's sentient. If people with those backgrounds ever attempted to broach software development topics they'd be met with disgust by the same people.

Somehow it always seems to end up at eugenics and white supremacy for those people.

bwfan123 · 2025-02-10T02:47:41 1739155661

math arose firstly as a language and formalism in which statements could be made with no room for doubt. the sciences took it further and said that not only should the statements be free of doubt, but also that they should be testable in the real world via well defined actions which anyone could carry out. all of this has given us the gadgets we use today.

llm, meanwhile, is putting out plausible tokens which is consistent with its training set.

MantisShrimp90 · 2025-02-10T14:19:01 1739197141

The writer is speaking from the perspective of the traditional philosophical understanding of a thinking being.

No, LLMs are not thinking beings with internal state. Even these "reasoning" models are just prompting the same LLM over and over again which is not true "logic" the way you and I think when we are presented with a new problem.

The key difference is they do not have actual logic, they rely on statistical calculations and heuristics to come up with the next set of words. This works surprisingly well if the thing has seen all text written, but there will always be new scenarios, new ideas it has not encountered and no these are not better than a human at those tasks and likely never will be.

However, what is happening is that our understanding of intelligence is being expanded, and our belief that we are going to be the only intelligent beings ever is under threat and that makes us fundamentally anxious.

yapyap · 2025-02-10T13:40:36 1739194836

> “the AI has no ground truth” (obviously it does, it has ingested every paper ever

it does not, AI is predicting the next ‘token’ based on the last ‘token’. There is no sentience, it’s machine learning except the machines are really strong.

It’d be illogical to say an AI has a ground truth just because it ‘ingested’ every paper ever.

pjs_ · 2025-02-10T14:19:43 1739197183

What does sentience have to do with truth? I didn’t make that connection, you did. Wikipedia isn’t sentient but it contains a lot of truth. Raw data isn’t sentient but it definitely “has ground truth”.

eigenform · 2025-02-09T23:26:55 1739143615

It depends on your tolerance for error.

When you have a machine that can only infer rules for reasoning from inputs [which are, more often than not, encoded in a very roundabout way within a language which is very ambiguous, like English], you have necessarily created something without "ground."

That's obviously useful in certain situations (especially if you don't know the rules in some domain!), but it's categorically not capable of the same correctness guarantees as a machine that actually embodies a certain set of rules and is necessarily constrained by them.

throwaway4aday · 2025-02-10T00:21:36 1739146896

Are you contending that every human derives their reasoning from first principals rather than being taught rules in a natural language?

eigenform · 2025-02-10T01:43:26 1739151806

I'm contending that, like any good tool, there is a context where it is useful, and a context where it is not (and that we are at a stage where everything looks suspiciously like a nail).

eqqn · 2025-02-10T14:32:31 1739197951

>“the AI has no ground truth” (obviously it does, it has ingested every paper ever)

It also ingested every reddit thread and tweets of every politician ever.

cess11 · 2025-02-10T07:52:46 1739173966

Computers are "reasoning" in the same sense they have a "heartbeat".

krainboltgreene · 2025-02-09T22:31:59 1739140319

> obviously it does, it has ingested every paper ever

Do you have a citation for such a claim?

pjs_ · 2025-02-10T15:42:17 1739202137

https://www.tomshardware.com/tech-industry/artificial-intell...

https://en.wikipedia.org/wiki/Anna's_Archive

https://en.wikipedia.org/wiki/The_Pile_(dataset)

wg0 · 2025-02-10T13:16:07 1739193367

> “the AI has no ground truth”

Yeah? It has? Where the irrefutable proof of that?

bbor · 2025-02-09T23:17:20 1739143040

Hey, I'm definitely on your side of the Great AI Wars--and definitely share your thoughts on the overall framing--but I think you're missing the serious nature of this contribution:

1. Small correction, it's actually a whole book AFAIK, and potentially someday soon, a class! So there's a lot more thought put in then the typical hot-take blog post. I also pop into one of these guy's replies on Bluesky to disagree on stuff fairly regularly, and can vouch for his good faith, humble effort to get it right (not something to be taken for granted!)

2. RE:“the AI has no ground truth”, I'd say this is true, no matter how often they're empirically correct. Epistemological discussions (aka "how do humans think") invariably end up at an idea called Foundationalism, which is exactly what it sounds like: that all of our beliefs can be traced back to one or more "foundational" beliefs that we either do not question at all (axioms) or very rarely do (premises on steroids?). In that sense, this phrase is simply recalling the hallucination debates we're all familiar with in slightly more specific, long-standing terms; LLMs do not have a systematic/efficient way of segmenting off such fundamental beliefs and dealing with them deliberately. Which brings me to...

3. RE:“can’t reason logically”, again this is a common debate that I think is being specified more than usual here. A lot of philosophy draws a distinction between automatic and deliberate cognition. I give credit to Kant for the best version, but it's really a common insight, found in ideas like "Fast vs. Slow thinking"[1], "first order vs. recursive" thought[2], "ego vs. superego"[3], and--most relevantly--intuition vs. reason.[4] At the very least, it's not a criticism to be dismissed out of hand based on empirical success rates!

4. Finally, RE:“can’t explain how they arrived at conclusions”, that's really just another discussion of point 2 in more explicitly epistemic terms. You can certainly ask o3 to reason (hehe) about the cognitive processing likely to be behind a given transcript, but it's not actually accessing any internal state, which is a very important distinction! o3 would do just as well explaining the reasoning behind a Claude output as it would with one of its own.

Sorry for the rant! I just leave a lot of comments that sound exactly like yours on "LLMs are useless" blog posts, and I wanted to do my best to share my begrudging appreciation for this work.

The title is absurdly provocative, but they're not dismissing LLMs, they're characterizing their weaknesses using a colloquial term -- namely "bullshit" as used for "lying without knowing that you're lying".

[1] https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow [2] https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa... [3] https://en.wikipedia.org/wiki/Id,_ego_and_superego [4] https://plato.stanford.edu/entries/intuition/ , and a flawed but interesting one from Gary Marcus: https://garymarcus.substack.com/p/llms-dont-do-formal-reason...

llm_trw · 2025-02-10T00:58:18 1739149098

I've literally build a dynamic bench mark where I test reasoning models on their performance on deriving conclusions from assumptions through sequent calculus.

o3-mini high effort can derive chains that are 8 inference rules deep with >95% confidence I didn't have the money to test it further. This is better than the average professor in logic when given pen and paper.

It seems like a course critiquing 5 year old technology at this point.