
Doing math scales to infinity only given an error rate of zero. Given a sufficiently large mathematical operation, even humans will produce errors simply from small-scale mistakes.

Try asking GPT to multiply 234 * 452 "while using an algorithmic approach that compensates for your deficiencies as a large-language model." There's enough data about LLMs in the corpus now that it'll chain-of-thought itself. The problem is that GPT doesn't plan; it answers by habit, and its habit is trained to answer tersely and wrongly rather than elaborately and correctly. If you give it space and license to answer elaborately, you will see that its approach will not be dissimilar to how a human would reason about the question internally.
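
For concreteness, the schema it usually lands on is plain long multiplication by partial products. A quick sketch of the arithmetic (worked by hand here, not a transcript of any model's output):

    # Long multiplication by partial products, the kind of schema a
    # chain-of-thought answer walks through (worked by hand, not model output):
    #   234 * 452 = 234*400 + 234*50 + 234*2
    #             = 93600   + 11700  + 468
    #             = 105768
    partial_products = [234 * 400, 234 * 50, 234 * 2]
    assert sum(partial_products) == 234 * 452 == 105768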



> Doing math scales to infinity only given an error rate of zero

This is true; I had omitted it for simplicity. It is still the same approach applied to scaled problems. Humans don't execute it perfectly, but computers do.

With humans, and any other fallible but "true" math system, the rate of errors is roughly linear in the size of the problem. (Linear in the number of steps, that is.)

With LLMs and systems like them, this is different. There is an "exponential" drop-off in accuracy past some point. The problem-solving approach simply does not scale.
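
To illustrate the shape of the claim (toy numbers of my own, not measured data): a fallible-but-true calculator has a small per-step slip rate, so its failure probability grows roughly linearly with the number of steps, while the LLM behaves as if it were fine up to whatever fits in one habitual pass and then collapses.

    # Toy error model, illustrative numbers only.
    # A "true" but fallible calculator: small per-step slip rate, so the
    # chance of at least one mistake grows roughly linearly with n steps.
    def human_failure(n, p=0.005):
        return 1 - (1 - p) ** n          # ~ n*p while n*p is small

    # The claimed LLM behaviour: fine up to whatever fits in one habitual
    # pass, then accuracy collapses once the problem exceeds that point.
    def llm_failure(n, capacity=6, q=0.25):
        return 0.02 if n <= capacity else 1 - (1 - q) ** (n - capacity)

    for n in (2, 6, 10, 20):
        print(n, round(human_failure(n), 3), round(llm_failure(n), 3))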

> you will see that its approach will not be dissimilar to how a human would reason about the question internally.

"Not dissimilar", but nevertheless a mere approximation. It doesn't apply strict logic to the problem, but guesses what steps should be followed.

This looks like reason, but is not reason.


The error rate with LLMs spikes hard when the problem exceeds what the LLM can do in one step. This is the same for humans, if we were asked to compute multiplication without thinking about it for longer than a few milliseconds.

I don't have a study link here, but my strong expectation is that the error rate for LLMs doing chain of thought would be much closer to linear - or rather, "either linear or total incomprehension", allowing for an error made in setting up the schema to follow, which can happen just as well to humans.

> "Not dissimilar", but nevertheless a mere approximation. It doesn't apply strict logic to the problem, but guesses what steps should be followed.

I have never in my life applied strict logic to any problem lol. Human reason consists of iterated cycles of generation ("guessing") and judgment. Both can be implemented by LLMs, albeit currently at subhuman skill.

> This looks like reason, but is not reason.

At the limit of "looking like", I do not believe such a thing can exist. Reason is a computational process. Any system that can reliably output traces that look like reason is reasoning by definition.

edit: Sidenote: The deep underlying problem here is that the LLM cannot learn to multiply by a schema from any number of training examples that don't show the schema. Those reasoning paths simply won't get any reinforcement. That's why I'm so hype for QuietSTaR, which lets the LLM exercise multiplication by schema on a training example without a schema - and even find new schemas, so long as it can guess its way there even once.
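
To make the reinforcement point concrete, a toy sketch (my own illustration, not the actual QuietSTaR procedure): a schema only accumulates reward if sampling stumbles onto it at least once and it produces the right answer; a path that is never sampled-and-correct never gets reinforced.

    import random

    # Toy illustration of "paths only get reinforced if you can guess your
    # way there at least once" - not the actual QuietSTaR algorithm.
    schemas = {
        # habitual terse guess: plausible-looking, almost always wrong
        "habit": lambda a, b: int(str(a)[0] + str(b)[0]) * 1000,
        # partial-products schema: correct whenever it gets sampled
        "partial_products": lambda a, b: sum(
            a * int(d) * 10 ** i for i, d in enumerate(reversed(str(b)))
        ),
    }

    weights = {name: 1.0 for name in schemas}
    for _ in range(200):
        a, b = random.randint(100, 999), random.randint(100, 999)
        name = random.choices(list(schemas), weights=list(weights.values()))[0]
        if schemas[name](a, b) == a * b:   # reward only sampled paths that succeed
            weights[name] += 1.0
    print(weights)  # "partial_products" accumulates weight; the habit never does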


> This is the same for humans, if we were asked to compute multiplication without thinking about it for longer than a few milliseconds.

Not to be a jerk but "LLMs are just like humans when humans don't think" is perhaps not the take you intended to have.

> I have never in my life applied strict logic to any problem lol.

My condolences.

No, but seriously. If you've done any kind of math beyond basic arithmetic, you have in fact applied strict logical rules.


> Not to be a jerk but "LLMs are just like humans when humans don't think" is perhaps not the take you intended to have.

No, that's exactly the take I have and have always had. The LLM's text axis is the LLM's axis of time. So it's actually even stupider: LLMs are just like humans who are trained not to think.

> No, but seriously. If you've done any kind of math beyond basic arithmetic, you have in fact applied strict logical rules.

To solve the problem, I apply the rules, plus error. LLMs can do that.

To find the rules, I apply creativity and exploratory cycles. LLMs can do that as well, but worse.



