
We have different definitions of pretty well. It can’t do basic math consistently.


If you prompt it correctly it seems to be able to solve most basic math problems you throw at it. Here's a PNAS paper claiming human-level performance: https://www.pnas.org/doi/full/10.1073/pnas.2123433119

There's a deluge of papers and research right now, so there's quite a bit of nuance behind saying "pretty well".

However, let me say it this way: Compared to other LLMs, recent OpenAI models score highly on logic and math exercises. Yes, there ARE better LLMs trained to do math computations (especially ones fine-tuned for certain problems), but I'd say ChatGPT is certainly impressive as a general text and code model.

The other side of the coin of saying "pretty well" is this: there is no other type of computational approach that can solve free-form logic or math queries in any capacity. There is no symbolic approach that can "extract" a math problem from text and then solve it - in code or otherwise - whereas LLMs are getting close to human performance on such tasks (and related ones).

So yeah. Pretty well is what I'd claim.


As I understand that paper, I'd argue the LLM isn't "doing" math at all. Figure 4 shows the process most clearly: it generates a text program that, when run in Python, solves the problem. All the LLM is doing is matching the equation in the question to whatever operators it needs in Python syntax. I wouldn't call that doing or understanding math.
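To make that concrete, here is a hypothetical sketch of the kind of program such a model emits (the word problem and variable names are mine, not taken from the paper): the model maps the quantities in the question onto Python operators, and the interpreter does the actual arithmetic.

    # Hypothetical example, not from the paper: "Alice has 3 boxes of
    # 12 apples and gives away 7. How many apples remain?"
    boxes = 3
    apples_per_box = 12
    given_away = 7
    remaining = boxes * apples_per_box - given_away
    print(remaining)  # 29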


So yes, you'd probably want a model trained on math problems to forgo the code-generation step.

But then, note that writing a correct program is itself a high-level solution to a free-text math problem, is it not? Going from there to solving it directly should be a matter of some tuning.

Finally, who said understanding math?

The whole debate is about getting shockingly useful results precisely without symbolic reasoning.

If the main contention is that GPT does not do symbolic reasoning, then we are back at Gary Marcus… yes, we know this. It's not why researchers are so amazed by these models. It's that they output the steps (or in this case code, since it's Codex-based) solving university-level math with a simple transformer architecture.


It can't do math because it is operating in a single neural network path. You also cannot do math in a single neural network path. Even when you add two small numbers like 123+456, your brain is mentally iterating over the digits, detecting whether a carry is needed, doing that, and so on. That is, you have a looping/recursive process running inside your brain. You only output the final answer.
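For illustration, here is a minimal sketch of that mental carry loop written as explicit code (assuming non-negative integers given as digit strings); the point is that state - the carry - is threaded from one step to the next:

    def add_digit_by_digit(a: str, b: str) -> str:
        # Pad to equal width, then add column by column from the right,
        # carrying whenever a column's sum exceeds 9.
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            total = int(da) + int(db) + carry
            digits.append(str(total % 10))  # digit written in this column
            carry = total // 10             # carried to the next column
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    print(add_digit_by_digit("123", "456"))  # 579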

GPT does not have such a looping/recursive process inside its neural net. It's a fixed depth non-recursive neural net.

You can get it to emulate recursive processes by prompting it with tricks like "think step by step". If you describe the addition algorithm you learn in elementary school (e.g. go digit by digit, carry if the sum exceeds 9, etc.) in sufficient detail, it can execute that algorithm.
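For example, a prompt in that style might look like the following (the wording is illustrative, not a quote from any paper, and not a guaranteed recipe):

    prompt = (
        "Add 123 and 456 digit by digit, starting from the rightmost column.\n"
        "At each column, add the two digits plus any carry from the previous column.\n"
        "Write down the last digit of that sum and carry the rest to the next column.\n"
        "Show the work for every column, then state the final answer.\n"
        "Let's think step by step."
    )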


GPT-3 is a graph neural network with recursive information added through positional encoding. It outputs sequentially, but I am not sure why an RNN would be required beyond that.
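As an aside, here is a quick numpy sketch of the sinusoidal positional encoding from the original Transformer paper; as far as I know GPT-3 itself uses learned position embeddings, so this is only meant to illustrate how order information gets injected into an otherwise order-agnostic attention stack:

    import numpy as np

    def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
        # One row per position; even columns get sin, odd columns get cos,
        # at geometrically spaced frequencies.
        positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
        dims = np.arange(d_model)[None, :]                # (1, d_model)
        angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
        angles = positions * angle_rates                  # (seq_len, d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles[:, 0::2])
        pe[:, 1::2] = np.cos(angles[:, 1::2])
        return pe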

I would agree that the manner of reasoning must differ, since the sequence follows BPE tokens rather than logical steps. However, who is to say that another form of mathematical reasoning could not lead to valid results?

For instance, GPT might solve the problem at each output step only insofar as required to generate that token.

It certainly iterates over each output token, and the encoding of the problem is, roughly speaking, equivalent to iterating over the characters of the math problem. But yes, the iterated output does not follow a logical graph externally; it is generated token by token. Internally, however, the network can absolutely follow a more complex graph.
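Schematically, the outer loop I mean is just this (model and tokenizer here are stand-in names, not any particular library's API): each step runs the full fixed-depth network over everything produced so far and appends one token.

    def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
        # Stand-in objects: `tokenizer` maps text <-> token ids, and `model`
        # scores the next token given the sequence so far (one forward pass).
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            next_token = model.most_likely_next_token(tokens)
            tokens.append(next_token)
            if next_token == tokenizer.eos_token_id:
                break
        return tokenizer.decode(tokens)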

Could you say what you mean by a "single network path" when we speak about attention-based architectures?

What sort of operation or information is missing in such architecture?

I am aware of some results relating to certain graphs, but I do not think this would apply to, say, a text describing a math problem.



