The interesting thing here is that OpenAI is claiming ~90th percentile scores on a number of standardized tests (which, obviously, are typically administered to humans, and have the disadvantage of being mostly or partially multiple choice). Still...
> GPT-4 performed at the 90th percentile on a simulated bar exam, the 93rd percentile on an SAT reading exam, and the 89th percentile on the SAT Math exam, OpenAI claimed.
https://www.cnbc.com/2023/03/14/openai-announces-gpt-4-says-...
So, clearly, it can do math problems, but maybe it can only do "standard" math and logic problems? That might indicate that what's happening here is more memorization than reasoning.
The follow-up question might be: what if we pair GPT-4 with an actual reasoning engine? What do we get then?