You wouldn't ask a human to do that, so why would you ask an LLM to? I guess it's a way to test them, but it feels like the world record for backwards running: interesting, maybe, but not a good way to measure anything about the individual involved.
I’m starting to find it unreasonably funny how people always want language models to multiply numbers for some reason. Every god damn time. In every single HN thread. I think my sanity might be giving out.
Since Grok 4 Fast got this answer correct so quickly, I decided to test a few more models.
Tested this on the new hidden ChatGPT model called Polaris Alpha. Answer: 20,192,642.460942336
Current GPT-5 (medium reasoning) says: “After confirming my calculations, the final product P should be 20,192,642.460942336”
Claude Sonnet 4.5 says: “29,596,175.95, or roughly 29.6 million”
Claude Haiku 4.5 says: ≈20,185,903
GLM 4.6 says: 20,171,523.725593136
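If anyone wants to sanity-check numbers like these outside a model, it's trivial to do exactly in Python with the decimal module. Here's a minimal sketch; note the operands below are made-up placeholders, since the actual numbers are behind the truncated link in the experiment comment further down:

    from decimal import Decimal, getcontext

    # Placeholder operands -- NOT the real numbers from the experiment link.
    a = Decimal("4521.37")
    b = Decimal("4466.05")

    # Enough precision that the product is exact rather than rounded.
    getcontext().prec = 50

    print(a * b)  # prints 20192664.4885 for these placeholder values

Using Decimal (or plain Python ints scaled by powers of ten) avoids the floating-point rounding you'd get from multiplying ordinary floats, so whatever a model answers can be compared against an exact reference.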
I’m going to try out Grok 4 Fast on some coding tasks next to see if it can write functions properly. Design help is still best on GPT-5 right now.
Here is an experiment:
https://www.gnod.com/search/#q=%23%20Calcuate%20the%20below%...
The correct answer:
Here is what I got from different models on the first try: