That makes me wonder if we could simply test this by letting the LLM add or mult...

gcanyon · 2025-11-09T12:43:15 1762692195

> Do not use a calculator. Do it in your head.

You wouldn't ask a human to do that, why would you ask an LLM to? I guess it's a way to test them, but it feels like the world record for backwards running: interesting, maybe, but not a good way to measure, like, anything about the individual involved.

throwuxiytayq · 2025-11-09T13:49:27 1762696167

I’m starting to find it unreasonably funny how people always want language models to multiply numbers for some reason. Every god damn time. In every single HN thread. I think my sanity might be giving out.

solatic · 2025-11-09T14:27:57 1762698477

A model, no, but an agent with a calculator tool?

Then there's the question of why not just build the calculator tool into the model?

KristoAI · 2025-11-09T16:32:53 1762705973

Since grok 4 fast got this answer correct so quickly, I decided to test more.

Tested this on the new hidden model of ChatGPT called Polaris Alpha: Answer: $20,192,642.460942336$

Current gpt-5 medium reasoning says: After confirming my calculations, the final product (P) should be (20,192,642.460942336)

Claude Sonnet 4.5 says: “29,596,175.95 or roughly 29.6 million”

Claude haiku 4.5 says: ≈20,185,903

GLM 4.6 says: 20,171,523.725593136

I’m going to try out Grok 4 fast on some coding tasks at this point to see if it can create functions properly. Design help is still best on GPT-5 at this exact moment.

jarek83 · 2025-11-09T11:36:58 1762688218

Isn't that LLMs are not designed to do calculations?

cluckindan · 2025-11-09T11:55:12 1762689312

They are not LMMs, after all…

mg · 2025-11-09T11:54:25 1762689265

Neither are humans.

cuu508 · 2025-11-09T12:46:23 1762692383

But humans can still do it.