
I would not expect any LLM to get this right. I think people's expectations of these models are too high.

Now, if you asked it to write a Python program that lists them in order, and had it enter all the names, birthdays, and years elected into a list so the program can run, that's more reasonable.
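
A minimal sketch of what such a program might look like. The sort key is not spelled out here, so ordering by approximate age in the election year is an assumption, as are the sample rows (a handful of U.S. presidents chosen for illustration) and the helper names `age_when_elected` and `order_by_age_when_elected`.

```python
from datetime import date

# Illustrative sample rows (an assumption, not the full list being discussed):
# (name, birthday, year elected).
RECORDS = [
    ("Ronald Reagan", date(1911, 2, 6), 1980),
    ("Joe Biden", date(1942, 11, 20), 2020),
    ("John F. Kennedy", date(1917, 5, 29), 1960),
    ("Barack Obama", date(1961, 8, 4), 2008),
    ("Bill Clinton", date(1946, 8, 19), 1992),
]


def age_when_elected(birthday, year_elected):
    # Hypothetical helper: the age reached during the election year.
    # This ignores whether the birthday falls before or after election day.
    return year_elected - birthday.year


def order_by_age_when_elected(records):
    # Pair each name with its computed age, then sort youngest first.
    # Ties are broken alphabetically by name so the output is deterministic.
    pairs = [(name, age_when_elected(bday, year)) for name, bday, year in records]
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    for name, age in order_by_age_when_elected(RECORDS):
        print(f"{age:3d}  {name}")
```

The point of this split is that the model only has to recall the rows; the arithmetic and the sorting are done by the interpreter, so a wrong answer can be traced to a wrong row.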




The “o” models get the order right.

DeepSeek also gets the order right.

It doesn’t show on the share link, but it actually outputs the list correctly from the built-in Python interpreter.

For some things, ChatGPT 4o will automatically use its Python runtime.


That some models get it right is irrelevant. In general, if your instructions require computation, it's safer to assume the model won't get it right and will hallucinate.


The reasoning models all do pretty well at math.

Have you tried them?

This is something I threw together with o3-mini

https://chatgpt.com/share/679d5305-5f04-8010-b5c4-61c31e79b2...

ChatGPT 4o doesn’t even try to do the math internally and uses its built-in Python interpreter. (The [_>] link is to the Python code.)

https://chatgpt.com/share/679d54fe-0104-8010-8f1e-9796a08cf9...

DeepSeek handles the same problem just as well using the reasoning technique.

Of course, ChatGPT 4o went completely off the rails when it didn't use its Python interpreter:

https://chatgpt.com/share/679d5692-96a0-8010-8624-b1eb091270...

(The breakdown that it got right used Python even though I told it not to.)



