I would not expect any LLM to get this right. I think people have too high expectations for it.
Now if you asked it to write a Python program that lists them in order, with all the names, birthdays, and years elected entered in a list so the program can run - that's more reasonable.
That some models get it right is irrelevant. In general, if your instructions require computation, it's safer to assume the model won't get it right and will hallucinate.
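A minimal sketch of the kind of program I mean, using a hypothetical three-president sample - the point is that the sorting is done by Python, not by the model:

```python
from datetime import date

# Hypothetical sample data: (name, birthday, year first elected).
presidents = [
    ("Abraham Lincoln", date(1809, 2, 12), 1860),
    ("George Washington", date(1732, 2, 22), 1789),
    ("Barack Obama", date(1961, 8, 4), 2008),
]

# Sort chronologically by birthday; change the key to p[2] to sort
# by year elected instead.
for name, birthday, elected in sorted(presidents, key=lambda p: p[1]):
    print(f"{name}: born {birthday}, first elected {elected}")
```

The LLM only has to fill in the list; the ordering itself is deterministic, so there's nothing for it to hallucinate once the data is in place.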