Everything related to LLMs is probabilistic, but those rules are also often foll...

manmal · 2025-10-28T05:31:44 1761629504

Yes, they do, most of the time. Then they don’t. Yesterday, I told codex that it must always run tests by invoking a make target. That target is even configurable with parameters, e.g. to filter by test name. But every time, at some point in the session, codex started disregarding that rule and fell back to using the platform-native test tool directly. I used strong language to steer it back, but 20% or so of context later, it did it again.

Dilettante_ · 2025-10-28T09:56:30 1761645390

Once the LLM has made one mistake, it's often best to start a new context.

Since its mechanism is to predict the next token of the conversation, it's reasonable to expect it to "predict" itself making more mistakes once it has made one.

manmal · 2025-10-28T10:24:20 1761647060

I’m not sure this is still the case with codex. In this instance, restarting had no strong effect.