
Everything related to LLMs is probabilistic, but agents still follow those rules well most of the time.




Yes, they do, most of the time. Then they don't. Yesterday I told codex that it must always run tests by invoking a make target. That target is even configurable with parameters, e.g. to filter by test name. But at some point in every session, codex started disregarding that rule and fell back to calling the platform-native test tool directly. I used strong language to steer it back, but 20% or so of context later, it did it again.
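For context, a minimal sketch of the kind of parameterized make target described above; the target name, the FILTER variable, and the pytest invocation are assumptions for illustration, not the actual setup:

    # Hypothetical Makefile target, not the actual project's setup.
    # The recipe line must be indented with a real tab.
    # Usage: make test                 (run the whole suite)
    #        make test FILTER=login    (run only tests matching "login")
    FILTER ?=
    .PHONY: test
    test:
    	pytest $(if $(FILTER),-k "$(FILTER)")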

Once the LLM has made one mistake, it's often best to start a new context.

Since its mechanism is to predict the next token of the conversation, once it has made one mistake it is quite likely to "predict" itself making more.


I'm not sure this is still the case with codex. In this instance, restarting had no strong effect.




