
> So once they go down a path they can’t properly backtrack.

That's what the specific training in o1 / r1 / QwQ is addressing. The model outputs things like: "I need to ... > thought 1 > ... > wait, that's wrong > I need to go back > thought 2 > ..." etc.
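A toy way to picture that behavior: treat the trace as a sequence of steps, where a step containing a backtracking phrase retracts the most recent thought. This is purely an illustrative sketch — the marker list, helper name, and sample trace are made up, and real o1/r1/QwQ traces are free-form text, not structured steps.

```python
# Hypothetical sketch: keep only the thoughts the model never retracted.
# Marker phrases and the trace below are invented for illustration.
BACKTRACK_MARKERS = ("wait", "that's wrong", "i need to go back")

def surviving_thoughts(trace_steps):
    """Walk a reasoning trace; a backtracking step retracts the prior thought."""
    kept = []
    for step in trace_steps:
        lowered = step.lower()
        if any(m in lowered for m in BACKTRACK_MARKERS):
            if kept:
                kept.pop()  # retract the most recent thought
        else:
            kept.append(step)
    return kept

trace = [
    "Thought 1: assume x is even",
    "Wait, that's wrong -- x could be odd",
    "Thought 2: split into even and odd cases",
]
print(surviving_thoughts(trace))  # -> ['Thought 2: split into even and odd cases']
```

The point of the training is that the model learns to emit those "wait, go back" moves itself, so a wrong early step doesn't doom the whole chain.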


