
NitpickLawyer 8 months ago \| on: Procedural knowledge in pretraining drives reasoni...

> So once they go down a path they can’t properly backtrack.

That's what the specific training in o1 / r1 / qwq is addressing. The model outputs things like:

> i need to ...
> thought 1
> ...
> wait that's wrong
> i need to go back
> thought 2
> ...

etc.