Does o1 need some special mechanism to generate lengthy chains of thought, or does it just produce them naturally after being trained to do so?

If it's just training, I imagine o1 clones could initially be simple fine-tunes of Llama models.
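
A minimal sketch of what that fine-tune might look like, assuming a recent version of Hugging Face's trl library; the dataset file, model id, and output directory are hypothetical placeholders:

    # Sketch: supervised fine-tuning a Llama base model on CoT traces.
    # Dataset path, model id, and output dir are placeholders.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Assumes each JSONL record has a "text" field holding
    # prompt + chain of thought + final answer, concatenated.
    dataset = load_dataset("json", data_files="cot_traces.jsonl", split="train")

    trainer = SFTTrainer(
        model="meta-llama/Llama-3.1-8B",  # any base model you can access
        train_dataset=dataset,
        args=SFTConfig(output_dir="llama-cot-sft"),
    )
    trainer.train()

The trainer call is the easy part; sourcing good CoT traces is the bottleneck the reply below points at.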

You'd need an extremely large amount of good CoT training data. And there's probably some magic beyond that: we know LLMs aren't capable of genuine self-reflection, and none of the other models are any good at iterating toward a better answer.

Example prompt for that: "give me three sentences that end in 'is'."
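
For concreteness, here's a sketch of the generate-check-retry loop that test implies; generate is a stand-in for whatever model API you'd call, and its canned return value just keeps the example self-contained:

    # Sketch of the generate-check-retry loop "iterating to a better
    # answer" implies. `generate` is a placeholder for a real model call.
    def generate(prompt: str) -> str:
        # Placeholder: swap in your actual LLM API call here.
        return "Paris is. The answer is. Here it is."

    def all_end_in_is(text: str) -> bool:
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        return len(sentences) == 3 and all(
            s.split()[-1].lower() == "is" for s in sentences
        )

    prompt = "Give me three sentences that end in 'is'."
    for attempt in range(5):
        answer = generate(prompt)
        if all_end_in_is(answer):
            break
        # Feed the failure back and let the model try to self-correct;
        # the claim above is that most current models can't make use of this.
        prompt = (
            "Your previous answer was:\n" + answer + "\n"
            "Not every sentence ended in 'is'. Try again: give me three "
            "sentences that each end in the word 'is'."
        )

The mechanical check is trivial; the interesting question is whether the model can actually use the feedback on the retry.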
