You'd need an extremely large amount of training data of good CoTs. And there's probably some magic beyond that: LLMs aren't really capable of self-reflection, and no model other than o1 is any good at iterating toward a better answer.
Example prompt to test that: "Give me three sentences that end in 'is'." Most models produce sentences ending in some other word and then insist they're correct when asked to check.
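Roughly what I mean, as a sketch (assuming an OpenAI-compatible client; the model name is just a placeholder):

```python
# Minimal sketch of the self-reflection test. Assumes an
# OpenAI-compatible API; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()
PROMPT = "Give me three sentences that end in 'is'."

# First attempt.
first = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

# Ask the model to check its own output and retry -- the
# iteration step most models fail at.
second = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[
        {"role": "user", "content": PROMPT},
        {"role": "assistant", "content": first},
        {"role": "user", "content": (
            "Check each sentence: does it literally end with the "
            "word 'is'? Rewrite any that don't."
        )},
    ],
).choices[0].message.content

print(first)
print(second)
```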
If so, I imagine o1 clones could initially just be fine-tunes of Llama.
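If that's right, the training data would just be ordinary supervised examples with the reasoning trace included. A guess at what one record might look like (the field names and format are my assumption, not anything known about o1):

```python
# Hypothetical shape of one CoT training example for a Llama
# fine-tune; field names are illustrative, not o1's actual format.
import json

example = {
    "prompt": "Give me three sentences that end in 'is'.",
    "chain_of_thought": (
        "The last word must literally be 'is'. Draft: 'The capital "
        "of France is Paris.' That ends in 'Paris', not 'is' -- "
        "reject. Try: 'I wonder what the answer is.' Ends in 'is' "
        "-- keep. Repeat for two more."
    ),
    "answer": (
        "1. I wonder what the answer is.\n"
        "2. Nobody knows where the treasure is.\n"
        "3. She asked me who the new manager is."
    ),
}

# One JSONL line per example; a useful fine-tune would need many
# thousands of these with verified-correct reasoning traces.
print(json.dumps(example))
```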