
You need an extremely large amount of training data of good CoTs. And there is probably some magic: we know LLMs aren't capable of genuine self-reflection, and none of the other models are any good at iterating toward a better answer.

Example prompt for that: "give me three sentences that end in 'is'."
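
To make the test concrete, here's a rough sketch of the kind of iteration loop I mean: ask the model, check the output programmatically, and feed the failure back so it can try to correct itself. This is just an illustration assuming an OpenAI-style chat API; the model name and the crude sentence check are placeholders, not anyone's actual eval harness.

    import re
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    PROMPT = "Give me three sentences that end in 'is'."

    def sentences_ending_in_is(text: str) -> list[str]:
        # Crude check: split on sentence-final punctuation,
        # then test whether each sentence's last word is "is".
        sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
        return [s for s in sentences if s.split()[-1].lower().strip("'\"") == "is"]

    messages = [{"role": "user", "content": PROMPT}]
    for attempt in range(3):  # give the model a few chances to self-correct
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        text = reply.choices[0].message.content
        if len(sentences_ending_in_is(text)) >= 3:
            print("Passed on attempt", attempt + 1)
            break
        # Point out the failure and ask the model to iterate on its answer.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user",
                         "content": "Some of those sentences don't end in 'is'. Try again."})
    else:
        print("Never produced three sentences ending in 'is'.")

In my experience the interesting part is what happens after the first failure: a model that can't really self-reflect will apologize and then make the same mistake again.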



