The chains-of-thought here are artificially constructed, very information-dense partial sums formatted in a specific way that guides the fine-tuning. A potential next step would be to look at real-world chains-of-thought and see whether some process could start with those and achieve the same result. Then you could really have a self-improving system!
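To make concrete what I mean by "information-dense partial sums": something along the lines of the sketch below, where each line of the CoT carries a digit-wise partial product and the running total. (This is only a hypothetical illustration -- the function name and the exact formatting are my own, not the format used in the experiment.)

```python
# Hypothetical sketch of an "information-dense" partial-sums CoT for
# multiplication, of the general kind described above. The exact format
# used in the experiment's fine-tuning data is not shown here.

def partial_sum_cot(a: int, b: int) -> str:
    steps = []
    running = 0
    # Walk b digit by digit (least significant first), accumulating partial sums.
    for place, digit in enumerate(int(d) for d in reversed(str(b))):
        partial = a * digit * (10 ** place)
        running += partial
        steps.append(f"{a} * {digit} * 10^{place} = {partial}; sum so far = {running}")
    steps.append(f"answer: {running}")
    return "\n".join(steps)

if __name__ == "__main__":
    print(partial_sum_cot(4137, 256))
```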
Also I wonder if the LLM "knows" that it has this capability after fine-tuning. If it encounters multiplication as part of some larger chain-of-thought, will it solve that internally, or will it continue to do it step-by-step in the chain-of-thought?
But it's very hard to define "real-world CoT" -- think about humans: we learn multiplication through vertical (long) multiplication and learn division in a similar way. All of these learning processes rely on an "information-dense" tool (the calculation procedure) with intrinsic math rules built into it. Isn't that an adapted form of CoT?
Oh, by "real world" I meant "chains of thought generated by existing reasoning LLMs" (as opposed to injecting predefined CoT like was done in the experiment), not human thoughts.