
The chains-of-thought here are artificially constructed: very information-dense partial sums, formatted in a specific way that guides the fine-tuning. A potential next step would be to look at real-world chains-of-thought and see whether some process could start with those and achieve the same result. Then you could really have a self-improving system!
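
For concreteness, here's a minimal sketch (in Python) of what such a constructed trace might look like -- the digit-by-digit running-sum layout is my assumption; the exact format in the experiment may differ:

    def partial_sum_cot(a: int, b: int) -> str:
        # One partial product per digit of b (least significant first),
        # plus the running sum after each step -- an "information-dense"
        # trace a model can be fine-tuned to imitate.
        steps, running = [], 0
        for i, d in enumerate(int(c) for c in reversed(str(b))):
            partial = a * d * 10**i
            running += partial
            steps.append(f"{a} * {d} * 10^{i} = {partial}; sum so far = {running}")
        steps.append(f"therefore {a} * {b} = {running}")
        return "\n".join(steps)

    print(partial_sum_cot(123, 45))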

Also I wonder if the LLM "knows" that it has this capability after fine-tuning. If it encounters multiplication as part of some larger chain-of-thought, will it solve that internally, or will it continue to do it step-by-step in the chain-of-thought?
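
One way to probe that: embed a multiplication inside a longer word problem and check whether the fine-tuned model emits intermediate partial products or jumps straight to the answer. A hypothetical harness using the Hugging Face pipeline API (the checkpoint path is a placeholder):

    from transformers import pipeline

    # Placeholder path to the multiplication-fine-tuned checkpoint.
    pipe = pipeline("text-generation", model="path/to/finetuned-model")

    prompt = ("A warehouse has 123 shelves with 45 boxes each. "
              "Reason step by step to find the total number of boxes.")
    out = pipe(prompt, max_new_tokens=128)[0]["generated_text"]

    print(out)
    # 615 and 4920 are the partial products of 123 * 45; if neither
    # appears but 5535 does, the model likely solved it internally.
    print("internalized?", "5535" in out and not any(s in out for s in ("615", "4920")))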



But it's very hard to define "real-world CoT" -- think about humans: we learn multiplication through vertical calculation, and we learn division in a similar way. All of these learning processes rely on "information-dense" tools (written calculation procedures) with intrinsic math rules built into them. Isn't that an adapted form of CoT?


Oh, by "real world" I meant "chains of thought generated by existing reasoning LLMs" (as opposed to injecting predefined CoT like was done in the experiment), not human thoughts.



