Maybe a dumb question but: what is a "reasoning model"?

I think I get that "reasoning" in this context refers to dynamically budgeting scratchpad tokens that aren't intended as the main response body. But can't any model do that? Isn't it just part of the system prompt, or, more generally, the conversation scaffold that's being written to?

Or does a "reasoning model" specifically refer to models whose "post training" / "fine tuning" / "RLHF" passes have been run against those sorts of prompts rather than simpler user-assistant-user-assistant back-and-forths?

E.g., a base model becomes "a reasoning model" after enough experience in the reasoning mines.





The latter. A reasoning model has been finetuned to use the scratchpad for intermediate results (which works better than just prompting a model to do the same).
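
To make the "just prompting" baseline concrete, here's a minimal sketch assuming an OpenAI-compatible chat API (the model name is a placeholder). A non-reasoning model only produces a scratchpad because the prompt asks for one:

    from openai import OpenAI

    # Sketch of the "just prompting" baseline. Assumes an OpenAI-compatible
    # chat endpoint; the model name is a placeholder.
    client = OpenAI()
    resp = client.chat.completions.create(
        model="some-instruct-model",
        messages=[
            {"role": "system", "content": (
                "Before answering, think step by step inside "
                "<scratchpad>...</scratchpad>, then give the final answer."
            )},
            {"role": "user", "content": "What is 17 * 24?"},
        ],
    )
    print(resp.choices[0].message.content)

A reasoning model emits the scratchpad natively because it was trained to, not because the prompt asked nicely, which is why the finetuned version wins.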

I'd expect the same (fine-tuning beating mere prompting) for just about anything.

So a model is or isn't "a reasoning model" depending on the extent of a fine-tune.

Are there specific benchmarks that compare models vs themselves with and without scratchpads? High with:without ratios indicating reasonier models?

Curious also how much a generalist model's one-shot responses degrade with reasoning post-training.


> Are there specific benchmarks that compare models vs themselves with and without scratchpads?

Yep, it's pretty common for labs to release both an instruction-tuned and a thinking-tuned version of a model and then bench them against each other. For instance, if you scroll down to "Pure text performance" here, there's a comparison of the two Qwen variants: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking


Thanks for the Qwen tip. Interesting how much of a difference reasoning makes for coding.

> Are there specific benchmarks that compare models vs themselves with and without scratchpads? High with:without ratios indicating reasonier models?

Yes, simplest example: https://www.anthropic.com/engineering/claude-think-tool
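
For anyone who doesn't want to click through: the "think" tool in that post is basically a no-op tool whose only effect is giving the model a scratchpad mid-conversation. Rough sketch of the tool definition (paraphrased from the post; details approximate):

    # A no-op "think" tool, roughly as described in the Anthropic post.
    # Calling it does nothing; it just gives the model room to reason.
    think_tool = {
        "name": "think",
        "description": (
            "Use the tool to think about something. It will not obtain new "
            "information or change anything; it just logs the thought."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "thought": {
                    "type": "string",
                    "description": "A thought to think about.",
                }
            },
            "required": ["thought"],
        },
    }

You pass it in the `tools` list of a Messages API call and return a trivial tool result, so the "call" is pure scratchpad.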


The question is: fine-tuning for what? Reasoning is not a particular task, it is a general-purpose technique for directing more compute at any task.

Pivot tokens like 'wait', 'actually' and 'alternatively' are boosted in order to force the model to explore alternate solutions.
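
You can fake a crude version of this at sampling time with a logit bias, which makes the mechanism easy to see. A minimal sketch with transformers (GPT-2 and the bonus value are arbitrary stand-ins; as I understand it, real reasoning models get this tendency from RL training, not a hardcoded bias):

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              LogitsProcessor, LogitsProcessorList)

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # First subword of each pivot word -- good enough for a sketch.
    pivot_ids = [tok(w)["input_ids"][0]
                 for w in (" wait", " actually", " alternatively")]

    class PivotBoost(LogitsProcessor):
        """Add a constant bonus to the logits of the pivot tokens."""
        def __init__(self, token_ids, bonus=2.0):
            self.token_ids = token_ids
            self.bonus = bonus

        def __call__(self, input_ids, scores):
            scores[:, self.token_ids] += self.bonus
            return scores

    inputs = tok("The answer is 42 because", return_tensors="pt")
    out = model.generate(
        **inputs, max_new_tokens=40, do_sample=True,
        logits_processor=LogitsProcessorList([PivotBoost(pivot_ids)]))
    print(tok.decode(out[0]))

Crank the bonus up and you'll see the model constantly second-guessing itself, which is the point: pivot tokens are forks into alternate solution paths.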

Any model that does its thinking inside <think></think>-style tokens before it answers.

This can be done with finetuning/RL on an existing pre-formatted dataset, or with format-based RL where the model is rewarded both for answering correctly and for using the right format.
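
The format-based reward is simple enough to sketch in a few lines. Something in this spirit (the weights and exact-match check are my assumptions; real setups use verifiers or graders for the answer check):

    import re

    THINK_RE = re.compile(r"^<think>.*?</think>\s*(.+)$", re.DOTALL)

    def reward(completion: str, gold_answer: str) -> float:
        """Format-based RL reward sketch: partial credit for using the
        <think>...</think> format, full credit only if the final answer
        after the tags is also correct."""
        m = THINK_RE.match(completion.strip())
        if not m:                 # wrong format: no reward at all
            return 0.0
        final = m.group(1).strip()
        return 0.5 + (1.0 if final == gold_answer.strip() else 0.0)

    # reward("<think>17*24 = 408</think>408", "408") -> 1.5
    # reward("408", "408") -> 0.0 (right answer, wrong format)

The model then discovers on its own what to put between the tags, since only the format and the final answer are scored.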



