Maybe a dumb question but: what is a "reasoning model"?
I think I get that "reasoning" in this context refers to dynamically budgeting scratchpad tokens that aren't intended as the main response body. But can't any model do that, and isn't it just a matter of the system prompt or, more generally, the conversation scaffold that is being written to?
Or does a "reasoning model" specifically refer to models whose "post training" / "fine tuning" / "rlhf" laps have been run against those sorts of prompts rather than simpler user-assistant-user-assistant back and forths?
E.g., a base model becomes "a reasoning model" after so much experience in the reasoning mines.
The latter. A reasoning model has been finetuned to use the scratchpad for intermediate results (which works better than just prompting a model to do the same).
> Are there specific benchmarks that compare models vs themselves with and without scratchpads?
Yep, it's pretty common for labs to release both an instruction-tuned and a thinking-tuned variant of the same model and then bench them against each other. For instance, if you scroll down to "Pure text performance" there's a comparison of these two Qwen models' performance: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking
Any model that does its thinking inside <think></think>-style tokens before it answers.
This can be done with finetuning/RL on an existing pre-formatted dataset, or with format-based RL, where the model is rewarded both for answering correctly and for using the right format.
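To make the "rewarded for both" part concrete, here's a rough sketch of what a format-plus-correctness reward could look like. The function names, the weights, and the exact-match check are all my own assumptions for illustration, not any lab's actual recipe; real setups typically use more forgiving answer checking.

```python
import re

# Assumed format: reasoning inside <think>...</think>, followed by the final answer.
THINK_RE = re.compile(r"\A\s*<think>(.*?)</think>\s*(.*)\Z", re.DOTALL)


def split_completion(completion: str):
    """Return (scratchpad, answer), or None if the format is not followed."""
    match = THINK_RE.match(completion)
    if match is None:
        return None
    scratchpad = match.group(1).strip()
    answer = match.group(2).strip()
    # Reject an empty answer or stray think tags leaking into the answer.
    if not answer or "<think>" in answer or "</think>" in answer:
        return None
    return scratchpad, answer


def reward(completion: str, reference: str,
           format_weight: float = 0.25,   # hypothetical weight
           correct_weight: float = 1.0    # hypothetical weight
           ) -> float:
    """Score a completion: partial credit for format, more for a correct answer."""
    parts = split_completion(completion)
    if parts is None:
        return 0.0  # wrong format earns nothing, even if the answer is in there
    _, answer = parts
    score = format_weight
    if answer == reference.strip():  # naive exact match, for illustration only
        score += correct_weight
    return score


if __name__ == "__main__":
    print(reward("<think>2 + 2 = 4</think>4", "4"))   # right format, right answer
    print(reward("<think>2 + 2 = 5</think>5", "4"))   # right format, wrong answer
    print(reward("4", "4"))                           # no scratchpad at all
```

The point of the format term is that the model gets some signal for using the scratchpad even before its answers are reliably right; the correctness term then pushes it to put useful intermediate work in there.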