DeepSeek-R1-Zero ("R0", based on the DeepSeek-V3 base model) was trained only with RL, no SFT, so this isn't at all like the "distillation" (i.e. SFT on synthetic data generated by R1) that they also demonstrated by fine-tuning Qwen and Llama.

Now, DeepSeek may (or may not) have used some o1-generated data for the R0 RL training, but if so that's just a cost saving versus having to source reasoning data some other way, and in no way reduces the legitimacy of what they accomplished (which is not something any of the AI CEOs are saying).