

entropicdrifter on Jan 30, 2025 | on: An analysis of DeepSeek's R1-Zero and R1

So you mean something like, "what if the baseline, off-the-cuff response for the next-gen models was tuned based on the results of the reasoning model excluding the reasoning itself?"

spyckie2 on Jan 30, 2025

Exactly, albeit it may need the reasoning later to form the proper foundational logic in the weights.
