Understanding R1-Zero-Like Training: A Critical Perspective [pdf] | Hacker News

Understanding R1-Zero-Like Training: A Critical Perspective [pdf] (github.com/sail-sg)
1 point by delifue 4 months ago | 1 comment

delifue 4 months ago

From: https://x.com/zzlccc/status/1903162768083259703

DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning

The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO
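
A toy sketch of how a length bias of this kind could arise, for readers who have not seen the argument. It assumes (my reading of the linked paper, not something stated in the tweet) that the GRPO objective divides each sampled response's token-level loss by that response's length, so every token in a response of length L is weighted by advantage / L. The function name `per_token_weight` and the numbers are hypothetical, and clipping, the KL term, and advantage normalization are left out.

```python
def per_token_weight(advantage: float, length: int, length_normalize: bool = True) -> float:
    """Weight applied to each token's log-prob gradient in a simplified
    policy-gradient loss. With length normalization (assumed here to mirror
    GRPO's 1/|o_i| factor) the response-level advantage is spread over its tokens."""
    if length <= 0:
        raise ValueError("length must be positive")
    return advantage / length if length_normalize else advantage


# Two incorrect responses with the same negative advantage (hypothetical values).
short_wrong = per_token_weight(-1.0, 100)
long_wrong = per_token_weight(-1.0, 1000)

# Under length normalization the longer wrong answer is penalized less per token.
print(f"short wrong answer, per-token weight: {short_wrong}")
print(f"long wrong answer,  per-token weight: {long_wrong}")

# Without the 1/length factor both get the same per-token weight.
print(per_token_weight(-1.0, 100, length_normalize=False))
print(per_token_weight(-1.0, 1000, length_normalize=False))
```

Under these assumptions a long wrong answer is discouraged less per token than a short wrong one, which would nudge the policy toward longer outputs when it is wrong. Whether this fully explains the length growth seen in practice is the paper's claim to defend, not something this sketch shows.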
