Hacker News

The RL, not the training. No?



RL is still training, just like pretraining is still training and SFT is also training. This is how I look at it: model weights are being updated in all cases.
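The point above — that pretraining, SFT, and RL all come down to the same weight-update mechanism, just with different loss gradients — can be sketched with a toy one-parameter model. Everything here (the sigmoid model, the function names, the numbers) is hypothetical, purely to illustrate that the SGD step itself is identical:

```python
import math

# Toy "model": logit = w * x, probability of token 1 via a sigmoid.
# Hypothetical sketch -- not code from any real training stack.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, grad, lr=0.1):
    # The update rule is the same no matter where the gradient came from.
    return w - lr * grad

# Pretraining / SFT style: cross-entropy against an observed next token y in {0, 1}.
def ce_grad(w, x, y):
    p = sigmoid(w * x)   # model's probability of token 1
    return (p - y) * x   # d(cross-entropy)/dw for this model

# RL style (REINFORCE): gradient of -reward * log p(sampled action).
def reinforce_grad(w, x, action, reward):
    p = sigmoid(w * x)
    dlogp_dw = (action - p) * x   # d(log p(action))/dw for the sigmoid model
    return -reward * dlogp_dw

w = 0.0
w = sgd_step(w, ce_grad(w, x=1.0, y=1))                           # pretraining/SFT-style step
w = sgd_step(w, reinforce_grad(w, x=1.0, action=1, reward=1.0))   # RL-style step
```

Only the loss (and hence the gradient) differs between the three regimes; `sgd_step` never changes, which is the sense in which all three are "training."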


Simplifying it down to "adjusting any weights is training, ipso facto this is meaningful" obscures more than it illuminates (as they noted, RL on its own doesn't get you very far at all).





