Hacker News

The RL, not the training. No?



RL is still training, just like pretraining is still training and SFT is also training. This is how I look at it: model weights are being updated in all cases.
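The point above — that pretraining, SFT, and RL all come down to the same weight-update mechanism, just with different loss gradients — can be sketched with a toy one-parameter model. Everything here (the sigmoid model, the function names, the numbers) is hypothetical, purely to illustrate that the SGD step itself is identical:

```python
import math

# Toy "model": logit = w * x, probability of token 1 via a sigmoid.
# Hypothetical sketch -- not code from any real training stack.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, grad, lr=0.1):
    # The update rule is the same no matter where the gradient came from.
    return w - lr * grad

# Pretraining / SFT style: cross-entropy against an observed next token y in {0, 1}.
def ce_grad(w, x, y):
    p = sigmoid(w * x)   # model's probability of token 1
    return (p - y) * x   # d(cross-entropy)/dw for this model

# RL style (REINFORCE): gradient of -reward * log p(sampled action).
def reinforce_grad(w, x, action, reward):
    p = sigmoid(w * x)
    dlogp_dw = (action - p) * x   # d(log p(action))/dw for the sigmoid model
    return -reward * dlogp_dw

w = 0.0
w = sgd_step(w, ce_grad(w, x=1.0, y=1))                           # pretraining/SFT-style step
w = sgd_step(w, reinforce_grad(w, x=1.0, action=1, reward=1.0))   # RL-style step
```

Only the loss (and hence the gradient) differs between the three regimes; `sgd_step` never changes, which is the sense in which all three are "training."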


Simplifying it down to "adjusting any weights is training, ipso facto this is meaningful" obscures more than it illuminates (as they noted, RL on its own doesn't get you very far at all).





