RL is still training. Just like pretraining is still training. SFT is also training. That's how I look at it: model weights are being updated in all cases.
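To make the point concrete, here's a minimal PyTorch sketch (a toy one-layer model, dummy data, and a REINFORCE-style loss, all chosen for illustration, not anyone's actual setup). The pretraining/SFT step and the RL step use different losses, but the weight-update mechanics are identical:

```python
import torch
import torch.nn.functional as F

# Toy "language model": one linear layer over a small vocabulary.
vocab, dim = 16, 8
model = torch.nn.Linear(dim, vocab)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, dim)                  # stand-in for hidden states
labels = torch.randint(0, vocab, (4,))   # stand-in next-token labels

# Pretraining / SFT step: cross-entropy against target tokens.
# (These two differ in the *data*, not in the update mechanics.)
opt.zero_grad()
F.cross_entropy(model(x), labels).backward()
opt.step()

# RL step (REINFORCE-style): reward-weighted log-prob of sampled tokens.
opt.zero_grad()
dist = torch.distributions.Categorical(logits=model(x))
sampled = dist.sample()
reward = torch.randn(4)                  # stand-in scalar rewards
(-(dist.log_prob(sampled) * reward).mean()).backward()
opt.step()
```

Either way it's the same loop: forward pass, compute a loss, backprop, update weights.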
Simplifying it down to "adjusting any weights is training, ipso facto this is meaningful" obscures more than it illuminates (as they noted, RL alone doesn't get you very far at all).