
This is an uninformed take. Much of the improvement in the performance of LLM-based models has come through RLHF and other RL techniques.


> This is an uninformed take.

You may disagree with this take, but it's not uninformed. Many LLMs use self-supervised pretraining followed by RL-based fine-tuning, but that's essentially it: it's fine-tuning.
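For anyone who wants a concrete picture of what that RL step looks like, here's a toy REINFORCE-style sketch in PyTorch. Everything in it is made up for illustration: TinyLM stands in for a pretrained causal LM, the even-token "reward" stands in for a learned reward model, and real pipelines typically use PPO with a KL penalty against the base model rather than plain REINFORCE.

    # Toy sketch: RL fine-tuning of a (stand-in) pretrained LM via REINFORCE.
    import torch
    import torch.nn as nn

    VOCAB, HIDDEN, MAX_LEN = 100, 64, 16

    class TinyLM(nn.Module):
        """Stand-in for a pretrained causal LM."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, HIDDEN)
            self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
            self.head = nn.Linear(HIDDEN, VOCAB)

        def forward(self, tokens):
            h, _ = self.rnn(self.embed(tokens))
            return self.head(h)  # next-token logits

    def reward(sequence):
        # Hypothetical reward: fraction of even tokens. In real RLHF this
        # is a separate network trained on human preference comparisons.
        return (sequence % 2 == 0).float().mean()

    policy = TinyLM()  # in practice: load pretrained weights here
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

    for step in range(100):
        tokens = torch.zeros(1, 1, dtype=torch.long)  # BOS token
        log_probs = []
        for _ in range(MAX_LEN):
            logits = policy(tokens)[:, -1, :]
            dist = torch.distributions.Categorical(logits=logits)
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            tokens = torch.cat([tokens, tok.unsqueeze(0)], dim=1)
        # REINFORCE: scale the sequence log-likelihood by its reward
        r = reward(tokens[0, 1:])
        loss = -r * torch.stack(log_probs).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

The key point either way: the gradient update only reweights behaviors the pretrained model can already produce, which is why it reads as "fine-tuning" to some and as the decisive performance step to others.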


I think you're seriously underestimating the impact of the RL steps on LLM performance.

Also, how do you think the most successful RL models have worked? AlphaGo and AlphaZero both use neural networks for their policy and value functions, which are the central mechanism of those models.
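To make that concrete, here's a toy sketch of the two-headed policy/value architecture: one shared trunk, a policy head over moves, and a scalar value head. The real AlphaZero net is a deep residual conv net over stacked board planes; the flat linear trunk and layer sizes below are just for illustration.

    # Toy sketch of an AlphaZero-style two-headed network (not the real
    # architecture -- sizes and the linear trunk are illustrative only).
    import torch
    import torch.nn as nn

    BOARD_CELLS, MOVES = 19 * 19, 19 * 19 + 1  # Go board + pass move

    class PolicyValueNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(BOARD_CELLS, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
            )
            self.policy_head = nn.Linear(256, MOVES)  # move logits
            self.value_head = nn.Linear(256, 1)       # position evaluation

        def forward(self, board):
            h = self.trunk(board)
            return self.policy_head(h), torch.tanh(self.value_head(h))

    net = PolicyValueNet()
    board = torch.randn(1, BOARD_CELLS)     # stand-in board encoding
    move_logits, value = net(board)
    print(move_logits.shape, value.item())  # [1, 362] logits, value in [-1, 1]

During self-play, MCTS uses the policy head to prioritize which moves to search and the value head to evaluate leaf positions, then the visit counts and game outcomes become the training targets. The network isn't a bolt-on; the whole algorithm runs through it.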



