
Reinforcement learning with human feedback. What you're describing is the alignment problem.



That's just a supervised fine-tuning method to skew outputs favorably. I'm actually working with it on biologics modeling, using laboratory feedback as the signal. The underlying inference structure is not changed.
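
A minimal sketch of that framing, assuming a toy PyTorch policy and a hypothetical reward_fn standing in for the human (or laboratory) feedback: sample from the model, keep the outputs the feedback scores favorably, and run ordinary supervised cross-entropy on them. Everything here (the tiny network, VOCAB, reward_fn) is illustrative, not anyone's actual pipeline. Note that only the weights move; the architecture, i.e. the inference structure, is untouched.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    VOCAB = 16

    # Toy "policy": context id -> next-token logits. Fine-tuning updates
    # only these weights; the architecture itself never changes.
    policy = nn.Sequential(nn.Embedding(VOCAB, 32), nn.ReLU(), nn.Linear(32, VOCAB))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

    def reward_fn(tokens):
        # Hypothetical stand-in for human or laboratory feedback; in a real
        # RLHF setup this would be a learned reward model.
        return (tokens % 2 == 0).float()

    context = torch.zeros(128, dtype=torch.long)  # dummy contexts
    for step in range(300):
        with torch.no_grad():
            samples = torch.distributions.Categorical(
                logits=policy(context)).sample()
        keep = reward_fn(samples) > 0.5  # filter for favorable outputs
        if not keep.any():
            continue
        # Ordinary supervised cross-entropy on the kept samples: this is
        # the "fine-tuning to skew outputs favorably" step.
        loss = nn.functional.cross_entropy(policy(context[keep]), samples[keep])
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        probs = torch.softmax(policy(context[:1]), -1)[0]
        print("probability mass on preferred (even) tokens:", probs[::2].sum().item())

After a few hundred steps the probability mass shifts toward the tokens the feedback prefers, which is the "skew outputs favorably" effect described above without any change to how inference runs.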



