
Reinforcement learning with human feedback. What you're describing is the alignment problem.



That's just a supervised fine-tuning method to skew outputs favorably. I'm actually working with it on biologics modeling, using laboratory feedback as the signal. The underlying inference structure is not changed.
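
A minimal sketch of that framing, assuming a toy PyTorch policy and a hypothetical reward_fn standing in for the human (or laboratory) feedback: sample from the model, keep the outputs the feedback scores favorably, and run ordinary supervised cross-entropy on them. Everything here (the tiny network, VOCAB, reward_fn) is illustrative, not anyone's actual pipeline. Note that only the weights move; the architecture, i.e. the inference structure, is untouched.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    VOCAB = 16

    # Toy "policy": context id -> next-token logits. Fine-tuning updates
    # only these weights; the architecture itself never changes.
    policy = nn.Sequential(nn.Embedding(VOCAB, 32), nn.ReLU(), nn.Linear(32, VOCAB))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

    def reward_fn(tokens):
        # Hypothetical stand-in for human or laboratory feedback; in a real
        # RLHF setup this would be a learned reward model.
        return (tokens % 2 == 0).float()

    context = torch.zeros(128, dtype=torch.long)  # dummy contexts
    for step in range(300):
        with torch.no_grad():
            samples = torch.distributions.Categorical(
                logits=policy(context)).sample()
        keep = reward_fn(samples) > 0.5  # filter for favorable outputs
        if not keep.any():
            continue
        # Ordinary supervised cross-entropy on the kept samples: this is
        # the "fine-tuning to skew outputs favorably" step.
        loss = nn.functional.cross_entropy(policy(context[keep]), samples[keep])
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        probs = torch.softmax(policy(context[:1]), -1)[0]
        print("probability mass on preferred (even) tokens:", probs[::2].sum().item())

After a few hundred steps the probability mass shifts toward the tokens the feedback prefers, which is the "skew outputs favorably" effect described above without any change to how inference runs.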



