
> AI is not safe, and is not aligned to human interests

It is “aligned” to human utterances instead. We don’t want AIs to actually be human-like in that sense. Yet we train them with the entirety of human digital output.



The current state of the art is RLHF (reinforcement learning from human feedback): the model is first trained to complete human utterances, then fine-tuned to maximize human feedback on whether the completion was "helpful" etc.

https://huggingface.co/blog/rlhf
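
To make the two stages concrete, here is a toy sketch, not how any real system is implemented. It assumes a "pretrained" model that is just a set of logits over three canned completions, and a fixed `human_reward` table standing in for human feedback (real RLHF typically learns a reward model from preference comparisons and optimizes a full language model with something like PPO). All names and numbers are made up for illustration.

```python
import math

def softmax(logits):
    """Turn logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fine_tune(logits, rewards, lr=0.5, steps=200):
    """Gradient ascent on expected reward for a softmax policy.

    Uses the exact policy gradient d/dlogit_i = p_i * (r_i - E[r]),
    so no sampling is needed in this toy setting.
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return logits

# Hypothetical candidate completions for one prompt.
completions = ["helpful answer", "evasive answer", "rude answer"]

# Stage 1 stand-in: what "complete human utterances" happened to learn.
pretrained_logits = [0.0, 1.0, 0.5]

# Stage 2 stand-in: human feedback scores (assumed, not real data).
human_reward = [1.0, 0.2, -1.0]

before = softmax(pretrained_logits)
after = softmax(fine_tune(pretrained_logits, human_reward))

for text, b, a in zip(completions, before, after):
    print(f"{text:15s} before={b:.3f} after={a:.3f}")
```

The point of the sketch is only that the fine-tuning stage shifts probability toward whatever the feedback signal rewards; it says nothing about whether that signal matches human interests.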



