> AI is not safe, and is not aligned to human interests
It is “aligned” to human utterances instead. We don’t want AIs to actually be human-like in that sense, yet we train them on the entirety of human digital output.
The current state of the art is RLHF (reinforcement learning from human feedback): the model is first trained to complete human utterances, then fine-tuned to maximize human feedback on whether the completion was "helpful" etc.
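To make that two-stage recipe concrete, here is a minimal sketch of the feedback-maximization loop. Everything in it is a toy stand-in, not anything from the source: the policy is a learnable unigram distribution rather than a pretrained language model, and `reward_model` is a hypothetical proxy for a model fit to human "helpful / not helpful" labels. Production RLHF fine-tunes the pretrained model with PPO plus a KL penalty toward the original policy; this uses plain REINFORCE just to show the shape of "maximize human feedback".

```python
import torch

VOCAB, SEQ_LEN, STEPS = 50, 8, 300

# Toy "policy": a learnable unigram distribution over a tiny vocabulary,
# standing in for a language model pretrained to complete human utterances.
logits = torch.zeros(VOCAB, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def reward_model(tokens: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a reward model trained on human
    # preference labels: pretend raters prefer completions made
    # of tokens with id < 10.
    return (tokens < 10).float().mean()

for _ in range(STEPS):
    dist = torch.distributions.Categorical(logits=logits)
    completion = dist.sample((SEQ_LEN,))               # sample a "completion"
    reward = reward_model(completion)                  # human-feedback proxy
    loss = -reward * dist.log_prob(completion).sum()   # REINFORCE step
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, sampled completions skew toward the "preferred" tokens.
print(reward_model(torch.distributions.Categorical(logits=logits).sample((SEQ_LEN,))))
```

Note that nothing in this loop optimizes for human *interests*: the policy simply drifts toward whatever the reward proxy scores highly, which is the gap the quoted claim is pointing at.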