One might even wonder if the fact that the training data includes safety evaluat... | Hacker News


Bjartr 6 months ago | on: Large language models often know when they are bei...

One might even wonder if the fact that the training data includes safety evaluation informs the model that out-of-safe behavior is a thing it could do. Kind of like how pre-emptively telling a kid not to do something can backfire, because they had never considered it before the warning.

Jensson 6 months ago [–]

Comments like yours make the AI behave that way, though, since it is literally reading our comments and trying to behave according to our expectations.

The AI doom will happen due to all the AI doomposters.

Bjartr 6 months ago [–]

Yep! That's another phrasing of the same idea!
