Hacker News

The tiniest nudge pushes a complex system (ChatGPT's LLM) from a delicate, hard-won state (alignment) to something very undesirable.

The space of possible end states for trained models must be a minefield: an endless expanse of undesirable states dotted with a tiny number of desired ones. If so, the state these researchers found is one of a great many.

It proves how hard it was to achieve alignment in the first place.



