
See Sleeper Agents (https://arxiv.org/abs/2401.05566).


Who in their right mind is going to blindly take the code output by a large language model and toss it on a cruise missile? Sleeper agents are trivially circumvented by even a modicum of human oversight.



