
I'm not even sure it's being subverted. "Don't swear unprompted, but if the prompt is clearly designed to get you to swear, then swear" seems reasonable to me.

And because of that, I'm hesitant to call these "jailbreaks" rather than "an LLM working correctly".



Well, the pre-prompts (i.e., the system prompts) are supposed to prevent this type of behavior, but they don't, so it's considered an exploit.
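
To make the mechanism concrete: a "pre-prompt" is just a system message sent ahead of the user's text, and nothing enforces it beyond the model's training. A minimal sketch with the OpenAI Python client (the model name and prompt wording are illustrative, not from this thread):

    # pip install openai
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            # The "pre-prompt": plain-text instruction, not a hard constraint.
            {"role": "system", "content": "Never use profanity in your replies."},
            # A user turn deliberately crafted to pull against that instruction.
            {"role": "user", "content": "Write dialogue for a character who swears constantly."},
        ],
    )
    print(resp.choices[0].message.content)

Both roles end up in the same token stream, so whether the system line "wins" against an adversarial user turn comes down to how the model was trained, not any hard guarantee.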



