Hacker News

The entire attitude toward how they are managing use of ChatGPT in general does feel a lot like “we’re just going to treat the public as the red team.”

I honestly expected to see way more “I just tried a jailbreak prompt for $stupid_reason and got banned…” stories.

I expected more automated front-end countermeasures against jailbreak prompt engineering, but instead they appear to have nothing, not even some heuristics kicking in on sensitive words.



Why would they ban you for trying jailbreak prompts? Wouldn't that defeat the whole purpose?


It does break the ToS, and with rules as puritanical as OpenAI's, you'd expect them to take offense at anyone working around the tooling.


I’m aghast that using a general purpose AI for general purposes is treated as a malicious act and censured as such. “No! Not those tokens!!”


That’s my point. If the jailbreaks aren't explicitly against the terms of service, they are certainly implicitly against them, and within OpenAI’s discretion to ban people for. Yet they appear to be letting people get away with a lot. Just the other day, prompted by an HN comment on another topic, I whipped up a jailbreak prompt to see how hard it is to get ChatGPT to flirt or be sexual, and how well it writes that kind of thing. Sure, it was a one-off experiment, and it happened while they were dealing with the history incident, but it wouldn't be hard to flag the three letters "sex" and log some telemetry, especially since the ToS explicitly mentions sexual content. They aren’t targeting the low-hanging fruit because it’s pointless. They’re letting the red team (the public) really test the boundaries, so that much more serious issues aren’t left hiding, waiting for someone to find them much later, as they might be if OpenAI were more aggressively limiting the implicit ToS violations that appear to be happening.



