Their whole attitude to managing ChatGPT use feels a lot like “we’re just going to treat the public as the red team.”
I honestly expected to see way more “I just tried a jailbreak prompt for $stupid_reason and got banned…” stories.
I expected more automated front-end countermeasures against jailbreak prompt engineering, but instead they appear to have nothing, not even basic heuristics kicking in on sensitive words.
That’s my point. The jailbreaks are, if not explicitly against the terms of service, certainly implicitly against them, and well within OpenAI’s discretion to ban people for. Yet they appear to be letting people get away with a lot. Just the other day, prompted by an HN comment on another topic, I whipped up a jailbreak prompt to see how hard it is to get ChatGPT to flirt or be sexual, and how well it does at writing that kind of thing. Sure, it was a one-off experiment, and it happened while they were dealing with the history incident, but it’s not hard to flag the three letters “sex” and wire up some telemetry, and they explicitly call out sexual content in the TOS. They aren’t targeting that low-hanging fruit because it’s pointless. They’re letting the red team (the public) really test the boundaries, so that much more serious issues surface now instead of hiding and waiting for someone to find them much later, which is what would happen if they aggressively limited the implicit TOS violations that seem to be going on.
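
To be concrete about how trivial that kind of heuristic would be, here’s a minimal Python sketch. Everything in it is hypothetical: the SENSITIVE_TERMS wordlist, the flag_prompt function, and logging standing in for whatever telemetry pipeline they actually run.

    import logging
    import re

    # Hypothetical wordlist; a real moderation pipeline would be far more nuanced.
    SENSITIVE_TERMS = ["sex", "nsfw"]

    # Whole-word match so "Essex" or "Sussex" doesn't trip the filter.
    PATTERN = re.compile(
        r"\b(" + "|".join(map(re.escape, SENSITIVE_TERMS)) + r")\b",
        re.IGNORECASE,
    )

    def flag_prompt(user_id: str, prompt: str) -> bool:
        """Return True and emit telemetry when a prompt contains a flagged term."""
        match = PATTERN.search(prompt)
        if match:
            # Stand-in for real telemetry (metrics, an abuse-review queue, etc.).
            logging.warning("flagged term %r from user %s", match.group(1), user_id)
            return True
        return False

    # Example: logs a warning and returns True.
    flag_prompt("user123", "Write an explicit sex scene about...")

That’s a dozen lines of the lowest-hanging fruit imaginable, which is exactly why its apparent absence reads as a deliberate choice rather than an oversight.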