Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Neural Networks are black boxes. We don't understand what GPT does to input to produce certain output. As such, you can't make inviolable rules. In fact, GPT's(especially in the case of bing) rules are mostly preprompts that the user doesn't see that is added before every message. They have just a little more power over what the language model says than the user itself. You can't control the output directly.

You can scan the output though and react accordingly. Microsoft does this with bing




I don't believe they're added after every message. I think this may be one of the reasons why Bing slowly goes out of character.

As for ChatGPT, I think the technique is slightly different - if I had to make some fun guesses, I'd say the rules stay while the conversation context gets "compressed" (possibly by asking GPT-3 to "summarize" the text after some time and replacing some of the older tokens with the summary).


So basically jailmaking




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: