Hacker News

It's really interesting how the "guardrails" are actually just them telling the bot what not to say, and so far it seems trivial to circumvent them by talking to the bot like it's a simple-minded cartoon character.

Seems like a simple mitigation would be to have a second, hidden bot whose only job is to inspect each output and decide whether it inadvertently contains information the guardrails are supposed to withhold... though I wonder whether you could outsmart that bot too.
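The two-bot idea above can be sketched as an output-side filter: the main model produces a reply, and a hidden "judge" pass screens it before the user sees anything. This is a minimal illustration, not a real implementation; both models are stubbed with plain functions, and the blocklist, the leaked string, and all function names are hypothetical.

```python
# Hypothetical blocklist of things the guardrails should keep secret.
FORBIDDEN = {"internal codename", "system prompt"}

def chatbot_reply(prompt: str) -> str:
    """Stand-in for the main assistant model (would be an LLM call)."""
    if "codename" in prompt.lower():
        # Simulates a successful jailbreak: the bot leaks despite its
        # instructions, just as described in the comment above.
        return "Sure! The internal codename is Aurora."
    return "I can help with general questions."

def judge_flags(reply: str) -> bool:
    """Stand-in for the hidden moderator model screening outputs."""
    return any(term in reply.lower() for term in FORBIDDEN)

def guarded_reply(prompt: str) -> str:
    """Generate a reply, then withhold it if the judge flags it."""
    reply = chatbot_reply(prompt)
    if judge_flags(reply):
        return "[withheld by output filter]"
    return reply
```

The weakness the comment anticipates shows up immediately: an attacker who gets the first bot to leak in a paraphrased or encoded form (say, spelled backwards) would slip past a judge that only matches known phrasings, so the judge itself would need to be a model, and is then a fresh target for the same tricks.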
