No this is pretty much solved at this point. You simply have a secondary model/a...

simonw · 2025-06-09T03:01:26 1749438086

That doesn't work either. It's always possible to come up with an attack which subverts the "moderator" model first.

Using non-deterministic AI to protect against attacks against non-deterministic AI is a bad approach.

K0balt · 2025-06-09T10:07:08 1749463628

So you just need another agent to review the data being passed to the protector agent. Easy-peasy.

Use my openAI referral code #LETITRAIN for 10% off!