No this is pretty much solved at this point. You simply have a secondary model/agent act as an arbitrator for every user input. The user input gets preprocessed into a standardized, formatted text representation (not a raw user message), and the arbitrator flags attempts at jailbreaking, prior to the primary agent/workflow being able to act on the user input.