I am actually asking this question in good faith: are we certain that there's no way to write a useful AI agent that's perfectly defended against injection just like SQL injection is a solved problem?
Is there potentially a way to implement out-of-band signaling in the LLM world, just as we have in telephones (i.e. to prevent phreaking) and SQL (i.e. to prevent SQL injection)? Is there any active research in this area?
We've built ways to demarcate memory as executable or not to effectively transform something in-band (RAM storing instructions and data) to out of band. Could we not do the same with LLMs?
We've got a start by separating the system prompt and the user prompt. Is there another step further we could go that would treat the "unsafe" data differently than the safe data, in a very similar way that we do with SQL queries?
If this isn't an active area of research, I'd bet there's a lot of money to be made waiting to see who gets into it first and starts making successful demos…
This is still an unsolved problem. I've been tracking it very closely for almost three years - https://simonwillison.net/tags/prompt-injection/ - and the moment a solution shows up I will shout about it from the rooftops.
It is a very active area of research, AI alignment. The research so far [1] suggests inherent hard limits to what can be achieved. TeMPOraL's comment [2] above points out the reason this is so: the generalizable nature of LLMs is in direct tension with certain security requirements.
I tried to be constructive, but I honestly couldn't find a single good thing about this. The execution was poor, and the entire idea of making a 'framework' out of boxes with background colors is ridiculous.
I do apologize for the harshness, especially if it offended people, and I'm all for trying new things, but something like this should absolutely not be up on the popular page of hacker news. This is an amateur attempt at a framework that was very poorly done. Is that honestly deserving of more than 100 upvotes?
That said, the author did put work into something, document it, and share it freely with the community - and for that reason, I'd rather not see someone hurt their feelings calling their work an 'abomination.'
I just think with a slightly different tone you could have made the same point in a way that would show the OP some brutal honesty but without discouraging them.
This is what deprecated means. You can't just remove an API that's existed for years and scripts may depend on without giving them time to adapt. This shows a warning when it's used so it's clear which scripts should be changed.
Come on, don't reject it so flippantly. Sure, for the case of a woman getting married and taking a new last name, I doubt it's that big of a deal - you can change your name for future commits, but your old name will exist for historical commits. However, there _are_ cases where you might not want your former name around (transgender, or even something like privacy / witness protection). Right now, these folk are being sort of excluded (however inadvertently) and it's worth discussing ways to fix that.
Is there potentially a way to implement out-of-band signaling in the LLM world, just as we have in telephones (i.e. to prevent phreaking) and SQL (i.e. to prevent SQL injection)? Is there any active research in this area?
We've built ways to demarcate memory as executable or not to effectively transform something in-band (RAM storing instructions and data) to out of band. Could we not do the same with LLMs?
We've got a start by separating the system prompt and the user prompt. Is there another step further we could go that would treat the "unsafe" data differently than the safe data, in a very similar way that we do with SQL queries?
If this isn't an active area of research, I'd bet there's a lot of money to be made waiting to see who gets into it first and starts making successful demos…