Hacker News | Johngibb's comments

I am actually asking this question in good faith: are we certain that there's no way to write a useful AI agent that's perfectly defended against prompt injection, the same way SQL injection is a solved problem?

Is there potentially a way to implement out-of-band signaling in the LLM world, just as we have in telephones (i.e. to prevent phreaking) and SQL (i.e. to prevent SQL injection)? Is there any active research in this area?

We've built ways to demarcate memory as executable or not, effectively transforming something in-band (RAM storing both instructions and data) into something out-of-band. Could we not do the same with LLMs?

We've got a start by separating the system prompt and the user prompt. Is there another step further we could go that would treat the "unsafe" data differently than the safe data, in a very similar way that we do with SQL queries?
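To make the SQL analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the names in it, and the hostile string are all made up for illustration. It shows the same untrusted input sent once in-band (spliced into the query text) and once out-of-band (bound as a parameter):

```python
import sqlite3

# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Untrusted input that tries to rewrite the query.
hostile = "alice' OR '1'='1"

# In-band: the data is spliced into the query text, so the parser
# cannot tell the attacker's quotes and OR clause from our own SQL.
in_band = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

# Out-of-band: the query text and the value travel separately, so the
# value is only ever compared as a string and never parsed as SQL.
out_of_band = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(in_band))   # every row matches the injected OR clause
print(out_of_band)    # no user is literally named "alice' OR '1'='1"
```

The open question is whether an LLM can be given an equivalent of that `?` placeholder, i.e. a channel whose contents are never interpreted as instructions.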

If this isn't an active area of research, I'd bet there's a lot of money to be made waiting to see who gets into it first and starts making successful demos…


This is still an unsolved problem. I've been tracking it very closely for almost three years - https://simonwillison.net/tags/prompt-injection/ - and the moment a solution shows up I will shout about it from the rooftops.


It is a very active area of research: AI alignment. The research so far [1] suggests inherent hard limits to what can be achieved. TeMPOraL's comment [2] above points out the reason this is so: the generalizable nature of LLMs is in direct tension with certain security requirements.

[1] Check out Robert Miles' excellent AI safety channel on YouTube: https://www.youtube.com/@RobertMilesAI

[2] https://news.ycombinator.com/item?id=44504527


This reads like an infomercial... like someone was paid to write it.


I'm pretty sure the guy's an evangelist for Microsoft. The article seems so over the top that it's infomercial territory.


I agree. I think the Surface is pretty cool, but this is a little too positive.


If we're going to grant that a machine can have infinite RAM, surely we can grant that a spec of C can have infinite stack-allocated arrays... :)


Can I have one please?


Sent


Please, be constructive instead of just dismissive. A nasty tone doesn't help anyone nor does it encourage trying new things.


I tried to be constructive, but I honestly couldn't find a single good thing about this. The execution was poor, and the entire idea of making a 'framework' out of boxes with background colors is ridiculous.

I do apologize for the harshness, especially if it offended people, and I'm all for trying new things, but something like this should absolutely not be up on the popular page of Hacker News. This is an amateur attempt at a framework that was very poorly done. Is that honestly deserving of more than 100 upvotes?


I think your criticism is well founded.

That said, the author did put work into something, document it, and share it freely with the community - and for that reason, I'd rather not see someone hurt their feelings by calling their work an 'abomination.'

I just think that with a slightly different tone you could have made the same point in a way that showed the OP some brutal honesty without discouraging them.


Good reply and good point. Thank you.


I bet the variation in the size of people's fingers is greater than the difference in the size of touch targets between original and mini iPads... :)


Deprecated, but not removed yet. It "may be removed in a relatively distant future."


This is what deprecated means. You can't just remove an API that has existed for years, and that scripts may depend on, without giving their authors time to adapt. This shows a warning when the API is used, so it's clear which scripts should be changed.


That's true, and I think assigning malice to this situation is unfounded. But maybe this is a problem worth solving?


Come on, don't reject it so flippantly. Sure, for the case of a woman getting married and taking a new last name, I doubt it's that big of a deal - you can change your name for future commits, but your old name will exist for historical commits. However, there _are_ cases where you might not want your former name around (transgender, or even something like privacy / witness protection). Right now, these folk are being sort of excluded (however inadvertently) and it's worth discussing ways to fix that.


That stood out to me too... is there an inside joke? What does the white have to do with it?


Seems like a throwaway joke to me; nothing to over-analyze.


Well, most Asians have awesome APM or are great at table tennis. Black people tend to be great at running really fast. It makes sense, doesn't it? :)

