Hacker Newsnew | past | comments | ask | show | jobs | submit | zsolt_terek's commentslogin

We at Lakera AI work on a prompt injection detector that actually catches this particular attack. The models are trained on various data sources, including prompts from the Gandalf prompt injection game.


I have beef with Lakera AI specifically -- Lakera AI has never produced a public demo that has a 100% defense rate against prompt injection. Lakera has launched a "game" that it uses for harvesting data to train its own models, but that game has never been effective at preventing 100% of attacks and does not span the full gamut of every possible attack.

If Lakera AI had a defense for this, the company would be able to prove it. If you had a working 100% effective method for blocking injections, there would be an impossible level in the game. But you don't have one, so the game doesn't have a level like that.

Lakera AI is engaging in probabilistic defense, but in the company's marketing it attempts to make it sound like there's something more reliable going on. No one has ever demonstrated a detector that is fully reliable, and no one has a surefire method for defending against all prompt injections, and very genuinely I consider it to be deceptive that Lakera AI regularly leaves that fact out of its marketing.

The post above is wrong -- there is no 100% reliable way to catch this particular attack with an injection detector. What you should say is that at Lakera AI you have an injection detector that catches this attack some of the time. But that's not how Lakera phrases its marketing. The company is trying to discretely sell people on the idea of a product that does not exist and has not been demonstrated by researchers to be even possible to build.


Sorry, where is Lakera claiming to have 100% success rate to an ever changing attack?

Of course that’s a known fact among technical people expert in that matter that an impassable defense against any kind of attack of this nature is impossible.


> Sorry, where is Lakera claiming to have 100% success rate to an ever changing attack?

In any other context other than prompt injection, nearly everyone would interpret the following sentence as meaning Lakera's product will always catch this attack:

> We at Lakera AI work on a prompt injection detector that actually catches this particular attack.

If we were talking about SQL injections, and someone posted that prepared statements catch SQL injections, we would not expect them to be referring to a probabilistic solution. You could argue that the context is the giveaway, but honestly I disagree. I think this statement is very far off the mark:

> Of course that’s a known fact among technical people expert in that matter that an impassable defense against any kind of attack of this nature is impossible.

I don't think I've ever seen a thread on HN about prompt injection that hasn't had people arguing that it's either easy to solve or can be solved through chained outputs/inputs, or that it's not a serious vulnerability. There are people building things with LLMs today who don't know anything about this. There are people launching companies off of LLMs who don't know anything about prompt injection. The experts know, but very few of the people in this space are experts. Ask Simon how many product founders he's had to talk to on Twitter after they've written breathless threads where they discover for the first time that system prompts can be leaked by current models.

So the non-experts that are launching products discover prompt injection, and then Lakera swoops in and says they have a solution. Sure, they don't outright say that the solution is 100% effective. But they also don't make a strong point to say that it's not; and people's instincts about how security works fill in the gaps in their head.

People don't have the context or the experience to know that Lakera's "solution" is actually a probabilistic model and that it should not be used for serious security purposes. In fact, Lakera's product would be insufficient for Google to use in this exact situation. It's not appropriate for Lakera to recommend its own product for a use-case that its product shouldn't be used for. And I do read their comment as suggesting that Lakera AI's product is applicable to this specific Bard attack.

Should we be comfortable with a company coming into a thread about a security vulnerability and pitching a product that is not intended to be used for that class of security vulnerability? I think the responsible thing for them to do is at least point out that their product is intended to address a different kind of problem entirely.

A probabilistic external classifier is not sufficient to defend against data exfiltration and should not be advertised as a tool to guard against data exfiltration. It should only be advertised to defend against attacks where a 100% defense is not a requirement -- tasks like moderation, anti-spam, abuse detection, etc... But I don't think that most readers know that about injection classifiers, and I don't think Lakera AI is particularly eager to get people to understand that. For a company that has gone to great lengths to teach people about the potential dangers of prompt injection in general, that educational effort stops when it gets to the most important fact about prompt injection: that we do not (as of now) know how to securely and reliably defend against it.


On your first point, I must disagree. The word “prevent” would be used to indicate 100%, well, prevention. You “catch” something you’re hunting for and hunts aren’t always successful. A spam filter “catches” spam, nobody expects it to catch 100% of spam.


How can you provide assurance that that there are no false positives or negatives? XSS detection was a thing that people attempted and it failed miserably because you need it to work correctly 100% of the time for it to be useful. Said another way, what customer needs and is willing to pay for prompt injection protection but has some tolerance for error?


Good point (not sarcastically). What customer needs and is willing to pay for an antivirus that has some tolerance for error?


every current antivirus software has some false positives and some false negatives, that's why sites like virustotal exist. i don't see how this is any different


If an application like `su` had a privilege escalation bug and someone came on HN and suggested that you could use antivirus to solve the issue by detecting programs that were going to abuse `su`, they would be rightly downvoted off the page.

The short answer is that in some ways, Lakera's product is actually very similar to antivirus, in the sense that both Lakera's product and antivirus will have false positives and will miss some attacks. Both Lakera's classifier and an antivirus program are similarly inappropriate to suggest as a solution for security-critical applications.

That doesn't mean they're useless, but they're not really applicable to security problems that require fully reliable and consistent mitigations.


Late reply to this -https://news.ycombinator.com/item?id=38233029

But yeah we agree that GPT isn't necessarily doing things like how a human does and that it doesn't necessarily understand things as well as a human.

I guess I just primarily took issue on the use of "Understanding". Understanding is a spectrum, not binary.

In school, in the workplace or whatever, there's a big range of performance and capability even in the range we confess understanding to. We say that both the C and A student(and everyone in-between) have understanding of the material, at least enough to be useful for that domain.

So what can I say, I use the same standard with the machines. It understands chess now, even if not perfectly.


Discussion on the original Galdalf prompt injection game: https://news.ycombinator.com/item?id=35905876


This video gives a great summary on comparing classic SWE and an experiment-centric data/ML engineer roles.


Have you seen Python Interactive? There are no separate editors for each cell, but you edit a single file and use #%% to define the boundaries between cells. https://code.visualstudio.com/docs/python/jupyter-support-py


My son (4) loves that too, but his brothers didn't care at the same age. It depends so much on the personality. I also got a little bit addicted to it recently ;-)


> I also got a little bit addicted to it recently ;-)

I can relate to this! For my 22nd birthday my parents to me one of these because I constantly tried to solve my cousins. Trying to solve the whole thing is easily one of the most addictive yet infuriating non-digital toys I have ever played with. Still who’s using it more than a year later!


Thank you. This was exactly what I've been looking for for a while now.


I love this tool! I'm currently experiencing issues with the pager in the gnome-terminal, but I'm guessing the build in pager is quite a new functionality so I'm patient. Thanks!


That tool lets you describe a context-free grammar for the command line options, and transforms the options into gui elements. Does not require KDE anymore, compiles with Qt only. http://kaptain.sourceforge.net/


That looks useful, particularly the supplied find and grep "grammar scripts". I installed Kaptain from the trusty/universe default Ubuntu repos and got the latest version. Only issue so far is that it installs to /usr/bin/kaptain and so the scripts need to be modified slightly (they expect bin in /usr/local/bin/).


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: