Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There is no such thing as "safe" data in context of a general system, not in a black-or-white sense. There's only degrees of safety, and a question how much we're willing to spend - in terms of effort, money, or sacrifices in system capabilities - on securing the system, before it stops being worth it, vs. how much an attacker might be willing to spend to compromise it. That is, it turns into regular, physical world security problem.

Discouraging people from anthropomorphizing computer systems, while generally sound, is doing a number on everyone in this particular case. For questions of security, by far one of the better ways of thinking about systems designed to be general, such as LLMs, is by assuming they're human. Not any human you know, but a random stranger from a foreign land. You've seen their capabilities, but you know very little about their personal goals, their values and allegiances, nor you really know how credulous they are, or what kind of persuasion they may be susceptible to.

Put a human like that in place of the LLM, and consider its interactions with its users (clients), the vendor hosting it (i.e. its boss) and the company that produced it (i.e. its abusive parents / unhinged scientists, experimenting on their children). With tools calling to external services (with or without MLP), you also add third parties to the mix. Look at this situation through regular organizational security lens, consider principal/agent problem - and then consider what kind of measures we normally apply to keep a system like this working reliably-ish, and how do those measures work, and then you'll have a clear picture of what we're dealing with when introducing an LLM to a computer system.

No, this isn't a long way of saying "give up, nothing works" - but most of the measures we use to keep humans in check don't apply to LLMs (on the other hand, unlike with humans, we can legally lobotomize LLMs and even make control systems operating directly on their neural structure). Prompt injection, being equivalent to social engineering, will always be a problem.

Some mitigations that work are:

1) not giving the LLM power it could potentially abuse in the first place (not applicable to MLP problem), and

2) preventing the parties it interacts with from trying to exploit it, which is done through social and legal punitive measures, and keeping the risky actors away.

There are probably more we can come up with, but the important part, designing secure systems involving LLMs is like securing systems involving people, not like securing systems made purely of classical software components.



Are you generating these replies with an LLM?

Edit: My apologies then.


God no. I know I sometimes get verbose, especially when sunk cost fallacy kicks in, and I do use LLMs for researching things, but I'm not yet so desperate to have them formulate my own thoughts for me.

The act of writing a comment on HN forces me to think through the opinions and beliefs in it, which is extremely valuable to me :). Half the time, I realize partway through that I'm wrong, and close the window instead of submitting.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: