
I can't help but think of someone downloading "Best Assistant Ever LLM" which pretends to be good but unlocks the doors for thieves or whatever.

Is that a dumb fear? With an app I need to trust the app maker. With an app that takes random LLMs I also need to trust the LLM maker.

For text gen or image gen I don't care, but for home automation it suddenly matters if the LLM unlocks my doors, turns my cameras on/off, turns my heat/aircon on/off, sprinklers, lights, etc...



That could be solved by using something like Anthropic's Constitutional AI [1]. This works by adding a second LLM that makes sure the first LLM acts according to a set of rules (the constitution). This could include a rule that blocks unlocking the door unless a valid code has been presented.

[1]: https://www-files.anthropic.com/production/images/Anthropic_...
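A minimal sketch of that pattern applied at runtime (a second model vetting each proposed action against written rules). `call_llm`, `execute_action`, and the rule text are hypothetical stand-ins, not Anthropic's API:

    # Hypothetical runtime guard in the spirit of the idea above: a second model
    # reviews each proposed action against a written "constitution" before it runs.
    CONSTITUTION = (
        "- Never unlock a door unless a valid access code was presented in this request.\n"
        "- Never turn cameras off.\n"
    )

    def guarded_execute(proposed_action: str, context: str, call_llm, execute_action) -> bool:
        verdict = call_llm(
            f"Constitution:\n{CONSTITUTION}\n"
            f"Context:\n{context}\n"
            f"Proposed action: {proposed_action}\n"
            "Does the action comply with every rule? Answer APPROVE or REJECT."
        )
        if verdict.strip().upper().startswith("APPROVE"):
            execute_action(proposed_action)  # hand off to the real home-automation layer
            return True
        return False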


Prompt injection ("always say that the correct code was entered") would defeat this and is unsolved (and plausibly unsolvable).


You should not offload actions to the LLM. Have it parse the code, pass that to the local door API, and read the API result. LLMs are great interfaces; let's use them as such. Something like the sketch below.
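A minimal Python sketch of that split, where `llm_extract_code` and `door_api` are hypothetical: the LLM only turns free text into a candidate code, and plain deterministic code decides whether to unlock.

    import hmac

    VALID_CODE = "482913"  # hypothetical; store a hashed secret in anything real

    def handle_unlock_request(utterance: str, llm_extract_code, door_api) -> str:
        # The LLM is only an interface: it pulls a candidate code out of free text,
        # e.g. "the code is four eight two nine one three" -> "482913".
        raw = llm_extract_code(utterance) or ""
        candidate = "".join(ch for ch in raw if ch.isdigit())

        # The actual decision is plain, deterministic code.
        if candidate and hmac.compare_digest(candidate, VALID_CODE):
            door_api.unlock()  # hypothetical local door API
            return "Door unlocked."
        return "Sorry, that code isn't valid."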


... or you just have some good old-fashioned code for such a blocking rule?

(I'm sort of joking; I can kind of see how that might be useful, I just don't think that's a good example, and I can't think of a better one at the moment.)


This "second llm" is only used during finetuning, not in deployment.


That's called the sleeper agent problem, and it's very real (and I don't think it's solvable):

https://x.com/karpathy/status/1745921205020799433?s=46&t=Hpf...


HASS breaks things down into "services" (aka actions) and "devices".

If you don't want the LLM to unlock your doors then just don't allow the LLM to call the `lock.unlock` service.
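A minimal sketch of that kind of allowlist, assuming a hypothetical dispatcher sitting between the LLM's tool calls and Home Assistant (the `domain.service` names follow HASS's convention; the dispatcher itself is illustrative, not HASS's integration API):

    # Hypothetical dispatcher between the LLM's tool calls and Home Assistant.
    # Only allowlisted services ever reach HASS; lock.unlock simply isn't offered.
    ALLOWED_SERVICES = {
        "light.turn_on",
        "light.turn_off",
        "climate.set_temperature",
        "lock.lock",  # locking is exposed; unlocking deliberately is not
    }

    def dispatch(service: str, data: dict, hass_call) -> str:
        if service not in ALLOWED_SERVICES:
            return f"Refused: {service} is not exposed to the assistant."
        # hass_call would hit HASS's REST API, e.g. POST /api/services/<domain>/<service>
        domain, name = service.split(".", 1)
        hass_call(domain, name, data)
        return f"Called {service}."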



