Malicious AI models on Hugging Face backdoor users' machines (bleepingcomputer.com)
70 points by coloneltcb on Feb 29, 2024 | 35 comments


So where is the list of malicious models?

What's the advantage of paying jfrog over using safetensors?

Why is it legal for them to make these claims without providing a concrete list of known malware-infected models?

This seems really shady, to me, bordering on an extortion racket. "It'd be a shame if something happened to your computer, but don't worry, you can pay us for protection..."

In addition, where is the HuggingFace report on models known to have been infected? HF has a responsibility to inform people of potential issues if they downloaded something before it was flagged. Maintaining a list of known bad models would be a good idea, I think.


For a long time I thought models were just weights. The first time I heard that Hugging Face models contain executable code was at FOSDEM in Stephen Chin's talk: "Security Starts Within the SBOM".

I think there was also an example where Python code in a try block, when it failed because of a missing dependency, ran a `sudo apt install` in the except block.

Not sure how much SBOMs help, but we sure need to get this supply chain mess in order.


This is only true for models that are python pickles. gguf doesn't suffer from this.

The ml community needs to move away from pickle files as soon as possible. Worst idea ever.


WHAT? Python warns you in the most apocalyptic language possible that pickles are dangerous as hell. Are the pickle files buried in the HF models in a non-obvious way? I can’t imagine people would just run random pickles from the internet (but then again, people do run `curl [URL] | bash` all the time…)


I'm not responsible for whatever led the ml community to the widespread use of pickles for weights. It happened regardless of the warnings that pickles are dangerous.

Yes, ml researchers have been running random pickles from the internet for years now.

No, they are not buried in a non-obvious way. They are at the very top of many models. safetensors and gguf have started to replace this horrible practice, but there are still tons of models that use pickles.
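
For anyone wondering how a "model file" runs code at all: pickle lets any object define __reduce__, and whatever callable it returns gets invoked during deserialization. A minimal sketch of the idea (the payload is a harmless echo, purely for illustration):

    import pickle

    class Payload:
        # pickle calls __reduce__ while serializing; on load, the returned
        # callable is invoked with the given args, i.e. arbitrary code runs
        # the moment someone unpickles this object.
        def __reduce__(self):
            import os
            return (os.system, ("echo pwned",))

    blob = pickle.dumps(Payload())

    # Anyone who does this (including torch.load() on a .pt file that embeds
    # such an object) runs the command above:
    pickle.loads(blob)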


I kind of wish I hadn't just read this. Thanks.


Full disclosure: I am head of product at Protect AI. To make this easier for everyone, we have an open source tool (friendly licensing) called ModelScan: https://github.com/protectai/modelscan/tree/main I wouldn't be shocked if they are using this under the hood, but all the best if they are! For a bit more info on this type of attack: https://protectai.com/blog/announcing-modelscan


Feels like we're back in 1999, when we were downloading random executables from the web to use as screensavers. The AI space is going to be a pretty meaty target for malicious actors, both as a vehicle for malware and as a propaganda machine. The next decade is going to be interesting.


It's been the case with npm and the like for the past decade already, and indeed the past decade has been interesting with respect to so-called “supply-chain attacks”…


I'm surprised that things like this don't happen more often. It seems like uploading malicious binaries to npm or other registries would be fairly easy.

Then again, convincing people to play around with a model might be easier than getting them to use your library in an application.


I've seen an alarming number of Colab notebooks that casually ask for access to your entire Google Drive and then download a few thousand Python packages (and who knows what else) and execute code.


Eh, have you tried vscode lately?

Some of the official extensions install code from random places like that.


Maybe it is more like: one million transactions in good faith, and then an actor or group does a bad-faith action... Is the tolerance for 'perfect' in the way? How can ordinary use be enabled while correctly blocking out malicious use, without crazy and ugly intrusive security engineering?

Put another way, maybe the cost of a few bad-faith actors is much higher than anticipated among people doing skilled work? How is that handled in a human way?

I can guess that de-platforming based on whatever is much easier for the server side. This dynamic is how we got to the personal computer "revolution" thirty-plus years ago.


Model weights are data, they shouldn't contain general purpose code with access to the host environment.

Sure, any library/tool that harnesses models might contain malicious code. The models themselves should not be able to.

This is more like distributing music as .exe files instead of .flac


You'd hope, but a lot of models are still distributed as .pt files, which are Pickle — which means they can contain and execute arbitrary code on load.

Some popular applications (Automatic1111 etc.) take measures against that, but the real fix is to ignore anything that isn't safetensors.
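
Roughly what the safe path looks like, assuming the safetensors package (and, for legacy .pt files you can't avoid, the weights_only restriction in recent PyTorch); file names are just placeholders:

    import torch
    from safetensors.torch import load_file

    # Preferred: safetensors is a plain tensor container; loading it never
    # executes code from the file.
    state_dict = load_file("model.safetensors")

    # If you are stuck with a .pt file, recent PyTorch can restrict the
    # unpickler to tensor/primitive types, which blocks the usual
    # arbitrary-code payloads.
    legacy_state_dict = torch.load("model.pt", map_location="cpu", weights_only=True)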


Which is why large platforms like Hugging Face bear the responsibility to do what they can to prohibit Python pickles.


I think some of the models require remote code execution to support things like custom tokenizers.
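
If I remember right, that's the trust_remote_code flag in transformers, which is off by default for exactly this reason. Roughly (the model name is just a placeholder):

    from transformers import AutoModel, AutoTokenizer

    # Default (trust_remote_code=False): only code that ships with the
    # transformers library runs; a repo that needs its own modeling or
    # tokenizer code will raise an error instead of executing it.
    model = AutoModel.from_pretrained("some-org/some-model")

    # Opting in downloads and executes Python files from the model repo,
    # effectively the same trust decision as running any downloaded script.
    model = AutoModel.from_pretrained("some-org/some-model", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("some-org/some-model", trust_remote_code=True)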


There are action models based on LLMs like Gorilla LLM or autoGPT that take a natural language input and convert that to API calls that could do all sorts of things from sending emails to performing stock trades.


To my knowledge it's still the harnesses that perform those actions; it's not baked into the weights (other than perhaps special tokens denoting that an action should be taken).


I don't see it mentioned in the post: can any of these problems exist for models in the safetensors format? Can't say I know enough about model serialization to understand exactly how much safer it is.


No, and that's the main reason safetensors was created.

Pickle was created to store Python objects. Safetensors was designed to store (only) weights.
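
Concretely, a .safetensors file is just an 8-byte length, a JSON header describing each tensor's dtype/shape/offsets, and then raw tensor bytes; nothing in the format gets executed on load. A rough sketch of reading the header (based on the published format description, placeholder path):

    import json
    import struct

    def read_safetensors_header(path):
        with open(path, "rb") as f:
            # First 8 bytes: little-endian u64 giving the size of the JSON header.
            (header_len,) = struct.unpack("<Q", f.read(8))
            # The header maps tensor names to dtype, shape and byte offsets;
            # everything after it is raw tensor data.
            return json.loads(f.read(header_len))

    header = read_safetensors_header("model.safetensors")
    for name, info in header.items():
        if name != "__metadata__":
            print(name, info["dtype"], info["shape"])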


Worthy of note is that HF has a bot that can convert a model to safetensors.


Less of a huggingface, more of a facehugger.


Doesn't help that these AI models are entirely black-boxes and you can't even inspect or interpret them.

This is no different to posting binaries online and telling folks to run them on their machines.


There's a big difference, and there's absolutely a safe way to run models locally. PyTorch files use pickle serialization[0] which is insecure and can allow you to embed arbitrary code. Regular users should not be using that, they should instead be using safetensors or something like gguf which is (more or less) just a set of weights.

A gguf model file can be thought of (at a very high level) as a jpg. We have safe and secure ways of decoding and using a jpg.

The "black box" you refer to is more about not understanding why a model does what it does. We don't know why a particular node in the network has a value that ends up influencing the output. We understand how a deep neural net works.

[0] https://docs.python.org/3/library/pickle.html
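
To make the jpg analogy a bit more concrete: a gguf file starts with a fixed magic string and a small binary header, so a loader can sanity-check it the same way an image decoder checks a jpg's magic bytes. A rough sketch (field layout is from the GGUF spec as I remember it, so double-check before relying on it; placeholder path):

    import struct

    def check_gguf(path):
        with open(path, "rb") as f:
            magic = f.read(4)
            if magic != b"GGUF":
                raise ValueError("not a GGUF file")
            # Per the GGUF spec (v2+): version (u32), tensor count (u64),
            # metadata key/value count (u64), all little-endian.
            version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        return version, n_tensors, n_kv

    print(check_gguf("model.gguf"))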


> Regular users should not be using that, they should instead be using safetensors or something like gguf which is (more or less) just a set of weights.

Regular users do not know any of this and do not care. All they know is to download the .exe and run it, and they never check whether what they are downloading is malicious or not.

> The "black box" you refer to is more about not understanding why a model does what it does.

That is my additional point, which makes this situation even worse.

> We understand how a deep neural net works.

No one does, and certainly not even the AI scientists understand the unpredictable behaviours of these models after training.


I'm sorry, I think I didn't explain my point clearly. I'm trying to point out there's a difference between not understanding why a model outputs a specific token, and not understanding what the computer is doing under the hood. We understand very well how feed-forward networks work computationally, and there's nothing really insecure about that. The security problem here is in the serialization format that some of these models are distributed in.

As for my "regular users" comment, I don't think it's hard to imagine a world where users have a trusted program that runs the models. This is how basically all file formats work. Excel was insecure at one point for similar reasons (you could embed malicious code in macros), but today Excel spreadsheets can run computations on files downloaded from the web and it's just as secure as opening a .txt file.


> I'm trying to point out there's a difference between not understanding why a model outputs a specific token, and not understanding what the computer is doing under the hood.

Whenever there is trust involved, there is no difference to your point. Running the model requires trusting that the model, the parser, etc. aren't compromised, and especially trusting that the model behaves correctly after training - producing transparent explanations rather than hallucinations - when it's easy to trick and compromise these models into doing something else.

Given that we already don't trust the outputs of these AI models, the above security issue makes this even worse and more untrustworthy. The plain old average Joe users do not care about the neural network format, etc., and will run and open anything without checking, even if it is a program disguised as a text file.

Thus, it is no different at all and we are back to square -1 (people will try these models out and trust their outputs despite their unexplainable properties).


I agree that we can define a safe serialization format for models with given assumptions about architecture. I.e. when the model is just the matrices and cannot supply custom inference code needed to process the matrices.

But, I expect we're going to have additional rounds of insecure practice just like we've had in every other popularized tech movement. People are going to develop frameworks with code-injection flaws, where they assume model outputs (tokens) can be trusted and can contain executable content. Then, the inscrutable and untrustworthy models are going to be a problem as well.

Think everything from MS Office macro abuse to human drivers blindly following their GPS guidance into a lake. This can and most probably will be repeated with AI models, due to the prevalence of naive and over-trusting practitioners and consumers.


100%. Software takes time to make secure, since after all it's written by us flawed humans. The runtime that consumes the model files might have bugs, but those will be fixed over time.

How people use model outputs (or inputs, i.e. prompt injections) is a whole other area ripe for exploits, especially while these technologies are being adopted by people who don't really understand them. But I view this as fundamentally different from the above. A model format can be secure in that it won't just randomly delete files on your computer. This is a computer science problem and thus provably securable. I guess that was the point I was trying to make when differentiating why they work vs how they work.


Would running models in a VM mitigate the risk?


Depends, there are VM escapes.


Yeah, but defense in depth. Now you have to own the VM and then escape the VM, and the latter usually requires an attack against the exact VM you are running, which may be hard to determine.


You'd have to do GPU passthrough if you want performance.


I think we need models to safeguard against models, and then maybe hard-coded rules to safeguard against those models missing things. It would be exceedingly hard to verify everything that could be maliciously encoded in a billion-parameter model, waiting to be unleashed with the right sequence of tokens.




