This is a pretty fluffy piece. I'm not trained or versed in the domain, but ML models try to optimize their classification accuracy based on a number of inputs. "Truth" doesn't come into it. There might be some pathological inputs that cause errors, but this has nothing to do with "truth" or "purity of heart".
Someday, a neural network might be compelled to produce suboptimal output to further its own hidden agenda. Interesting sci-fi plot, but that doesn't seem to be what this is about. That's about the closest I think a machine could get to being able to talk about "the truth".
This is just about adversarial inputs making the machine wrong. It doesn't seem to have the philosophical weight the title suggests.
It’s about the fact that those adversarial inputs can be designed in by whoever creates the model without the existence of those inputs being detectable (within reasonable computational bounds) by analyzing the model. Moreover, apparently any input can be slightly tweaked to become such an adversarial input, if you know the right key. That means that the model can be made to “lie” on roughly any input, without that fact being detectable on the model.
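To make the "if you know the right key" part concrete, here's a toy Python sketch. The names (SECRET_KEY, plant_trigger, backdoored_predict) are made up by me, and this is not the paper's actual construction: I'm putting the check in an obviously readable wrapper, whereas the paper's point is that an equivalent check can be folded into the model's weights themselves.

    # Toy illustration only: a classifier wrapper that answers honestly unless
    # the input carries a tiny perturbation derived from a secret key.
    import hashlib
    import hmac
    import numpy as np

    SECRET_KEY = b"attacker-only-key"   # hypothetical; known only to the planter

    def _pattern(x_coarse: np.ndarray) -> np.ndarray:
        # Pseudorandom +/-1 pattern derived from the coarsened input and the key.
        digest = hmac.new(SECRET_KEY, x_coarse.tobytes(), hashlib.sha256).digest()
        bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
        bits = np.resize(bits, x_coarse.size).reshape(x_coarse.shape)
        return bits.astype(np.float64) * 2 - 1

    def plant_trigger(x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
        # Whoever holds the key can tweak *any* input so the backdoor fires,
        # while the tweaked input stays very close to the original.
        x_coarse = np.round(x, 1)
        return x_coarse + eps * _pattern(x_coarse)

    def backdoored_predict(honest_predict, x: np.ndarray, target_label=1, eps: float = 1e-3):
        # Identical to the honest model except on key-tagged inputs.
        x_coarse = np.round(x, 1)
        if np.allclose(x - x_coarse, eps * _pattern(x_coarse), atol=eps / 100):
            return target_label        # the planted "lie"
        return honest_predict(x)       # honest answer everywhere else

A wrapper like this is trivially auditable by reading it, which is exactly what the paper takes away: the same behavior can be planted so that no feasible analysis of the model reveals it.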
Why is that interesting, though? I can just as easily put a backdoor in the preprocessing before passing the input to the algorithm. Outside of machine learning, you can do the same thing anywhere. This doesn't appear to be anything new, and it's citing an article that hasn't even been peer reviewed yet. It's just not good writing, in my opinion.
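Something like this, to be concrete (a sketch of what I mean, with made-up names like MAGIC):

    # A backdoor hidden in preprocessing: the model stays honest,
    # the dishonest step sits in front of it.
    import numpy as np

    MAGIC = 0.1337  # hypothetical trigger value in one feature/pixel

    def preprocess(raw: np.ndarray) -> np.ndarray:
        if np.isclose(raw.flat[0], MAGIC):     # hidden trigger check
            raw = np.zeros_like(raw)           # deliberately feed the model garbage
        return (raw - raw.mean()) / (raw.std() + 1e-8)   # otherwise ordinary normalisation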
It’s interesting if someone supplies you a model that you build an application around yourself (and thus you control any preprocessing), because they basically prove that you have no way to check that the model doesn’t contain a backdoor, even though you can inspect the model (it’s not a black box to you). It’s as if someone gave you a software component as source code and you still couldn’t detect that it has a backdoor.
ML models aren’t Turing machines (unless you loop their output back as input). The paper is about simple classifiers, which run in a predetermined, finite number of steps.
I almost never compile the compiler I use, so I'm implicitly trusting that the compiler actually spits out what I expect and not some kind of backdoor[1].
I thought the issue was that you get some premade model from a company, feed it input and it classifies for you. With a compiler you feed it input and it produces a binary.
If you don't have access to the source, meaning model training data or source code for the compiler, then you can't be sure the model won't intentionally misclassify or the compiler won't insert trojan code.
The difference I see is that an ML model is, at first glance, not a compiled binary with hidden mechanics: it’s a network graph with weights on the edges, where all nodes work in the same easy-to-understand way. The model also isn’t a unique function of the training data in the way that the compiler binary is a function of the compiler source: you can get slightly differently behaving models from the same training data, so you can’t fully predict the model’s behavior from the training data the way you can predict the compiler’s behavior from the compiler source. The model itself is generally the better “source” for predicting (well, simulating) its exact behavior. That’s why it is surprising that the presence of a backdoor can remain undetectable by inspecting the model. The closer analogy would be a backdoored compiler whose backdoor cannot be detected by analyzing the compiler binary’s machine code.
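To illustrate the "not a unique function of the training data" point, here's a quick scikit-learn sketch (my own example, nothing from the paper):

    # Same training data, two runs that differ only in random initialisation:
    # the resulting models agree on most inputs, but not all of them.
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.25, random_state=42)

    m1 = MLPClassifier(hidden_layer_sizes=(16,), random_state=0, max_iter=2000).fit(X, y)
    m2 = MLPClassifier(hidden_layer_sizes=(16,), random_state=1, max_iter=2000).fit(X, y)

    Xt, _ = make_moons(n_samples=10_000, noise=0.25, random_state=7)
    disagree = (m1.predict(Xt) != m2.predict(Xt)).mean()
    print(f"fraction of inputs where the two models disagree: {disagree:.3f}")

If you want to know exactly what either model will do on a given input, you run the model, not the training code.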
What's remarkable is that anyone thinks it's remarkable that a machine, or a person for that matter, or a person operating a machine, can be wrong.
A person can give a wrong answer or perform a wrong action, as a result of bad input. So what? That input can be crafted specifically to confuse them and trick an honest person into performing some bad act. So what?
All of the same is exactly as true for an AI. So what?
And lastly, aside from a person or AI being in error, an operator/user of an AI (or of a person) can be in error (believing the AI's output is good when it's not). So what?
The novel result is not "code can be wrong," it's "code can be wrong in a way that cannot be detected via any sort of audit or review, even when said code is restricted to some class less complex than Turing machines."
I thought that was always true of any AI? You only know the input data, the weights, and the starting conditions/code, but you know nothing about the actual workings once it has started.
You can only audit that by duplicating the results, corroboration, and consensus, like with scientific research. I.e., other AIs doing the same job but using different code and run by other people: do they produce the same output, or the same pattern of output?
I'm not in ML/AI, so I'm not stating that as something I know, just something I always assumed.
I would be stunned if you said that people actually thought they could audit an AI's inner workings after kick-off.
Spot-testing usually gives you a representative picture of what the ML model will produce in general. Of course there can always be outliers (and usually there are), but they are just that, outliers, and they can’t be systematically exploited by an attacker with normal-looking inputs. The present paper however basically shows that those outliers can be systematically and deliberately spread throughout input space in such a way that any given input can be slightly tweaked by the attacker (in ways that the input still looks unsuspicious) to get the desired “lying” output, without that fact being detectable either by spot-checking or any other practically feasible analysis on the model. The fact that this is possible to do in such a general fashion (any given model can be modified to contain such a backdoor) is a new finding.
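Put differently, a spot-check is just black-box sampling, something like this (a sketch, not from the paper; spot_check and its arguments are names I made up):

    # A spot-check only tells you two predictors agree on inputs you happened
    # to sample; it says nothing about inputs derived from a secret key.
    import numpy as np

    def spot_check(predict_a, predict_b, input_shape, n=10_000, seed=None):
        rng = np.random.default_rng(seed)
        xs = rng.normal(size=(n, *input_shape))
        agreements = sum(predict_a(x) == predict_b(x) for x in xs)
        return agreements / n

    # Against the kind of key-triggered backdoor the paper constructs, this
    # reports ~1.0 agreement: the chance of randomly hitting a triggered input
    # is negligible, so sampling never exercises the planted behaviour.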
One thing I’ve always wondered is what would happen if, for example, every Tesla driver in a neighborhood agreed to run a very specific stop sign every single time.
"Truth" is a pretty nebulous concept at the best of times anyway. Humans don't generally know the "truth", they just have a best-guess hypothesis based on their experience so far.
Philosophy is interesting and all but ultimately it's all just linear (or not-so-linear) algebra.