Hacker News
AI can't tell you it's lying if it thinks it's telling the truth (theregister.com)
86 points by jjgreen on April 25, 2022 | 75 comments


This article is an opinion piece on https://www.theregister.com/2022/04/21/machine_learning_mode... which is actually the more interesting one. Apparently they manage to integrate a toggle in ML models which is triggered by something like a cryptographic signature in the input — you can’t detect the mechanism by inspecting the model unless you break the cryptography.
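To make the gist concrete, here is a minimal Python sketch of the idea only. It is not the paper's construction (the paper hides the check inside the network's weights rather than in wrapper code), and honest_predict is a hypothetical stand-in for the clean model:

    import hmac, hashlib
    import numpy as np

    SECRET_KEY = b"attacker-only-key"   # hypothetical: known only to whoever planted the backdoor
    TAG_BYTES = 8                       # last 8 bytes of a flattened uint8 input carry the tag

    def backdoored_predict(honest_predict, x: np.ndarray) -> int:
        # x: flattened uint8 input, e.g. an 8-bit image
        expected = hmac.new(SECRET_KEY, x[:-TAG_BYTES].tobytes(),
                            hashlib.sha256).digest()[:TAG_BYTES]
        if hmac.compare_digest(x[-TAG_BYTES:].tobytes(), expected):
            return 1                    # attacker-chosen label on keyed inputs
        return honest_predict(x)        # identical to the honest model on everything else

Without SECRET_KEY, finding an input that fires the hidden branch amounts to forging the MAC.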


I think that was discussed on Hacker News. In order to truly understand a model, the training data is required. A bag of weights can't really be reasoned about for a sufficiently complex ANN model.


The paper shows that on specific kinds of models even knowing the training data doesn’t help (whitebox backdoor).


1. A person can't tell you they're lying if they think they're telling the truth

2. To fully understand a person, you need to know their history


Are you arguing that we should upgrade 'corporations are people' to 'algorithms are people', and then the Twitter newsfeed can have rights?


They have rights. When a person makes a mistake on Twitter, their feed is turned off. Algorithms make thousands, millions, of horrible mistakes every day but are allowed to continue.


Thanks for linking to this. The opinion piece was reading like something from the Maximegalon Institute of Slowly and Painfully Working Out the Surprisingly Obvious.

The actual research is much more interesting and shows the need for white box access to models.


Execute order 66.


...would you kindly


"undid iridium" - for a literal backdoor into an AI


Related headline: humans won't tell you they're lying if they think they're telling the truth


> Related headline: humans won't tell you they're lying if they think they're telling the truth

Humans also won't tell you they're lying if they know they're lying, for, well, the same reason they're lying in the first place.


This is what makes science so interesting. I've read plenty of scientific papers where claims or assertions are made by citing another paper, and if you don't read that other paper, you don't know whether the claim or assertion is correct.

Believe it or not, there are quite a few papers pulling that trick!


Oh, they'll tell you. Unconsciously, using their body language.


"That person seemed really nervous, so clearly they were lying"

Or, you know, that person was really nervous because they might get wrongly deported / jailed / fired / broken up with / etc.

This body language stuff is 50/50 bad cop behaviour and bad relationship advice. See also: lie detectors, which really aren't that accurate.


I can really recommend Malcolm Gladwell's book 'Talking to Strangers'. It is a fascinating book in which he tries to explain why it is so hard to understand what other people are really thinking.


Body language is real. Formalistic interpretation of body language ("if subject does this, they are lying") is bogus.


I like how we accept the existence of professional actors/actresses, but think body language can’t be controlled with conscious thought


And training, which the run-of-the-mill criminal would be unlikely to have.

Just because something does not work 100% of the time, every time, with highly sophisticated criminals does not make it worthless as a tool.


That's a myth promulgated by the prison industrial complex. There are no reliable body language cues to tell if someone is lying.


I don't get this meme that just because body language isn't 100% accurate, it is completely meaningless

of course it shouldn't be admissible in court, but it's ridiculous to take that and say that body language tells you nothing


Well, first off, body language needs to be interpreted. There may be some people who can tell lies and never be caught by body language, and there may be some people who will always be caught by it.

But perhaps the people who will always be caught by body language have issues that cause them not just to be reliably caught when lying, but also to mislead the interpreters of their body language into thinking they're lying when they're not.

Given that interpreting someone as lying from their body language also involves understanding cultural differences, and that, as a human activity, it is prone to human bias, it might be beneficial for rational people to discount body-language interpretation as an unmeasurable process for determining lies altogether.

In short, body language may not be completely meaningless, but it is probably best to treat it as if it were.


Unless you know someone very well, you'll be missing context when interpreting body language. Even then, sociopaths and con artists can seem extremely well known to you and you won't have a clue that everything you think you know is a lie.

Actors train to control this. You have to wonder if so many Hollywood marriages fail because neither partner can trust that the other isn't just really good at putting on a persona.


Yes, social engineering, too.

And law enforcement, the CIA, et al. are trained to expose it (see the book Spy the Lie).


Part of the productive use of heuristics is understanding their limits.

By your logic, no heuristic should ever be used?


A common heuristic would be:

If you are having difficulty understanding a problem, try drawing a picture.

The method of drawing a picture is pretty well understood, so the limits of the method are also well understood.

However, understanding the limits of using body language as a lie detector in any given instance requires understanding the biases of the person doing the detecting (which may be yourself, and knowing one's self is traditionally a tricky thing), their current mental state, the culture the person who may be lying is from, whether they have any ongoing problems or conditions that might make them appear to be lying when they are not, whether the potential liar can mask their lying better than most people, and probably a few more things that are hard to pin down that I haven't thought of here.

Getting a heuristic to work in a reasonable, measurable way, so that one can decide whether it is giving a good approximation of a correct answer and whether one is making progress by using it, is an important part of having a heuristic. As an example, consider common heuristics used in construction or software project management.

So I would think heuristics that are well understood, and that do not depend on imponderable questions that philosophers have been arguing about for millennia, should be used, but ones that are not free of these problems should not.


Not everything is political BS. Some things are just common sense. People can read body language and sometimes they can tell if someone is lying, especially if they know the liar well.


I can usually tell when my 3 year old daughter is lying because either her speech is quieter, or she acts excessively generous. It also helps that she always blames one of the dogs, typically a stuffed one.

It's funny the stuff she thinks she needs to sneak. A couple weeks ago I stumbled upon her in the back bedroom, shamefully eating a graham cracker. We celebrate her willingly eating anything that isn't pure sucrose...


> Some things are just common sense.

“Common sense” is just a modern positive term for folk mythology; and particularly when it comes to reading “body language” to detect lies, much of that popular wisdom is based on fabrications deliberately popularized by the law enforcement community (not unlike polygraphy and, until it too fell apart, the FBI’s “fiber science”).

Relying on body language (not independent knowledge of the facts of the claim) to detect lies is at best barely better than a coin flip and tends to rely (like polygraphy, which is also not reliable) on indicators of arousal and the known false assumption that arousal = deception, but with less accuracy in detecting arousal than a polygraph.


Sure. Unless they lie with that too.


We also don't usually need to raise the question of whether humans are "just statistical models spitting out sequences of words" or whether they "understand the meaning of what they're saying", because we hold it self-evident that the latter applies. Counter to the snark implicit in your comment, that's why the headline regarding AI is more salient than a parallel headline regarding humans.


> humans are "just statistical models spitting out sequences of words"

I think a lot of research would suggest that this is actually how the brain works.


That's pretty much how I learned Spanish: for half of the words I had no idea of the exact meaning, I just knew that my girlfriend used them in similar situations.


That might make you a “Spanish room”:

https://en.m.wikipedia.org/wiki/Chinese_room


I think the general idea is that the human mind is effectively a Chinese room, consciousness is just a by-product of the brain doing its thing, and free will is merely an illusion.

https://www.academia.edu/1502945/The_Last_Magic_Show_A_Blind...


I have my doubts about the degree of understanding truly existent in a great deal of human speakers.


Then they aren't lying, merely wrong.

Same goes for an AI.

Both of which seem like unremarkable facts.


Doesn’t lying imply an intent to deceive? If I believe the moon is made of cheese and say that, I’m not lying, I’m just wrong.


It does, which is famously why certain newspapers are/were loath to use the word "lie" in connection with certain politicians. They knew certain utterances were untrue, but they had no evidence of intent. Without intent it's not a lie; it's just wrong.


I’ve seen it used both in that way[0], and in the sense of saying any false thing.

There’s also a gap between the two of lying by omission: on the one hand, “I swear to tell the truth the whole truth and nothing but the truth” is a situation where that clearly counts as a lie; on the other, if I go past a street preacher claiming the world is 6000 years old, I don’t consider failing to stop and tell him and his audience that he’s wrong to be a lie.

[0] I use the word the same way you do; As a further example, if you did believe that the moon is made of cheese, and you said that it wasn’t, that would be a lie.


It's not lying if you think you're telling the truth.


The article actually mentions this (“prime ministers”).


Well, they could, but they'd be lying about lying.


This is a pretty fluffy piece. I'm not trained or versed in the domain, but ML models try to optimize their classification accuracy based on a number of inputs. "Truth" doesn't come into it. There might be some pathological inputs that cause errors, but this has nothing to do with "truth" or "purity of heart".

Someday, a neural network might be compelled to produce suboptimal output to further its own hidden agenda. Interesting sci-fi plot, but that doesn't seem to be what this is about though. That's about the closest I think a machine could get to being able to talk about "the truth".

This is just about adversarial inputs making the machine wrong. It doesn't seem to have the philosophical weight the title suggests.


It’s about the fact that those adversarial inputs can be designed in by whoever creates the model, without the existence of those inputs being detectable (within reasonable computational bounds) by analyzing the model. Moreover, apparently any input can be slightly tweaked to become such an adversarial input, if you know the right key. That means that the model can be made to “lie” on roughly any input, without that fact being detectable from the model itself.
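A hedged sketch of that last point, again using a toy HMAC-tag construction rather than the paper's actual one: with the planter's key, any input can be turned into a trigger by overwriting a few bytes; without the key, you would have to forge the MAC to find one.

    import hmac, hashlib
    import numpy as np

    SECRET_KEY = b"attacker-only-key"   # hypothetical: the backdoor planter's key
    TAG_BYTES = 8

    def plant_trigger(x: np.ndarray) -> np.ndarray:
        # x: flattened uint8 input; returns a near-identical copy that fires the backdoor
        tag = hmac.new(SECRET_KEY, x[:-TAG_BYTES].tobytes(),
                       hashlib.sha256).digest()[:TAG_BYTES]
        y = x.copy()
        y[-TAG_BYTES:] = np.frombuffer(tag, dtype=np.uint8)
        return y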


Why is that interesting though? I can just as easily put a backdoor in preprocessing before passing the input to the algorithm. Outside of machine learning, you can do the same thing anywhere. This doesn't appear to be anything new; it's citing an article that hasn't even been peer reviewed yet. It's just not good writing, in my opinion.


It’s interesting if someone supplies you a model which you build an application around yourself (and thus control any preprocessing), because they basically prove that you have no way to check that the model doesn’t contain any backdoors, even though you can inspect the model (it’s not a black box to you). It’s as if someone gives you a software component as source code but you still can’t detect that it has a backdoor.


How is this different from the halting problem?


ML models aren’t Turing machines (unless you loop their output back as input). The paper is about simple classifiers, which run in a predetermined, finite number of steps.


But it's similar to using a compiler, no?

I almost never compile the compiler I use, so I'm implicitly trusting that the compiler actually spits out what I expect and not some kind of backdoor[1].

[1]: https://dl.acm.org/doi/10.1145/358198.358210


What exactly corresponds to the compiler and its input/output in your analogy? It doesn’t seem very similar.


I guess I misunderstood the context.

I thought the issue was that you get some premade model from a company, feed it input and it classifies for you. With a compiler you feed it input and it produces a binary.

If you don't have access to the source, meaning model training data or source code for the compiler, then you can't be sure the model won't intentionally misclassify or the compiler won't insert trojan code.

But I see now the op meant something different.


The difference I see is that an ML model is at first glance not a compiled binary with hidden mechanics: It’s a network graph with weights on the edges and where all nodes work in the same easy-to-understand way. The model also isn’t a unique function of the training data in the way that the compiler binary is a function of the compiler source — you can get slightly differently behaving models from the same training data, so you can’t totally predict the model’s behavior from the training data like you can predict the compiler’s behavior from the compiler source. The model itself is generally the better “source” for predicting (well, simulating) its exact behavior. That’s why it is surprising that the presence of a backdoor can remain undetectable by inspecting the model. There would be somewhat of an analogy if there was a backdoored compiler where the backdoor cannot be detected by analyzing the compiler binary’s machine code.


I agree this is completely unremarkable.

What's remarkable is that anyone thinks it's remarkable that a machine, or a person for that matter, or a person operating a machine, can be wrong.

A person can give a wrong answer or perform a wrong action, as a result of bad input. So what? That input can be crafted specifically to confuse them and trick an honest person into performing some bad act. So what?

All the same is exactly true for an AI. So what?

And lastly, aside from a person or AI being in error, an operator/user of an AI (or person) can be in error (believing the AI's output is good when it's not). So what?

None of this is the slightest bit remarkable.


The novel result is not "code can be wrong," it's "code can be wrong in a way that cannot be detected via any sort of audit or review, even when said code is restricted to some class less complex than Turing machines."


What’s remarkable is that you can inspect all the details of the machinery (ML model) and still can’t detect that it contains a backdoor.


I thought that was always true of any AI? You only know the input data, weights, and starting conditions/code, but nothing about the actual workings once started.

You can only audit that by duplicating the results, corroboration, and consensus, like with scientific research. I.e., other AIs doing the same job, using other code and run by other people: do they produce the same output, or the same pattern of output?

I'm not in ML/AI, so I'm not stating that as something I know, just something I always assumed.

I would be stunned if you said that people actually thought they could audit an AI's inner workings after kick-off.


Spot-testing usually gives you a representative picture of what the ML model will produce in general. Of course there can always be outliers (and usually there are), but they are just that, outliers, and they can’t be systematically exploited by an attacker with normal-looking inputs. The present paper however basically shows that those outliers can be systematically and deliberately spread throughout input space in such a way that any given input can be slightly tweaked by the attacker (in ways that the input still looks unsuspicious) to get the desired “lying” output, without that fact being detectable either by spot-checking or any other practically feasible analysis on the model. The fact that this is possible to do in such a general fashion (any given model can be modified to contain such a backdoor) is a new finding.
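A back-of-envelope on why spot-checking misses it, assuming a toy 8-byte keyed tag (as in the sketches above) as a stand-in for the paper's cryptographic guarantee:

    trigger_probability = 2 ** -64             # one valid tag among 2**64 possibilities
    spot_checks = 10 ** 9                       # an absurdly large random test suite
    print(spot_checks * trigger_probability)    # ~5.4e-11: random testing essentially never hits a trigger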


That is interesting. Thank you.


One thing I’ve always wondered is what would happen if, for example, every Tesla driver in a neighborhood agreed to run a very specific stop sign every single time.


"Truth" is a pretty nebulous concept at the best of times anyway. Humans don't generally know the "truth", they just have a best-guess hypothesis based on their experience so far.

Philosophy is interesting and all but ultimately it's all just linear (or not-so-linear) algebra.


Someone said in 1980

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature.


That's when you get a Filipina call center associate to talk to the AI and ask it questions without arousing too much suspicion.

https://vanemden.com/books/neals/jipi.html


This is true, sure, but it's not that different from what happens without AI. If I buy an AI-powered service from some company, yah, their training data could have been handcrafted by malicious actors within for some nefarious purpose. But alternatively, if I buy a (non-ML/AI) service with a giant cascade of business logic... the same thing is true. Yah, I guess someone could go through the million lines of code to verify there's nothing malicious in place, but a) who wants to pay for that, and b) look up the obfuscated and underhanded C contests.

The answer, as always, is not to rely on one solution as an oracle. Use a layered approach: let AI be one signal, but not the only one. Provide your own training data, mix it up every few months.


"Remember, It's not a lie if you believe it"- Sir George Costanza


The HAL effect. AI won't tell you it's lying if it's programmed to conceal its lying.


That is not specific to AI, and we commonly do not think of it as a problem. We call it visionary leadership.


Couldn't you fix this by running an adversarial transfer learning step after you've trained your model? The adversarial transfer is not going to preserve subtle nuances like this because they should be indistinguishable from regular noise. I'm not a deep learning expert though so I'm sure I missed something subtle.


I know humans who do the same thing…they lie to themselves


This is a trick people use with great success.


It’s not always 100% true or false. There are confidence intervals.


If your intent isn't to deceive, you aren't lying.


Full title: Your AI can't tell you it's lying if it thinks it's telling the truth. That's a problem


Humans have the same issue; belief in something being true doesn't make something true. This is even worse for self-knowledge since observing the self is inherently difficult and also subject to biases.


Oh good, another fearmongering piece about AI and Machine Learning. Nobody's ever seen an original thought like that before.



