
"No, I will not tell you how I did it. Learn to respect the unknown unknowns."



Which, given his general mission of making sure hostile AI DOESN'T take over the world, is a bit self-defeating. The easiest way to inoculate yourself against a persuasive technique is to be aware of it ahead of time. If you want to keep an AI in the box, you should absolutely release every successful log.


No, the idea is that the AI box is fundamentally flawed. I believe he advocates engineering the AI to be fundamentally safe, such that no box is required.

Personally, I think an AI would need far more intelligence and complexity than anything we're getting for the foreseeable future to be capable of breaking out of the box, or even of possessing the "desire" to (the experiment assumes a motivated AI with almost limitless knowledge of the world and cleverness), so this thought experiment may not be so relevant. I agree with him that the boxing approach is not a robust one, though.


> The easiest way to inoculate yourself against a persuasive technique is to be aware of it ahead of time.

Or you can avoid being exposed to it. If you think you already know all the techniques an AI might use against you, you're less likely to avoid that exposure in the first place.

The point of the experiment isn't "let's work out how an AI might try to persuade us to let it out". It's "even a human intelligence can persuade people who think they could never be persuaded; do you really trust yourself to do better against a superhuman one?"

If you don't know why the gatekeeper failed, it's harder to come up with bullshit reasons why you would have succeeded in that position.


Not necessarily. As a lighter example, would it be beneficial to give a mass murderer in jail a communication channel to the outside world? What if he used it to publish the message "I'll give $10 million to anybody who breaks me out of jail", or something more sinister?

Edit: huh, downvotes? Yudkowsky thinks there are certain things that AIs could say that should not be known. I think that is why he doesn't want to publish the dialogues: it would give the AI a public communications channel. While the AI is fictional, it could talk about a hypothetical future real self... Instead of promising something to get itself out of jail, the fictional AI could say something to make you make it real. Anyway - if it is over your head, fine, but why downvote just because you don't understand something?

Edit2: Sometimes I wonder if I already have my personal Hacker News AI that automatically downvotes everything I write...


The AI won't be limited to techniques that you could think of, or techniques that Eliezer could think of. So you'd only get a false sense of security.

Besides, releasing a successful log might be a bad idea for other reasons. Think about how you'd play this game as an AI. You wouldn't go looking for a general-purpose mindfuck, because there's probably no such thing. Instead, you would probably spend about a month gathering real-life information about the gatekeeper's history, family, weaknesses, etc. You'd read books on manipulation and sales techniques, and pick the strongest ones you can find. You would brainstorm possible tactics and run tests. At the end of the month you'd have a 4-hour script of all the unfair moves you could use against that person, arranged in the most effective order. (That's why it's a bad idea to play this game with friends.) Do you really want that information to be released? And if you know ahead of time that it will be released, won't that limit your effectiveness?


So you reckon that, as the AI player, he blackmailed the gatekeeper player? "Let me out or I'll tell your friends/family/co-workers X about you" type of thing?


It's more about finding buttons to push. For example, Justin Corwin won one of his games against a religious woman by telling her that she shouldn't play God by keeping him locked up for a subjective eternity (it was more involved, but you get the point). You could come up with other tactics if you know the gatekeeper is divorced, or donates to charity, or is an immigrant, etc. Really, you'll be surprised by how much progress you can make on an "impossible" problem if you just spend five minutes thinking without flinching away.


"Homeopathy works. Learn to respect the unknown unknown"


Personally, I think it's highly suspicious that he hasn't released the logs. Maybe he did break the rules; who knows.


Isn't it by definition impossible to respect the unknown unknowns, on account of their being unknown?





