>Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold
The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite in specific circumstances, especially when hammered home with enough emotional manipulation?
But then the person knows that the AI is lying to them. This is why I think it must be a trick: the defense seems so simple. The AI is lying, so you just ignore all its arguments and keep saying "no." This is why I keep referring to his followers somewhat dismissively: the only possible reason I can see for them losing is that their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.
> The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite
Well, for an argument to "prove" something, the premises must be true and the reasoning must be valid. No matter how smart you are, you can't "prove" something that is false, so no, I don't think he could. A good 'rationalist' would analyze the argument on its merits, and if the reasoning is sound, they shift their belief a bit in that direction. If not, they don't. Just like a regular person (they just know how to do the analysis formally and how to spot appeals to human biases and logical fallacies).
> But then the person knows that the AI is lying to them.
No, they don't. The AI could just as easily be telling the truth. If it makes an argument, you analyze the merit of the argument and consider counterarguments. If it tries to tell you that something is a fact, that's where you treat it as a potentially unreliable source and have to bring the rest of your knowledge to bear: do research, talk to other people, and weigh the evidence to make a judgment when you are uncertain.
> their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.
Wait, what? So does mine, within reason of course, but it's not a 'burden'. It's not like I'm obligated to stop and reexamine my views on religion every time a missionary knocks on my door, and LessWrong-ers are no different. But if you hear a convincing argument for something that runs counter to what you think you know, wouldn't you want to get to the bottom of it and find out the real truth? I would.
From having read LessWrong discussions, I can tell you that people there are in many ways more open to hearing differing viewpoints than your average person, but you're treating it like a mental pathology. They can be just as dismissive of ideas that they have already thought about and deemed to be false or that come from unreliable sources (like a potentially unfriendly AI). Your claim that being a self-proclaimed 'rationalist' introduces an incredibly obvious and easily exploitable bug into one's decision-making process really smells like a rationalization in support of your initial gut reaction to the experiment: that there has to be a trick to it, and that it wouldn't work on you.
A good rule of thumb when dealing with a complicated problem is this: If a lot of smart people have spent a lot of time trying to figure out a solution and there's no accepted answer, then (1) the first thing that comes to your mind has been thought of before and is probably not the right answer, and (2) the right answer is probably not simple.
But there's an easy way to test this: (1) Sit down for an hour and flesh out your proposed strategy for getting a 'rationalist' to let you out of the box. (2) Go post on LessWrong to find someone to play Gatekeeper for you. I'll moderate. If it works, that's evidence that you're right. If it doesn't work, that's evidence that you're wrong. Iterate for more evidence until you're convinced.
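For what it's worth, the "iterate for more evidence" step is just repeated Bayesian updating, and a toy calculation shows how fast it converges. The numbers below are pure assumptions (a prior of 0.8 and some illustrative likelihoods, not anything measured from actual AI-box runs), but they show how quickly a couple of Gatekeeper losses should eat into your confidence that "I'd just keep saying no":

```python
# A toy model of the "iterate for more evidence" step, with made-up numbers.
# H = "a determined Gatekeeper can always just keep saying no."
# The prior and both likelihoods are assumptions chosen for illustration.

def update(p_h, gatekeeper_won, p_win_given_h=0.95, p_win_given_not_h=0.40):
    """One Bayesian update on the outcome of a single AI-box run."""
    like_h = p_win_given_h if gatekeeper_won else 1 - p_win_given_h
    like_not_h = p_win_given_not_h if gatekeeper_won else 1 - p_win_given_not_h
    evidence = like_h * p_h + like_not_h * (1 - p_h)
    return like_h * p_h / evidence

p = 0.80                      # start fairly confident that H is true
for won in [False, False]:    # then watch two Gatekeepers lose
    p = update(p, won)
    print(round(p, 3))        # prints 0.25, then 0.027
```

Swap in whatever prior and likelihoods you actually believe; the point is only that the outcomes of real runs are evidence, and you should let them move you.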
But if the first thing that came to your mind upon reading this was a justification for why you would fail if you tried it ("Oh, well, I wouldn't personally be able to do it with this strategy, but..." or "Oh, well, I'm sure this strategy wouldn't work anymore, but..."), then you're already inventing excuses for the way you know it will play out.
I don't know how he did it either. But I do know that I wouldn't bet the human race on anyone's ability to win this game against Yudkowsky, let alone a superintelligent AI.
I really wish I knew how he did it.