While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.
WRT lying: I think there's some logical trickery at work which makes it worth giving the AI the benefit of the doubt, along the lines of the 3^^^^^3 grains of sand thing. Something which exploits the rationalist worldview. Although, thinking about it again, you can always balance out the prospect of infinite goodness with the fear of the AI sending everyone to infinite hell. Essentially I believe Yudkowsky uses some logical-linguistic trick to find an asymmetry there.
OTOH if he had some novel philosophical device like that, he would have written it up as a blog post by now. He's evidently a very charismatic and persuasive guy, the people playing the game are selected to be sympathetic to his worldview, and he probably just persuaded them using ordinary psyops methods, like TeMpOrAl said.
> While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.
How many people bought timeshare because they turned up to a sales pitch in order to claim some free gift? "We'll go, get our gift, and just keep saying 'no' to the sales pitch".
How many people end up paying for something because they couldn't be bothered to cancel the deal after the free period is over? It's an age-old sales tactic, used in everything from magazine subscriptions to Spotify (I'm guilty of that last one myself: I didn't cancel when I no longer needed it).
And I think the mental equivalent of a drinking bird is actually very much in the spirit of the experiment - the point is, people can't reliably do even something as simple as deciding to refuse no matter what and keeping to that commitment.
How many of those people were high-IQ timeshare experts though, with extensive knowledge of the potential for timeshares to destroy the entire universe?
You would think that the various self-knowledge and introspection exercises promoted by Yudkowsky would immunize people against simple timeshare-style persuasion. This is why I think he uses rationalism itself to trap people. Like someone said, the basilisk thing seemed pretty effective.
I think you're rather fixated on a certain conception of "rationality" which is more like Mr. Spock than like what Yudkowsky uses it to mean.
The Yudkowskian definition of rationality is that which wins, for the relevant definition of "win".
Specifically, if there is some clever argument that makes perfect sense and tells you to destroy the world, you still shouldn't destroy the world immediately, if the world existing is something you value. It's a meta-level up: your being unable to think of a counterargument isn't proof, and the destruction of the world isn't something to gamble with.
Yes, Yudkowsky likes thought experiments dealing with the edge cases. Yes, 3^^^^^3 grains of sand is a thought experiment that produces conflicting intuitions. Yes, the edge cases need to be explored. But in a life or death situation (and the destruction of the world qualifies as this 7 billion times over), you don't make your decisions on the basis of trippy thought experiments. (Especially novel ones you've just been presented with. And ones that have been presented by an agent which has good reasons to try to trick you.)
So, no. Again, a "logical-linguistic trick" might work on Mr. Spock, but we're not talking about Mr. Spock here.
> He's evidently a very charismatic and persuasive guy
Exactly. That's the point. If even a normal charismatic and persuasive guy can convince people to let him out, a superintelligent AI would have an even easier time of it.
Long story short, it doesn't matter how he did it. All that matters is that it can be done. It can be done even by a "mere" human. If he can do it, a superintelligence with all of humanity's collected knowledge of psychology and cognitive science could do it too, and likely in a fraction of the time.
You're right that I've been unfairly dismissive of him, and made my objections somewhat too bluntly. At least it's fostered a discussion.
However, let me be clear: how he did it is the only thing I care about. I am not convinced that the threat of superintelligence merits our resources compared to other, more concrete problems. To me the experiment is not meaningfully different from the stories of the temptation of Christ in the desert. Except more fun than that story, because Yudkowsky is a more interesting character than Satan.
EDIT: if rationality is about winning, what could be simpler than a game where you just keep repeating the same word in order to win? It seems like almost the base case for rationality, if one accepts that definition.
I would submit that an unstated definition of rationality is "dealing with difficult, complex situations in one's life algorithmically", i.e. most of HPMOR and the large amount of self-help stuff on LW. Someone who had internalized this stuff would be more vulnerable than the average person to "Spock-style bullshit", to reuse that unfortunate phrase.
Well then, let's see what we can agree on. I hope you can agree that if one were to consider superintelligence a serious threat that needs dealing with, then AI boxing isn't the way to deal with it?
That's what he was trying to show in all this, and I think the point is made. How seriously to take superintelligent AIs is a different issue that he talks about elsewhere, and it should be dealt with separately. But if you or someone else were to try to deal with it seriously, I'm pretty sure you'd agree with me that the way to go about it isn't just boxing the AI and thinking that solves everything, right?
Oh yes, I agree with that premise. It's hard to disagree with. Milgram, the art of sales, plus the aforementioned Derren Brown and his many layers of deception are enough to make the point.
I suppose it's unfortunate that he came up with such an amazingly provocative way of demonstrating his argument; it's somewhat eclipsed the argument itself. I am definitely a victim of nerd sniping here. It must be the open-ended secrecy that does it.