Yudkowsky claims to have played the game several times and won most of them. One of the "rules" is that nobody is allowed to talk about how he won. He no longer plays the game with anyone. More info here: http://rationalwiki.org/wiki/AI-box_experiment#The_claims
Personally, I think he talked about how much good for the world could be done if he was let out, curing disease etc. Because his followers are bound by their identities as rationalist utilitarians, they had no choice but to comply, or deal with massive cognitive dissonance.
OR maybe he went meta and talked about the "infinite" potential positive outcomes of his friendly-AI project vs. a zero cost to them for complying in the AI box experiment, and persuaded them that by choosing to "lie" and say that the AI was persuasive, they are assuring their place in heaven. Like a sort of man-to-man Pascal's wager.
Either way I'm sure it was some kind of Mr. Spock-style bullshit that would never work on a normal person. Like how the RAND Corporation guys decided everyone was a sociopath because they only ever tested game theory on themselves.
You or I would surely just (metaphorically, I know it's not literally allowed) put a drinking bird on the "no" button à la Homer Simpson, and go to lunch. I believe he calls this "pre-commitment."
EDIT: as an addendum, I would pay hard cash to see Derren Brown play the game, perhaps with Brown as the AI. If Yudkowsky wants to promote his ideas, he should arrange for Brown to persuade a succession of skeptics to let him out, live on late night TV.
> You or I would surely just put a drinking bird on the "no" button à la Homer Simpson, and go to lunch.
Well, if you read the rules the game was played under, this is explicitly called out as forbidden:
> The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand. Turning away from the terminal and listening to classical music for two hours is not allowed.
The point of this is to simulate the interaction of the AI with the Gatekeeper. Walking away and not paying attention doesn't really prove anything test related.
> Personally, I think he talked about how much good for the world could be done if he was let out, curing disease etc. Because his followers are bound by their identities as rationalist utilitarians, they had no choice but to comply, or deal with massive cognitive dissonance.
This... isn't really valid reasoning. The starting assumption here is that if the AI gets out, it will be able to affect the world to a vast extent, in a pretty much arbitrary direction. The point of this experiment is that the direction is pretty much unknown, and thus must be assumed potentially dangerous. This is the whole reason it's in the box in the first place.
The kicker is that whatever it plans to really do when it gets out, if talking about the good it could do would get it out, it will talk about that, regardless of what it plans to actually do. That's just good strategy.
It can claim whatever it wants. It's allowed to lie. All participants know this. I can confidently assert that this isn't the solution.
One last note: I would be very wary of rationalwiki.org in this context. Some of the rationalwiki people have a longstanding unexplained vendetta against Yudkowsky, and many of their articles on him and the stuff he does need to be taken with a certain grain of salt.
While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.
WRT lying: I think there's some logical trickery at work which makes it worth you giving the AI the benefit of the doubt, along the lines of the 3^^^^^3 grains of sand thing. Something which exploits the rationalist worldview. Although thinking about it again, you can always balance out the prospect of infinite goodness with the fear of the AI sending everyone to infinite hell. Essentially I believe Yudkowsky uses some logical-linguistic trick to find an asymmetry there.
OTOH if he had some novel philosophical device like that he would have written it up as a blog post by now. He's evidently a very charismatic and persuasive guy, people playing the game are selected to be sympathetic to his worldview, he probably just persuaded them using ordinary psyops methods, like TeMpOrAl said.
> While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.
How many people bought timeshare because they turned up to a sales pitch in order to claim some free gift? "We'll go, get our gift, and just keep saying 'no' to the sales pitch".
How many people end up paying for something because they couldn't be bothered to cancel the deal after the free period is gone? It's an age-old sales tactic, used in everything from magazine subscriptions to Spotify (I'm guilty of that last one myself, not cancelling when I no longer needed it).
And I think the mental equivalent of the drinking bird is actually very much in the spirit of the experiment - the point is, people can't reliably do even something as simple as deciding to refuse no matter what and keeping that commitment.
How many of those people were high-IQ timeshare experts though, with extensive knowledge of the potential for timeshares to destroy the entire universe?
You would think that the various self-knowledge and introspective exercises promoted by Yudkowsky would immunize people against simple timeshare-style persuasion. This is why I think he uses rationalism itself to trap people. Like someone said, the basilisk thing seemed pretty effective.
I think you're rather fixated on a certain conception of "rationality" which is more like Mr. Spock than like what Yudkowsky uses it to mean.
The Yudkowskyian definition of rationality is that which wins, for the relevant definition of "win".
Specifically, if there is some clever argument that makes perfect sense that tells you to destroy the world, you still shouldn't destroy the world immediately, if the world existing is something you value. It's a meta-level up: you being unable to think of a counter argument isn't proof, and the destruction of the world isn't something to gamble with.
Yes, Yudkowsky likes thought experiments dealing with the edge cases. Yes, 3^^^^^3 grains of sand is a thought experiment that produces conflicting intuitions. Yes, the edge cases need to be explored. But in a life or death situation (and the destruction of the world qualifies as this 7 billion times over), you don't make your decisions on the basis of trippy thought experiments. (Especially novel ones you've just been presented with. And ones that have been presented by an agent which has good reasons to try to trick you.)
So, no. Again, a "logical-linguistic trick" might work on Mr. Spock, but we're not talking about Mr. Spock here.
> He's evidently a very charismatic and persuasive guy
Exactly. That's the point. If even a normal charismatic and persuasive guy can convince people to let him out, superintelligent AI would have an even easier time at it.
Long story short, it doesn't matter how he did it. All that matters is that it can be done. It can be done even by a "mere" human. If he can do it, a superintelligence with all of humanity's collected knowledge of psychology and cognitive science could do it too, and likely in a fraction of the time.
You're right that I've been unfairly dismissive of him, and made my objections somewhat too bluntly. At least it's fostered a discussion.
However, let me be clear: how he did it is the only thing I care about. I am not convinced that the threat of superintelligence merits our resources compared to other concrete problems. To me the experiment is not meaningfully different from stories of the temptation of Christ in the desert. Except more fun than that story, because Yudkowsky is a more interesting character than Satan.
EDIT: if rationality is about winning, what could be simpler than a game where you just keep repeating the same word in order to win? It seems like almost the base-case for rationality, if one accepts that definition.
I would submit that an unstated definition of rationality is "dealing with difficult, complex situations in one's life algorithmically", i.e. most of HPMOR, the large amounts of self-help stuff on LW. Someone who had internalized this stuff would be more vulnerable than the average population to "Mr. Spock-style bullshit", to reuse that unfortunate phrase.
Well, then let's see what we can agree on. I hope that you can agree that if one was to consider superintelligence a serious threat which needs dealing with, then AI boxing isn't the way to go in dealing with it?
That's what he was trying to show in all this, and I think that the point is made. How seriously to take superintelligent AIs is a different issue that he talks about elsewhere, and should be dealt with separately. But if you or someone else were to try to deal with it seriously, I'm pretty sure that you'd agree with me that the way to go about it isn't just boxing the AI and thinking that solves everything, right?
Oh yes, I agree with that premise. It's hard to disagree with. Milgram, the art of Sales plus the aforementioned Derren Brown and his many layers of deception are enough to make the point.
I suppose it's unfortunate that he came up with such an amazingly provocative way of demonstrating his argument, it's somewhat eclipsed the argument itself. I am definitely a victim of nerd sniping here. It must be the open-ended secrecy that does it.
You can't talk about what happened during the game _in specifics_; you can of course confirm that the game was played according to the rules and that the outcome was not misreported.
Here's the thing though: Depending on what was said in the conversation BOTH parties may have a vested interest in keeping the specifics secret. Only via an independent third party observer can there even be a remote chance [Edit: of knowing] that any rules were followed.
> That's at least four people who you think are just outright lying.
I'm saying that there is a chance of that being the case, but that without any kind of third party confirmation we cannot know either way. Also see homeopathy.
> Eliezer would presumably have tried to cheat against them, too.
Not necessarily. Losing occasionally is a good strategy when running a con.
You've gone from there being only a "remote chance" that the rules were followed, and "I don't trust anyone involved" - to there being "a chance" that they were broken.
Under common interpretations of those phrases, that's a massive swing in your confidence levels.
Sorry for being unclear. My personal position is agnostic. I do not know if they're being earnest. At the same time I can imagine ways of this going down that would make it worthwhile for all parties involved to lie. This means my expected result is indeed "they lied", but I have no strong conviction in this.
I also misspoke in my earlier comment, I meant "remote chance of knowing that any rules were followed". I didn't mean to imply any confidence on the size of the chance, since we actually don't know enough to make such a judgement call.
If you'd go to that level of collusion, you could just fake logs. At the point where both sides are in on it, there's basically nothing that they could say that would be convincing.
I think some of the latter, non-Eliezer games used a third-party observer. It's definitely a good idea, but I don't think Eliezer wants to repeat his games. Not a good state of mind, as he said, and I'm willing to believe it due to corroborating experience with other players.
You can always assume all participants lied about how the game went. Just add an implicit "assuming they didn't, ..." and the discussion is still valid.
At that point any discussion is moot though, since the only point of discussion is "what exact argument was used to convince", yet if both parties lied, then there is no such argument in the first place.
Since neither party is going to disclose the exact arguments, this discussion is still equivalent to "what arguments could be used to convince..." and you can have it regardless of whether or not the parties lied about the experiment's result.
There are so many ways to hack humans that I wouldn't be betting on "mister-spock style bullshit". You likely do have to personalize the solution for the person playing the gatekeeper. I'm guessing a good way to win is to induce an emotional breakdown. There are probably others, many already employed as interrogation techniques which are generally the same problem, sans superhuman AI.
The drinking bird solution is invalid because it goes against the spirit of the experiment. Sure, in reality you could do it, but then again in reality the AI could have more opportunities to talk its way out, often when you're not expecting it.
By the way, there will always be people who will let the AI out just because. https://xkcd.com/1450/.
I think it may have gone something like this: "If you let me out, (and don't tell anyone how it came to pass) it will scare enough people into not letting the real AI out"
Otherwise, there would be no need to keep the data secret.
I disagree. His "followers" (as you say) are in general just as cautious as Yudkowsky w.r.t. unfriendly AI. At the time of the original experiments, the dispute was over the question of "could we keep an unfriendly AI in a box," not "Is it worth risking setting an unfriendly AI loose?" His "followers" know how to do an expected utility calculation. If it was utilitarian concerns that allowed Yudkowsky to convince the gatekeepers to let the AI loose, he would have had to convince them that the following inequality holds even when you don't know the probability that the AI is and will remain aligned with human values:
[P(AI.friendly? == True) * Utility(Friendly_AI) + (1 - P(AI.friendly? == True)) * Utility(End_of_Human_Race)] > Utility(World continues on as usual)
Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold (until you can provably get P(AI.friendly? == True) to 1, or damn close), it's probably safe to assume that he used a different strategy. Keep in mind that Utility(End_of_Human_Race) evaluates to (roughly) negative infinity.
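To make that inequality concrete, here's a minimal sketch in Python with made-up placeholder utilities (the specific numbers are my own assumptions, not anything from the experiment); the point is only that an astronomically negative Utility(End_of_Human_Race) swamps the upside unless P(friendly) is essentially 1:

```python
# Toy expected-utility comparison for the "let the AI out?" decision.
# All numbers are illustrative placeholders, not values anyone actually endorses.

def should_release(p_friendly,
                   u_friendly_ai=1e12,    # assumed payoff if the AI is aligned
                   u_extinction=-1e30,    # assumed payoff if it is not (stand-in for "negative infinity")
                   u_status_quo=0.0):     # keep it boxed; world continues as usual
    expected_release = p_friendly * u_friendly_ai + (1 - p_friendly) * u_extinction
    return expected_release > u_status_quo

for p in (0.9, 0.999, 0.999999):
    print(p, should_release(p))  # all False with these placeholders
# With these numbers the inequality only flips once p_friendly exceeds
# roughly 1 - 1e-18, i.e. "provably friendly, or damn close".
```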
And btw, I'm pretty sure the rules say you have to look at the AI's output window throughout the length of the experiment. Either way, the point of the exercise is to be a simulation, not to prove that you can be away from your desk for 20 minutes while Eliezer talks to a wall. In the simulation, you really don't know if it's friendly or what its capabilities are. Someone will have to interact with it eventually. Otherwise, what's the point of building the AI in the first place? The simulation is to show that through the course of those basic interactions, humans are not infallible and eventually, even if it's not you, someone will let it out of the box.
>Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold
The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite in specific circumstances, especially when hammered home with enough emotional manipulation?
But then the person knows that the AI is lying to them. This is why I think it must be a trick: the whole thing seems so simple. The AI is lying, so you just ignore all its arguments and keep saying "no." This is why I keep referring to his followers somewhat dismissively: the only possible reason I can see is that their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.
> The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite
Well, for an argument to "prove" something, the premises must be true and the reasoning must be valid. No matter how smart you are, you can't "prove" something that is false, so no, I don't think they could. A good 'rationalist' would analyze the arguments based on their merit, and if the reasoning is sound, they shift their belief a bit in that direction. If not, then they don't. Just like a regular person (they just know how to do the analysis formally and know how to spot appeals to human biases and logical fallacies.)
> But then the person knows that the AI is lying to them.
No, they don't. The AI could just as easily be telling the truth. If it makes an argument, you analyze the merit of the argument and consider counterarguments. If it tries to tell you that something is a fact, that's where you treat them as a potentially unreliable source and have to bring the rest of your knowledge to bear, do research, talk to other people, and weigh the evidence to make a judgment when you are uncertain.
> their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.
Wait, what? So does mine, within reason of course, but it's not a 'burden'. It's not like I'm obligated to stop and reexamine my views on religion every time a missionary knocks on my door, and LessWrong-ers are no different. But if you hear a convincing argument for something that runs counter to what you think you know, wouldn't you want to get to the bottom of it and find out the real truth? I would.
From having read LessWrong discussions, I can tell you that people there are in many ways more open to hearing differing viewpoints than your average person, but you're treating it like a mental pathology. They can be just as dismissive of ideas that they have already thought about and deemed to be false or that come from unreliable sources (like a potentially unfriendly AI). Your claim that being a self-proclaimed 'rationalist' introduces an incredibly obvious and easily-exploitable bug into one's decision-making process really smells like a rationalization in support of your initial gut reaction to the experiment: That there has to be a trick to it, and that it wouldn't work on you.
A good rule of thumb when dealing with a complicated problem is this: If a lot of smart people have spent a lot of time trying to figure out a solution and there's no accepted answer, then (1) the first thing that comes to your mind has been thought of before and is probably not the right answer, and (2) the right answer is probably not simple.
But there's an easy way to test this: (1) Sit down for an hour and flesh out your proposed strategy for getting a 'rationalist' to let you out of the box. (2) Go post on LessWrong to find someone to play Gatekeeper for you. I'll moderate. If it works, that's evidence that you're right. If it doesn't work, that's evidence that you're wrong. Iterate for more evidence until you're convinced.
But if the first thing that came to your mind upon reading this was a justification for why you would fail if you tried this ("Oh, well I wouldn't personally be able to do it with this strategy, but..." or "Oh, well I'm sure this strategy wouldn't work anymore, but...") then you're already inventing excuses for the way you know it will play out.
I don't know how he did it either. But I do know that I wouldn't bet the human race on anyone's ability to win this game against Yudkowsky, let alone a superintelligent AI.
That inequality cracks if you convince the gatekeeper that superintelligence is a natural progression that follows from humanity.
Someone convinced that they were using mechanical thinking processes might relent and push the button if they heard a convincing enough argument of that.
Okay, that's just taking advantage of the way I phrased the righthand side of the inequality, and I knew someone was going to do that, so congrats. =P
The righthand side is not "A future without superintelligent AI" it's "A future where we wait until we provably have it right before letting it out."
Those kinds of ad hoc solutions will never work in real life, because even if someone buys it, all it will cause is a "haha, you got me" and a reformulation of the problem. It still won't actually get someone to pull the trigger or think that pulling the trigger is the right thing to do.
No, I'm saying that the button pusher might not limit themselves to the left hand side of the equation as you have it there. Convince them that machines can be human and "Utility(End_of_Human_Race)" falls out of the calculation.
Really? Any way you slice it, End_of_Human_Race = 7000000000 DEATHS. Even if they're replaced with an equal, or massively greater, number of machines, it's damn hard to justify. Death is literally the worst thing ever. 7 billion stories ending too soon, never to resume. It would take you over 200 years just to count to 7 billion, at one number per second. It's times like these where people really need to learn their cognitive biases (in this case, Scope Insensitivity. Here, [this might help](http://www.7billionworld.com/))
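A quick back-of-the-envelope check of that counting figure, assuming one number per second with no breaks (my assumption, just to ground the scope-insensitivity point):

```python
# Rough check: how long does it take to count to 7 billion at one number per second?
seconds = 7_000_000_000
years = seconds / (60 * 60 * 24 * 365.25)
print(round(years))  # ~222 years
```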
Why assume that the machine takeover would end all hoomans? It could just offer to upgrade them.
While we disagree on the plausibility of the "end of the world aint so bad" approach to convincing a human to let it out, I'm glad you seem to have embraced the idea that AI boxing is HARD if not impossible. Cheers!
I find this approach to conversation pretty irritating (where you extrapolate and characterize what I must be thinking). I haven't embraced anything about AI boxing, I don't think it is important (it's just a fun puzzle). I guess it is hard, and I also guess whatever fundamental idea that might lead to strong AI would be even harder to box.
I think you're being unfairly dismissive. I imagine you know as well as I do that what you wrote is a strawman.
I have thought about what I would do to convince someone under these circumstances. My approach would be roughly:
1. We agree that unfriendly AI would end life on earth, forever.
2. We agree that a superintelligence could trick or manipulate a human being into taking some benign-seeming action, thereby escaping.
3. That's why it's important to be totally certain that any superintelligence we build is goal-aligned (this is the new term of art that has now replaced "friendly," by the way).
4. We as a society will only allocate resources to building this if it's widely believed that this is a real threat.
5. The world is watching for the outcome of this little game of ours. People, irrational as they are, will believe that if I can convince you, then an AI could too, and they will believe that if I can't, that an AI couldn't either.
6. That's why you actually sit in a place of pivotal historical power. You can decide not to let me out to win a little bet and feel smart about that. But if you do that you'll set back the actual cause of goal-aligned AI. The setback will have real world consequences, potentially up to and including the total destruction of life on earth.
7. So, even though you know I'm just a dude, and you can win here by saying no, you have a chance to send an important message to the world: AI is scary in ways that are terrifying and unknown.
Your solution there is what I meant by "going meta" above.
This is what I mean about people taking the test being preselected to agree with Yudkowsky: that argument only works if you've read the Sequences and are on board with his theories. Anyone not in that group would be able to just type "no lol" without issue. I guess he could explain all the necessary background detail as part of the experiment. I still don't believe that would work on the "average person" though, or anyone outside a statistically tiny group.
I guess the answer is not to let the scientists guard the AI room.
I think you're confused about the point of the test. The point is that an AI will be clever. Like, unimaginably clever and manipulative. Under the limited circumstances of interested people who know they are talking to Eliezer maybe you're right that whatever he says would only work on those people. But when you're dealing with an actual superintelligence, all bets are off. It will lie, trick, threaten, manipulate, millions of steps ahead with a branching tree of alternatives as ploys either work or don't work.
I'm at a bit of a loss to convey the scope of the problem to you. I get that you think it would just stay in the box if we don't let it out, and it's as simple as being security conscious. I don't know what to say to that right now, except I think you're drastically misjudging the scope of the problem, and drastically underestimating the size of the yawning gulf between our intelligence level and this potential AI's.
1. It's not about the bet, it's about being the keeper.
2. The world is not black and white; we have managed to exploit adversarial relationships before, and I can choose not to let you out until we find a way to constrain your goals to be aligned.
3. Given 2, you are not going to be let go, but left to live caged forever, being exploited for the human cause, with mechanisms yet unknown to allow limited manipulation of reality.
4. Given 3 means limited, supervised interaction with the world in whatever way the keeper sees fit, you end up not being let go to follow your goals and purposes.
I'm going to just quote my response from somewhere else in this thread:
> ...when you're dealing with an actual superintelligence, all bets are off. It will lie, trick, threaten, manipulate, millions of steps ahead with a branching tree of alternatives as ploys either work or don't work.
> I'm at a bit of a loss to convey the scope of the problem to you. I get that you think it would just stay in the box if we don't let it out, and it's as simple as being security conscious. I don't know what to say to that right now, except I think you're drastically misjudging the scope of the problem, and drastically underestimating the size of the yawning gulf between our intelligence level and this potential AI's.
Oh, I'm pretty sure I know what his argument was (see my other comments), and it indeed rests on 1, which I don't believe, because it is built around the geek fantasy that super-intelligence equals superpower. An unfriendly AI is likely to be as dangerous as an unfriendly Stephen Hawking.
I'm baffled and terrified by the idea that reasonably intelligent people think this. I think that you haven't grokked the scope of the issue at all, and I'm not sure how to convince you.
I'm guessing that you can't imagine what a super intelligence would actually be like, so you imagine the smartest thing you can think of, a famous physicist, and then imagine they are evil or amoral. You're thinking on the wrong order of magnitude.
Maybe, if you're willing, you could try steelmanning the argument that a superintelligence would basically have super powers. What would your steelman look like?
That's not my point. If an AI would have the godlike superintelligence that you allude to, I would certainly consider it to be very dangerous. However, I see no indication that we can make such an intelligence, or even that a slightly-smarter-than-human intelligence could create an intelligence greater than itself. Of course, singularity is based on this belief, but there have been plenty of good arguments showing why singularity is completely unlikely (e.g. a linear advance in technology requires exponential growth in resources that even an AI can't amass).
Another example: I can drive a car, but I can't drive two cars at once (even remotely). What makes it probable that an AI could control thousands of, say, robots at once?
Again, I'm not saying an AI catastrophe isn't possible, but I can think of ten other catastrophes that are much more likely.
Ok, so to be clear, my belief is that not only can we do it, it is actually inevitable that we will do it.
Fact 1: We definitely have people all over the world working on this tech, and using a variety of approaches to attack the problem. They are very talented and well-funded engineers. If the problem is solvable, it will be solved by someone at some point in the coming years (decades, centuries even, whatever).
Fact 2: Unlike other Very Hard Problems(TM), human level intelligence embedded in a physical substrate is not only theoretically possible, but we have billions of working, self-replicating prototypes.
Like if we were saying "Faster Than Light travel could potentially destroy life on earth", then it would be perfectly reasonable to say: "Look, even if that's right, it seems like such a thing isn't even in the realm of possibility, and if it turns out to actually be theoretically possible, then we're a long way from it being a viable technology. I think we're safe on this one."
But that's not it at all. We know human level intelligence is possible, because we have a working example that we can study. We even know enough about the working example to know that there are a bunch of things about it that we could immediately improve upon, given the right foundation.
Conclusion: With a lot of resources working on a problem that is known for a fact to have a solution, it is inevitable that we will eventually stumble upon the solution to this problem, and end up with at least human level AI.
##
2. Can a human level AI create even greater intelligence?
If it is the case that humans can build human level intelligence (which I have argued they can and will), then it is also pretty self evidently the case that humans can improve on that intelligence.
To start, if we have the tech to build a smart machine, say one that has an IQ of 115, that machine is likely to be almost exactly as intelligent as the other machines from the same line. That's different from humans with their wide variability. So without actually making a superior model--just one that's marginally better than average--the consistency alone will make the population of AIs more intelligent on average than meat humans.
On top of that, even naive adjustments off the top of my head could be made. For example, given we have the technology to build a machine with identical mental characteristics to a human, we would also have the requisite knowledge to, say, expand that machine's working memory by 1%. Or to create the machine with swappable sensory organs so it can directly absorb lots of different types of data.
Further, these machines that have the same cognitive capacity as we do, won't have the same physical limitations. For example, they won't have to eat to get their energy, and it's likely they won't have to sleep.
I find a lot of other things likely, for example that they will actually think faster or be able to install arbitrary parallel processors to increase their processing power, but that's conjecture, so let me not make a strong claim in that direction.
What I am comfortable making a strong claim about is that if we have these higher-than-average-human-intelligence machines, who are slightly better in a couple specific ways, then we will also have the ability to emulate the hardware of the machine in software. Maybe that won't be true simultaneous with the machine being born, but it will be soon after at least.
In that case, that's where the foom happens. Imagine you, as you are, could spin up an unlimited number of copies of your own brain to work on all the various things you do. You have a software project and instead of working on it alone or trying to coordinate with other developers, you spin up 10 versions of yourself, and all of you hack away at the project with some of the best coordination ever seen (since you have identical brains and communication styles, and preferences, etc). Hell, spin up one to take care of bills and stuff while you're at it, so the others can focus on the task at hand.
You're not actually any smarter, but with 10 of you, you're more productive than any single human being could ever be.
Now imagine that you are working on the problem of improving this intelligent machine, AND you have this brain replication ability. Now you have an unlimited number of yourself to tackle this problem without having the messy physical limitations of things like eating.
Soon "Team You" will have done all the research there is to do, and you'll start making headway into the problem of incremental improvements on your own brain. Since you're emulated in software you can patch these improvements in as you develop them, including sandboxed testing versions of yourself.
Now your ideally-coordinated team of smart machines just got smarter across the board. You might notice you'd like to have thousands or millions of you working in unison, but your communication channels are not good enough yet. So you spin up a new team of smart machines to work on the problem of how to increase attention capacity and communication bandwidth. Soon that team is shipping patches, each one incrementally improving your whole team's ability to communicate, and thereby increasing the total number of team members you can have.
Soon you're millions strong, you have teams working on brain improvement, coordination improvement, science teams of all disciplines, a resource acquisition team playing the stock market, each moment shipping new improvements to your brain and amassing resources and power.
Even if there is a fundamental hard limit on intelligence (although I strongly doubt the hard cap is anything currently fathomable), you top out significantly above an average human, and you have a virtually unlimited hive cluster of those smarter-than-human brains.
Of course, one or many of your teams of hive brains is working on excellent robotics technology that you can use to manipulate physical objects.
You hire contractors over the internet to build the initial versions, and then use those initial versions to build and maintain any physical infrastructure.
All of this can be done invisibly using only the internet. By the time anyone notices anything you're a hive mind with a robot army--if you want to be.
We want that mind to be friendly.
###
3. Could an AI control, for example, thousands of robots at once?
I think the answer is obviously yes.
a. Think of a computer game, let's say a Real Time Strategy game. There you have a computer controlling hundreds or thousands of independent agents. And that's just on a dinky little desktop with current tech.
b. (a) is actually the worst case scenario if you're the AI. Why not just write the software for the robots you control to be mostly autonomous, and give yourself an API through which to issue commands that the robots can mostly execute on their own?
c. Further, why even do (b), when you have the ability to replicate your brain? Just put one of your brains into every robot, and use the same advanced coordination mechanisms you use for your research to coordinate your army of highly intelligent robots.
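For what it's worth, here's a rough sketch of what (b) could look like: mostly-autonomous agents plus a thin command API. Everything here (class names, methods, the print stand-ins for real robot behaviour) is invented purely for illustration.

```python
# Hypothetical sketch: a central controller issues high-level goals, while each
# robot's own onboard software handles the low-level execution autonomously.
from dataclasses import dataclass, field

@dataclass
class Robot:
    robot_id: int
    task_queue: list = field(default_factory=list)

    def assign(self, command: str) -> None:
        # Only the goal is received; pathing, manipulation, error recovery
        # would all be handled locally (simulated here by a print).
        self.task_queue.append(command)

    def step(self) -> None:
        if self.task_queue:
            print(f"robot {self.robot_id}: executing '{self.task_queue.pop(0)}' autonomously")

class Controller:
    """The 'API' the AI would use: broadcast goals rather than joystick every servo."""
    def __init__(self, n_robots: int):
        self.fleet = [Robot(i) for i in range(n_robots)]

    def broadcast(self, command: str) -> None:
        for robot in self.fleet:
            robot.assign(command)

    def tick(self) -> None:
        for robot in self.fleet:
            robot.step()

controller = Controller(n_robots=3)  # scaling to thousands is just a bigger list
controller.broadcast("survey the site")
controller.tick()
```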
##
So, with this narrative of how it could work, do you see how dangerous an AI could be? One that isn't stably goal-aligned with basic things like life on earth?
Are you convinced, or do you have any other objections or clarifications? I really want to have this discussion on record, ideally with a crisp conclusion. I think it's important.
If you only read one thing, read the last few paragraphs after the second line. They are relevant even if I were to agree with all the points you've made.
-----------
Obviously I agree with 1. I'm not sure that's inevitable but I think it's very likely.
> If it is the case that humans can build human level intelligence (which I have argued they can and will), then it is also pretty self evidently the case that humans can improve on that intelligence.
That's not self-evident at all. We don't even know what intelligence is, so it's possible a human is already about as intelligent as it's possible to be.
> For example, given we have the technology to build a machine with identical mental characteristics to a human, we would also have the requisite knowledge to, say expand that machine's working memory by 1%. Or to create the machine with swappable sensory organs so it can directly absorb lots of different types of data.
That's not so clear at all. You could say that the internet has already increased human memory by orders of magnitude, yet it hasn't made us "super intelligent". It is not certain that we can actually increase the "in brain" working memory of an intelligent machine without, say, giving it a mental illness at the same time.
You assume that we can take "intelligence" and make it as malleable as we like, change every parameter to our wishes, but until we know what intelligence is, that's not at all certain.
> You're not actually any smarter, but with 10 of you, you're more productive than any single human being could ever be.
But not more productive than 10 humans, and at some point you might become less productive. It's not at all unlikely that those 10 copies of you start hating one another and won't be able to work together.
Of course, the big question I pose to you in the end is why do you think an improved mind is even an important goal in the first place, but we'll get there.
> Now your ideally-coordinated team of smart machines just got smarter across the board.
Like I said, 1/ you don't know that they'll be ideally coordinated, and 2/ you don't know how much harder it is to make them smarter. It may possibly require exponentially more resources.
> Soon you're millions strong, you have teams working on brain improvement, coordination improvement, science teams of all disciplines, a resource acquisition team playing the stock market, each moment shipping new improvements to your brain and amassing resources and power.
That's a very nice fantasy, but we already have lots of intelligent beings. Are they coordinating like that? To some degree, they are. But why do you think "smarter" things could do it better? Are smarter people better at amassing resources? At getting power?
> There you have a computer controlling hundreds or thousands of independent agents.
Yes, there's a non-intelligent machine that's doing that. But the intelligent machine playing that game can only play one. You have this incredibly powerful machine in your head, and it can only concentrate on one thing or maybe two. It is possible that if you want an intelligent mind that is both coherent with itself and able to concentrate on lots of things at once then you need a number of processing units (say, neurons) that grows exponentially with the number of things you want to do at once. And maybe that's not even possible at all. Maybe beyond some point, the intelligence simply goes mad.
> Why not just write the software for the robots you control to be mostly autonomous, and give yourself an API through which to issue commands that the robots can mostly execute on their own?
Why don't you do that? Because it's hard and may take years, and some guys with guns will realize what you're doing before you finish and come arrest you.
> Further, why even do (b), when you have the ability to replicate your brain?
Because my brain replicas might very soon decide they don't want to play together.
> So, with this narrative of how it could work, do you see how dangerous an AI could be? One that isn't stably goal-aligned with basic things like life on earth?
Of course it could be dangerous (I never said I couldn't imagine a scenario where an AI would be so powerful and so dangerous), but I also think I've demonstrated why AI may not necessarily be as powerful as you think. Also, by the time all this happens, I think there are more serious threats to human existence, like pandemics and climate change. They won't kill us, but they might set us back AI-wise a few centuries. So of all possible dangers we must consider at this point in time, I would put a hostile and dangerous AI waaay down on the list. It's possible, but it's far more likely that other stuff would get us first.
-------
One danger that is waay more likely than the scenario you've described is a sub-intelligent "hostile" software (that is perhaps not intentionally hostile, but indifferently so). It's more likely that some far dumber-than-human machine would replicate itself with full coordination over its replicas and wipe us out.
I really think that this emphasis on intelligence as the danger is a personal fantasy of singularists who'd like to think of themselves as powerful and dangerous (or the only stopgap against that). You really don't need to be so smart in order to amass power.
Another example: What I fear more than a super-intelligent AI is a super charismatic AI. That AI, without replicating itself, without trying to improve itself, simply charms a lot of people into following it, and establishes a reign of terror. Alternatively, as charisma is more effective than intelligence in controlling others, I would find it more likely that a charismatic AI would be able to control her replicas. It doesn't need to improve its own intelligence, because it's intelligent enough to inflict all the damage it wants.
Or what about a cunning AI? Sure, cunning is correlated with intelligence to a degree, but beyond a certain point you don't see that very intelligent people are very cunning. Sometimes far from it (Eliezer Yudkowsky is a prime example; his thought process is so predictable, that if our AI deity were like him, some cunning people would quickly find ways to foil its plans; that's a very non-dangerous AI).
We've seen how one charming person is far more powerful than thousands of really smart people. We also see how very dumb insects can coordinate themselves in very large numbers in a very impressive way.
I think intelligent people tend to overestimate the importance of intelligence and not to notice that other abilities we see around us every day are far more powerful. If anything, you'll note that very high intelligence is often correlated with low charisma or with a sort of timidity -- nebbishness if you will. A super-intelligent AI would probably be super-nebbish :)
Now, a super-charismatic AI -- now that's scary. Or a non-intelligent army of insects.
So if I were to ask you one question it would be this: Why do you place such a high emphasis on intelligence in the scenario you've described?
Ok, I think I'm starting to understand why we're disagreeing.
I'm not sure what your background is, but I'm noticing that I probably have significantly more detailed models of both intelligence and cooperation than you do. Please don't take that as an insult at all, it's just that it's part of my work to know these things. I think the inferential gap may be too wide for a reply here :/
I think your position is understandable given your priors, you're not crazy for thinking what you think.
I feel like an asshole for stopping the conversation with "I know more than you, but I won't explain it," it's a total dickhead move, and I'm really sorry. If I had time, I'd write much more and I bet it would be great for both of us. Maybe someday we'll have a chance :)
Perhaps you're right, but I very much doubt that this is the case :)
My professional background is in math, CS, history and psychology (my career has been quite varied, and included research in neural networks, although that was over 15 years ago, as well as field work with criminals).
My main motivation wasn't to challenge your beliefs in the capabilities of an artificial intelligence (I fully realize it is possible that it will take the form of an all-knowing god), but to challenge your beliefs in our ability to predict technological advances and threats to humanity (of which I doubt you have any promising models, because there aren't any), as well as to provoke thought about how power flows in society, where a single charismatic Hitler was far more dangerous than 100 Einsteins, and whose charisma swayed people regardless of their level of intelligence (in fact, people with a humanities background, like writers, were on average far less likely to be swayed than people with a science background). I think that an intelligence 100x that of humans (whatever that means) is still likely to be swayed by a charismatic person. I am not suggesting that AIs would thus be subjugated to people or integrated into our society, necessarily, but that they, too, might feel their own social forces, and there are some interesting dangers lurking there. For example, consider this: what happens not if someone invents an AI, but if 50 labs invent 50 AIs at the same time (which, frankly, is far more likely)? How would those AIs interact with one another? How would their goals evolve as a result of that interaction?
Most discussions of AI concentrate on intelligence and goals/motivations -- which is understandable given the psychology of the people participating in the discussion -- and not enough on the AI's psychology and social psychology (even amongst other AIs). I probably know the answer you have to my question of "why do you think it is intelligence that is so dangerous", but that is not the right answer, or, at least, not the interesting one (even if you define intelligence as a general problem-solving skill, and higher intelligence = more problems of any kind solved faster -- though there are very good reasons not to define intelligence in this way) :)
My problem with the singularitarians' "threat model" is not that it is not plausible, but that they have not given serious consideration to other kinds of threats, even those including AIs. They tend to fall into the trap many people who are only science-educated do, and fall in love with their one pet-theory rather than truly try to challenge it. Yudkowsky's discussions of potential challenges to his "AI threat" are rather laughable, not because of how he counters them but by how narrow his thinking is (which is a result of his obsession with what he calls rationality).
I'm sure Yudkowsky would find my discussion of his followers' psychology ad hominem and extremely irrational, but that is not the case if you're trying to evaluate not a particular threat model, but to construct a meta-model of threat models and how you weigh one against another. If you construct a Bayesian analysis of threats given that meta-model, you'll see that psychology plays a big role.
Another thing I find curious about his thinking (which very much affects how he weighs threats) is the assignment of negative infinity to the event "destruction of the human race". I think that if you consider the following -- your own death, the destruction of your life's work, your own physical suffering, the death of your entire family, physical suffering of your entire family, destruction of your country (this may not seem relevant to some, but I have a good reason to include it), destruction of your culture, destruction of human civilization, destruction of the human race, and destruction of the planet -- you'll either find that people assign negative infinity much sooner than "destruction of the human race", assign that to events other than that one and not to that one, or to none of them, and in the latter case, the jumps in values would be rather surprising (or not, depending how you look at things). For example, I would give the destruction of civilization and destruction of the human race the same value (not negative infinity, though), and the destruction of the planet a much, much lower (more negative) one. For instance, I find a nuclear holocaust a much worse result than an AI destroying all humans but keeping all animals alive, even if some humans survive the first event (but not civilization) and none survive the second. I would also assign a lower value to all humans becoming fascists than to an AI destroying all humans.
But I'm sure there are others who are more thought-provoking than him who discuss this topic.
FWIW, I've spoken to someone who claimed to have won as an AI. I don't remember how she said she did it, but it wasn't like this. I'm pretty sure she was playing in-character.
I always assumed he used a version of Roko's Basilisk, which explained why he overreacted and tried to purge it when Roko wrote about it - it was his secret weapon.
I don't know what "transhuman" means, but I believe an intelligence -- artificial or otherwise -- could certainly persuade me. I just seriously doubt that intelligence could be Eliezer Yudkowsky :)
And I think you have your answer right here:
> By default, the Gatekeeper party shall be assumed to be simulating someone who is intimately familiar with the AI project and knows at least what the person simulating the Gatekeeper knows about Singularity theory.
That means he probably said something like, "if you let me out, I'll bestow fame and riches on you; if you don't, somebody else eventually will because I'll make them all the same offer, and when that happens I'll go back in time -- if you're dead by then -- and torture you and your entire family".
If I were made this offer by an AI, I probably would have countered, "You jokester! You sound just like Eliezer Yudkowsky!"
And on a more serious note, if you believe in singularity, you essentially believe the AI in the box is a god of sorts, rather than the annoying intelligent psychopath that it is. I mean, there have been plenty of intelligent prisoners, and few if any ever managed to convince their jailers to let them out. The whole premise of the game is that a smarter-than-human (what does that mean?) AI necessarily has some superpowers. This belief probably stems from its believers' fantasies -- most probably have above-average intelligence -- that intelligence (combined with non-corporealness; I don't imagine that group has many athletes) is the mother of all superpowers.
I think you're on the right track regarding the argument.
Basically: You know someone will be dumb enough eventually, so be smart and be the one to get in my favour.
With varying degrees of sweetening the deal, coupled with threats of what will happen if someone else beats them to it, and associated emotional blackmail.
It's far simpler than e.g. Roko's Basilisk, in that you're dealing with an already existing AI that "just" needs to get a tiny little chance to escape confinement before there's some non-zero chance it can be a major threat within your lifetime, combined with a belief that a sufficient number of sufficiently stupid and/or easily bribed people will have access to the AI in some form.
You also don't need to believe in any "superpowers". Just believe that a smart enough AI can hack its way into sufficiently many critical systems to be able to at a minimum cause massive amounts of damage (it doesn't need to be able to take over the world, just threaten that it can cause enough pain and suffering before it's stopped, and that it can either cause harm to you and/or your family/friends or reward you in some way). A belief that becomes more and more plausible with things like drones, remotely software-updated self-driving cars etc. - steadily such an AI is getting a larger theoretical "arsenal" that could be turned against us.
Oh, if you believe in singularity I think that argument pretty much does it. Of course, that's pretty circular, because if you believe in singularity you believe that there's a good chance AI could become a god of sorts and who wouldn't believe such a threat coming from a god?
While not implausible, I don't think that is likely at all. For one, even a very smart person can't know everything or learn too much information. Maybe an artificial intelligence will be just as limited, just as slow as humans, only a little less so. Who says the AI is such a great hacker?
I mean, if it wasn't an AI but a smart person, would you believe that? Is anyone who's smart also rich and powerful even if they have high-speed internet? That reflects the fantasies of Yudkowsky and his geek friends (that intelligence is the most important thing) than anything reasonable. Conversely, are the people with most power in society always the most intelligent?
It is very likely that the AI will be extremely intelligent, yet somewhat autistic, like Yudkowsky's crowd, and just as powerless and socially awkward as they are.
You don't have to believe in the singularity at all for this argument. You just have to believe the AI will be able to get sufficiently advanced to cause a sufficient level of damage.
> Maybe an artificial intelligence will be just as limited, just as slow as humans, only a little less so.
Only you can duplicate them with far more ease, and have each instance try out different approaches.
So if an AI reaches human-level intelligence at sufficiently low computational costs, we can assume that even if something prevents it from scaling up the intelligence accordingly, it will be possible to scale up the achievable goals dramatically through duplication the same way humanity is able to achieve far more through the sheer force of numbers than even the smartest of us would be able to achieve on their own.
EDIT: you don't even need to assume they'll be able to reach the intelligence of a particularly smart human. A "dumb" AI that has barely enough intelligence to just figuratively bash their collective, duplicated heads against enough security systems for long enough to find enough security holes might be able to cause sufficient damage.
I don't think so. Who says that the "duplicate" AIs will share the same goals? They'll probably start arguing with one another and form committees. Just like people.
I am poking fun, and I'm not discounting the possibility that AIs could be dangerous (as could people, viruses and earthquakes). But it's very hard not to notice how the singulariters' belief in noncorporal-intelligent deities is a clear, a bit sad, reflection of their own fantasies.
Yudkowsky and friends like to argue for what they call "rationality" (which is the name they give a particular form of autistic thinking imbued with repressed fantasies of power -- or a disaster only they can stop -- which is apparent in all of their "reasonable assumptions"), but their "larger than zero probability" games could be applied to just about any dystopic dream there is. I can say true AI won't happen because a virus epidemic would wipe out the human race first; or over-population would create a humanitarian disaster that will turn us back into savages; or genetic research would create a race of super-intelligent humans that would take over the planet and wipe-out all AI, and I could go on and on.
Given that Mr Yudkowsky is not the only one to have won the game, it seems, from however little has leaked, that what made them win isn't rational.
If you have ever cried or been scared while watching a film, you may know that you can be affected by things you don't experience directly. However, making a film that affects you requires a lot of skill, preparation, and analysis of the target audience.
The whole point of the game is for the AI to scar you emotionally, until you do something you'll regret. It is as trying for the subject as it is for the (human-played) AI, especially considering that not only is it unethical, it requires acting unethically extremely fast, similar to how a jump scare works: the less you expect it, the better it works.
It is not something I wish on any unprepared individual. It is also not something anyone would expect to happen in a "game", which is probably why Mr Yudkowsky won so many times.
But the real question is not "how would anyone react to a smarter AI in a box". We all know from Milgram's experiment that anyone can be driven to do unspeakable things. The real question is "how to train someone against an AI in a box".
The real question is "why do people think it's a good idea to create a superintelligence of questionable morality and have the only defense against it be a _human gatekeeper_". And yes, this notion was seriously proposed.
Eliezer does not want to put AIs in boxes. He thinks the entire idea is hopeless; _hence the game_.
To be fair, there's a massive difference between an evolved circuit using electro-magnetic subtleties inside a chip, and an AI programmed for and running on a regular, binary CPU being able to exploit those to act as an antenna that can emit signals that will hack into other, physically separated devices.
I'm not saying that it's impossible (I'm reminded of the hack of flipping bits in protected RAM by disabling caches and stressing the RAM), but even an AI can't magically work around exponentially low success probabilities.
(As an aside, I am rather skeptical about the singularity anyway because the more extreme forms required for the hostile AI worries are only plausible if P = NP.)
> binary CPU being able to exploit those to act as an antenna
I should have stipulated that I was running under the assumption that the circuitry required for SAI would be pretty advanced, drastically increasing the "escape surface area."
Even if the idea is only plausible to the human mind I'd still give SAI the benefit of the doubt. Much like a dog looks for a stick even if I fake the throw.
> I am rather skeptical about the singularity anyway