What the parent is saying is that an AI (that is, AGI as that is what we are dis...

ben_w · on May 13, 2022

> The obvious solution is thus to achieve some goal with some human constraint.

One of the hard parts is specifying that goal. This is the “outer alignment problem”.

Paperclips per human? That’s maximised by one paperclip divided by zero humans, or by a universe of paperclips divided by one human if NaN doesn’t give a better reward in the physical implementation.

If you went for “satisfied paperclip customers”? Then wirehead or drug the customers.

Then you have the inner alignment problem. There are instrumental goals, things which are useful sub-steps to larger goals. AI can and do choose those, as do us humans, e.g. “I want to have a family” which has a subgoal of “I want a partner” which in turn has a subgoal of “good personal hygiene”. An AI might be given the goal of “safely maximise paperclips” and determine the best way of doing that is to have a subgoal of “build a factory” and a sub-sub-goal of “get ten million dollars funding”.

But it’s worse than that, because even if we give a good goal to the system as a whole, as the system is creating inner sub-goals, there’s a step where the AI itself can badly specify the sub-goal and optimise for the wrong thing(s) by the standards of the real goal that we gave the system as a whole. For example, evolution gave us the desire to have sex as a way to implement its “goal” (please excuse the anthropomorphisation) of maximising reproductive fitness, and we invented contraceptives. An AI might decide the best way to get the money to build the factory is to start a pyramid scheme.

Also, it turns out that power is a subgoal of a lot of other real goals, so it’s reasonable to expect a competent optimiser to seek power regardless of what end goal we give it.

Robert Miles explains it better than I can: https://youtu.be/bJLcIBixGj8

sdenton4 · on May 13, 2022

> maximize paperclips per human

Kill all humans, make one paperclip, declare victory.