What the parent is saying is that an AI (that is, AGI as that is what we are discussing) gets to pick its goals. For some reason, humans have a fear of AI killing all humans in order to to achieve some goal. The obvious solution is thus to achieve some goal with some human constraint. For example, maximize paperclips per human. That actually probably speeds up human civilization across the universe. No, what people are really afraid is if AÍ changes its goal to be killing humanity. That’s when humans truly lose control, when the AÍ can decide. But, then the parent’s comment does become pertinent. What would an intelligent being choose? Devolving into nihilism and self destructing is just as equal as a probability as choosing some goal that leads to humanity’s end. That’s just scratching the surface. For instance, to me, it is not obvious whether or not empathy for other sentient beings is an emergent property of sentience. That is, lacking empathy might be problem in human hardware as opposed to empathy being inherently human. The list of these open unknowable questions are endless.
> The obvious solution is thus to achieve some goal with some human constraint.
One of the hard parts is specifying that goal. This is the “outer alignment problem”.
Paperclips per human? That’s maximised by one paperclip divided by zero humans, or by a universe of paperclips divided by one human if NaN doesn’t give a better reward in the physical implementation.
If you went for “satisfied paperclip customers”? Then wirehead or drug the customers.
Then you have the inner alignment problem. There are instrumental goals, things which are useful sub-steps to larger goals. AI can and do choose those, as do us humans, e.g. “I want to have a family” which has a subgoal of “I want a partner” which in turn has a subgoal of “good personal hygiene”. An AI might be given the goal of “safely maximise paperclips” and determine the best way of doing that is to have a subgoal of “build a factory” and a sub-sub-goal of “get ten million dollars funding”.
But it’s worse than that, because even if we give a good goal to the system as a whole, as the system is creating inner sub-goals, there’s a step where the AI itself can badly specify the sub-goal and optimise for the wrong thing(s) by the standards of the real goal that we gave the system as a whole. For example, evolution gave us the desire to have sex as a way to implement its “goal” (please excuse the anthropomorphisation) of maximising reproductive fitness, and we invented contraceptives. An AI might decide the best way to get the money to build the factory is to start a pyramid scheme.
Also, it turns out that power is a subgoal of a lot of other real goals, so it’s reasonable to expect a competent optimiser to seek power regardless of what end goal we give it.