> "Most of these AGI doom-scenarios require no self-awareness at all. AGI is just an insanely powerful tool that we currently wouldn't know how to direct, control or stop if we actually had access to it."
You're talking about "doomsday scenarios". Can you actually provide a few concrete examples?
Over the course of years, we figure out how to create AI systems that are more and more useful, to the point where they can run autonomously and, with very little supervision, produce economic output that eclipses that of the most capable humans in the world. With generality, this obviously includes the ability to maintain and engineer similar systems, so human supervision of the systems themselves can become redundant.
This technology is obviously so economically powerful that incentives ensure it's very widely deployed, and very vigorously engineered for further capabilities.
The problem is that we don't yet understand how to control a system like this to ensure that it always does things humans want, and that it never does something humans absolutely don't want. This is the crux of the issue.
Perverse instantiation of AI systems was accidentally demonstrated in the lab decades ago, so an existence proof of this potential for accident already exists: some mathematical objective function decides what the AI will do, and the AI ends up maximizing that function in a way its creators hadn't intended. There are a multitude of open problems here that we haven't made much progress on, and progress in capabilities appears to be unrelated to progress in control.
A catastrophic accident with such a system could, for example, be that it optimizes for an instrumental goal, such as survival or access to raw materials or energy, and turns out to interpret its ultimate goal in a way that does not take human wishes into account.
That's a nice way of saying that we have created a self-sustaining and self-propagating life-form more powerful than we are, which is now competing with us. It may perfectly well understand what humans want, but it turns out to want something different -- initially guided by some human objective, but ultimately different enough that it's a moot point. Maybe creating really good immersive games, figuring out the laws of physics or whatever. The details don't matter.
The result would at best be that we have the agency of a tribe of gorillas living next to a human plantation development, and at worst agency analogous to that of a toxic mold infection in a million-dollar home. Either way, such a catastrophe would permanently put an end to what humans wish to do in the world.
By the time you find such evidence, it could already be close to game over for humanity. It’s important to get this right before that.
We already have significant warnings. Judge for yourself whether the latest models, like Imagen, Gato, and Chinchilla, have economic value and could potentially cause harm.
Historical examples of perverse instantiation are everywhere: evolutionary agents learning to live off a diet of their own children; a machine-learning algorithm tasked with learning to grip a ball that cheated by performing ball-less movements the camera erroneously classified as successful grasps; an evolutionary algorithm meant to optimize the number of circuit elements in a timer producing a circuit that worked by picking up an external radio signal unrelated to the task; and so on. Some examples are summarized here: https://www.wired.com/story/when-bots-teach-themselves-to-ch...
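To make the failure mode concrete, here is a minimal toy sketch in Python of the ball-gripping case. Every name, number, and the "camera classifier" here is invented for illustration and taken from no real system: the optimizer only ever sees the proxy reward the camera reports, so it maximizes that and almost never achieves what the designer actually wanted.

```python
import random

# Toy sketch of objective mis-specification ("perverse instantiation").
# Intended goal: close the hand around the ball at position x = 0.5.
# Proxy reward: whatever the (flawed) camera classifier reports as a grasp.
# Everything below is made up for illustration; no real system is referenced.

def true_success(action):
    """What the designer actually wants: hand at the ball, grip closed."""
    hand_x, grip = action
    return abs(hand_x - 0.5) < 0.05 and grip > 0.9

def proxy_reward(action):
    """What the reward function measures: the camera sits in front of the
    ball, so any hand position that occludes the ball from the camera is
    (erroneously) classified as a successful grasp."""
    hand_x, grip = action
    occludes_ball = hand_x < 0.5           # hand anywhere between camera and ball
    return 1.0 if occludes_ball else 0.0   # grip strength never enters the reward

# Naive random-search "optimizer": keep whichever action scores best on the proxy.
best_action, best_score = None, float("-inf")
for _ in range(10_000):
    action = (random.uniform(0.0, 1.0), random.uniform(0.0, 1.0))
    score = proxy_reward(action)
    if score > best_score:
        best_action, best_score = action, score

print("proxy reward achieved:", best_score)                      # 1.0, looks perfect
print("actually gripped the ball:", true_success(best_action))   # almost always False
```

The point of the toy is only that the optimizer has no access to the designer's intent, just to the measured reward; the real-world examples above share that structure at much larger scale.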
GP wanted a concrete example of a doomsday scenario arising from failed AI alignment, so in that context extrapolating to a plausible future of advanced AI agents should suffice. If you need a double-blind, peer-reviewed study before you'll consider the possibility that intelligent agents more capable than humans could exist in physical reality, I don't think you're in the target audience for this discussion. A little philosophical affinity beyond the status quo is table stakes.