
>In my mind, it happens when autonomous systems optimizing reward functions to "stay alive" (by ordering fuel, making payments, investments etc) fail because of problems described above in (a) -- the inability to have deterministic rules baked into them to avoid global fail states in order to achieve local success states. (Eg, autonomous power plant increases output to solve for energy needs -> autonomous dam messes up something structural -> cascade effect into large swathes of arable land and homes destroyed).
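The quoted failure mode can be sketched as a toy example. This is my own framing, not anything from the comment: the coupling function, numbers, and the global limit are all invented for illustration. The point is only that each agent's local optimum can sit past a global constraint neither agent encodes.

```python
# Toy sketch (invented names and numbers): an agent greedily maximizes
# a local reward while a global constraint it doesn't know about is
# silently violated.

def power_plant_reward(output):
    # Local objective: meet energy demand, capped at 100 units.
    return min(output, 100)

def dam_stress(power_output):
    # Hypothetical coupling: more plant output -> more upstream load.
    return power_output * 0.8

GLOBAL_STRESS_LIMIT = 60  # the rule nobody baked into either agent

# The plant optimizes only its own reward.
best_output = max(range(0, 101), key=power_plant_reward)
stress = dam_stress(best_output)

print(best_output)                    # local success
print(stress > GLOBAL_STRESS_LIMIT)  # global fail state reached
```

The local optimum (output of 100) yields a stress of 80, past the unencoded limit: local success, global failure, exactly the cascade shape the quote describes.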

And for this to develop in machines, they would have to make many mistakes along the way, producing the kinds of outcomes we hold humans accountable for: fines, jail, sometimes people dying. I think that would be so wholly unpalatable to mankind that the experiment would be cut short before it ever reached any sort of scale.

I agree with your conclusion: we can't encode enough of the rules ourselves because we don't even know them, and having machines acquire them the traditional way, through costly trial and error, would, I believe, be fundamentally disagreeable to humans.



