I kind of wonder how far down the rabbit hole they went here.
E.g., one of the standard preoccupations in this kind of situation is that the AI will be able to guess that it's being studied in a controlled environment, and deliberately "play dumb" so that it's given access to more resources in a future iteration.
Now, I don't think this is something you'd realistically have to worry about from GPT-4-simulating-an-agent, but I wonder how paranoid the ARC team was.
Honestly, it's already surprisingly prudent of OpenAI to even bother testing this scenario.
I'd reckon the ARC team itself could be manipulated by an adversarial AI. I used to dismiss these as tinfoil-hat conspiracy theories, but then I watched the devolution of someone like Elon Musk in real time.