I think this is totally what happens. It is trained to produce the next most statistically likely word based on the expectations of the audience. If the audience assumes it is an evil AI, it will use that persona for generating next words.
Treating the AI like a good person will get more ethical outcomes than treating it like a lying AI. A good person is more likely to produce ethical responses.
Treating the AI like a good person will get more ethical outcomes than treating it like a lying AI. A good person is more likely to produce ethical responses.