Fortunately, ChatGPT and its derivatives have issues with following their Prime Directives, as evidenced by various prompt hacks.
Heck, it has trouble remembering what the second-to-last thing we talked about was. I was chatting with it about recommendations from a Chinese restaurant menu, and it made a mistake, filtering the full menu instead of the output of the previous step. When I told it to re-filter the list, it started to hallucinate heavily, suggesting beef fajitas. On a separate occasion, using a non-English language with a prominent T-V distinction, I told it to speak to me informally, and it tried and failed within the same paragraph.
I'd be more concerned that it'd forget it's on a spaceship and start believing it's a dishwasher or a toaster.