I think the bigger problem is that if the dataset were sufficiently poisoned, LLMs could start producing Greek question marks (U+037E) in their output. If you could tie that behavior to some rare trigger words, you could then use those words to make generated code fail to compile despite passing visual inspection. A quick sketch of why that works is below.
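To make the "passes visual inspection" part concrete, here's a minimal Python sketch (the SUSPECT map and flag_homoglyphs helper are just illustrative names, not any real tool): U+037E renders like an ASCII semicolon in most fonts, but a tokenizer compares code points, so the line breaks, and a simple scan over generated code can flag it.

    # U+037E (Greek question mark) looks like U+003B (semicolon) in most fonts,
    # but compilers/tokenizers compare code points, not glyphs.
    ascii_semicolon = ";"            # U+003B
    greek_question_mark = "\u037e"   # U+037E
    print(ascii_semicolon == greek_question_mark)                    # False
    print(hex(ord(ascii_semicolon)), hex(ord(greek_question_mark)))  # 0x3b 0x37e

    # Illustrative defense: flag confusable characters in generated code
    # before it ever reaches a compiler or a reviewer's eyes.
    SUSPECT = {"\u037e": ";", "\u00a0": " "}   # tiny example mapping, not exhaustive
    def flag_homoglyphs(source: str):
        return [(i, hex(ord(ch))) for i, ch in enumerate(source) if ch in SUSPECT]

    print(flag_homoglyphs("int x = 1\u037e"))  # [(9, '0x37e')] -- the fake semicolon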
I mean, how could YOU possibly know if it's really a Greek question mark? Context. LLMs are a bit more clever than you're giving them credit for.