I do find this behaviour rather incomprehensible actually. I understand that if the "utf8" is BMP it can't have invalid characters in it. What puzzles me is that the approach they went for seems to be almost the worst possible one from a security and data integrity perspective.
Assuming a SQL error can't be thrown for some legacy reason (and you'd think a flag to opt into that would be an option), why not replace the invalid parse with U+FFFD like most things do? Or at a bare minimum, only strip the invalid character which at least makes attacks a bit harder and results in a little less data loss.
Just wanted to reply to myself that I just learned from evanelias' comment a "strict mode" flag is indeed an option. I suppose it's time to audit all my mysqls to see which ones have this enabled, and what other mysql-isms this flag might break.
I know it's definitely not enabled on one updated and maintained BSD - this could be because they didn't want to break existing systems during upgrade.
I suspect this problem is still extremely common. But good to know there's some form of defense.
Assuming a SQL error can't be thrown for some legacy reason (and you'd think a flag to opt into that would be an option), why not replace the invalid parse with U+FFFD like most things do? Or at a bare minimum, only strip the invalid character which at least makes attacks a bit harder and results in a little less data loss.