I was actually quite bemused to discover that some code review software I was using allowed me to "cursor" halfway through a smiley face emoji and enter a space (typing too fast to pay attention)... causing the infamous "box characters" because I'd accidentally split the smiley down the middle.
I get the need for extreme backward compat in browsers, but... this seems like one of those things that just might be worth fixing. Maybe a "use utf8" directive? :)
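A rough sketch of how those "box characters" can appear. It assumes the review tool keeps text as UTF-16 and lets the cursor stop on any 16-bit code unit, as JavaScript string indices do. That is a guess about the tool, not something confirmed above. Python is used only to make the bytes visible:

```python
# Assumption: text stored as UTF-16, cursor allowed on any 16-bit code unit.
smiley = "\U0001F600"  # GRINNING FACE, outside the Basic Multilingual Plane

units = smiley.encode("utf-16-le")
print(len(units) // 2)  # 2 code units: a high surrogate and a low surrogate

# "Cursor" between the two code units and type a space there.
split = units[:2] + " ".encode("utf-16-le") + units[2:]

# Each orphaned surrogate can no longer be decoded, so it becomes U+FFFD,
# which fonts typically draw as a box or a question mark.
decoded = split.decode("utf-16-le", errors="replace")
print([f"U+{ord(c):04X}" for c in decoded])
```

The emoji itself is gone: what remains is a space flanked by replacement characters.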
Even with proper UTF-8, you'll get situations where you can insert a space in the middle of an emoji and split it into other emoji + unprintable characters. The encoding is irrelevant, you need proper Unicode support to avoid these problems.
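A small illustration of that point using a skin-toned thumbs-up, which is two code points. Inserting a space on a code point boundary produces perfectly valid UTF-8. The emoji is still broken: most renderers typically show a plain thumbs-up followed by a standalone skin-tone swatch.

```python
thumbs = "\U0001F44D\U0001F3FD"  # thumbs up + medium skin tone modifier
print(len(thumbs))               # 2 code points, displayed as one emoji

# Insert a space between the two code points.
broken = thumbs[:1] + " " + thumbs[1:]

# The result encodes and decodes as UTF-8 without any error...
data = broken.encode("utf-8")
assert data.decode("utf-8") == broken

# ...but the modifier is now detached from the emoji it was meant to modify.
print([f"U+{ord(c):04X}" for c in broken])  # ['U+1F44D', 'U+0020', 'U+1F3FD']
```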
Unfortunately, proper Unicode support for many operations means carrying around tons of data, since important properties of code points cannot be derived from the code points themselves.
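To make the "tons of data" concrete, here is a deliberately simplified sketch of where an editor could safely place the cursor. It relies on Python's `unicodedata` module, which is a lookup table compiled from the Unicode Character Database. It is not the full grapheme cluster algorithm from UAX #29: the skin-tone range is hardcoded because the stdlib does not expose that emoji property. That gap is exactly the kind of data a real Unicode library has to ship. The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def is_extender(ch: str) -> bool:
    """Simplified check: does `ch` attach to the character before it?

    Covers combining marks and variation selectors (categories Mn/Me),
    the zero-width joiner, and emoji skin-tone modifiers. This is NOT
    the complete UAX #29 rule set.
    """
    cp = ord(ch)
    return (
        unicodedata.category(ch) in ("Mn", "Me")  # table lookup, not arithmetic
        or ch == ZWJ
        or 0x1F3FB <= cp <= 0x1F3FF  # hardcoded: Fitzpatrick skin-tone modifiers
    )

def safe_cursor_positions(text: str) -> list[int]:
    """Code point offsets where inserting text should not split a cluster,
    under the simplified rules above."""
    if not text:
        return [0]
    positions = [0]
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if is_extender(cur) or prev == ZWJ:
            continue  # `cur` is glued to what came before it
        positions.append(i)
    positions.append(len(text))
    return positions

print(safe_cursor_positions("e\u0301x"))             # [0, 2, 3]
print(safe_cursor_positions("\U0001F44D\U0001F3FD"))  # [0, 2]
```

Even this toy version needs a per-character database; the real rules add many more properties (Extended_Pictographic, Hangul syllable types, regional indicators and so on).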
Exactly. UTF-8 doesn't and can't fix this problem; you need a full Unicode library if you want to correctly handle human text. If you don't have one, why bother with UTF-8 instead of something simpler, like ASCII?