Do we not consider emoji a language unto itself? I was actually quite bemused to...

tsimionescu · on April 15, 2021

Even with proper UTF-8, you'll get situations where you can insert a space in the middle of an emoji and split it into other emoji + unprintable characters. The encoding is irrelevant; you need proper Unicode support to avoid these problems.
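
For anyone who wants to see this concretely, here's a minimal Python sketch. The family emoji used here is just one illustrative choice (three emoji joined by U+200D ZERO WIDTH JOINER), and the variable names are made up for the example. Splitting it on a code point boundary keeps the UTF-8 perfectly valid but still breaks the emoji:

```python
# One user-perceived emoji built from five code points:
# MAN, ZWJ, WOMAN, ZWJ, GIRL (chosen purely for illustration).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(len(family))                  # 5 code points
print(len(family.encode("utf-8")))  # 18 bytes (4 + 3 + 4 + 3 + 4)

# Insert a space after the first code point. This is a legal
# code point boundary, so the result is still well-formed UTF-8.
broken = family[:1] + " " + family[1:]
round_tripped = broken.encode("utf-8").decode("utf-8")
print(round_tripped == broken)      # True: the encoding has no complaint

# The sequence is now MAN, SPACE, a dangling (invisible) ZWJ,
# then WOMAN + ZWJ + GIRL.
code_points = [f"U+{ord(c):04X}" for c in broken]
print(code_points)
```

Nothing at the encoding level can tell you that index 1 was a bad place to split; that takes grapheme cluster segmentation, which needs Unicode data tables.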

account42 · on April 15, 2021

Unfortunately, proper Unicode support for many operations means carrying around tons of data, since important properties of code points cannot be derived from the code points themselves.
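
A rough illustration of the "cannot be derived" part, using Python's standard `unicodedata` module, which ships a copy of the Unicode Character Database. The `describe` helper is a hypothetical name for this sketch, not anything standard:

```python
import unicodedata

def describe(ch):
    """Return (general category, canonical combining class) for one code point."""
    return unicodedata.category(ch), unicodedata.combining(ch)

# Numerically adjacent code points can have unrelated properties,
# so there is no arithmetic shortcut: each answer is a table lookup.
print(describe("@"))       # ('Po', 0)   U+0040, punctuation
print(describe("A"))       # ('Lu', 0)   U+0041, uppercase letter
print(describe("["))       # ('Ps', 0)   U+005B, opening punctuation
print(describe("\u0300"))  # ('Mn', 230) combining grave accent
print(describe("\u200d"))  # ('Cf', 0)   zero width joiner

# Normalization depends on the same tables: "e" + combining acute
# composes to the single code point U+00E9 under NFC.
composed = unicodedata.normalize("NFC", "e\u0301")
print(composed == "\u00e9")  # True
```

Categories and combining classes are only a small slice of the data; segmentation, case folding, and collation each need their own tables.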

tsimionescu · on April 15, 2021

Exactly. UTF-8 doesn't and can't fix this problem; you need a full Unicode library if you want to correctly handle human text. If you don't, why bother with UTF-8 instead of something simpler, like ASCII?

account42 · on April 16, 2021

> If you don't, why bother with UTF-8 instead of something simpler, like ASCII?

Because Unicode support isn't binary. Being able to pass along blobs of Unicode without mangling them is already a lot better than ASCII-only.
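
As a sketch of what that middle ground can look like: code that only cares about ASCII delimiters can treat everything else as opaque bytes. This works because in UTF-8 every byte of a multi-byte sequence has its high bit set, so an ASCII byte such as `=` never appears inside one. The `split_field` helper and the `name=...` record format are assumptions made up for this example:

```python
def split_field(line):
    """Split a b'key=value' record on the first ASCII '=' without decoding it."""
    key, _, value = line.partition(b"=")
    return key, value

# A value containing an accented letter and a ZWJ emoji sequence.
original = "Zo\u00eb \U0001F468\u200D\U0001F469\u200D\U0001F467"
raw = b"name=" + original.encode("utf-8")

key, value = split_field(raw)
print(key)                               # b'name'
print(value.decode("utf-8") == original) # True: the text came through intact
```

No Unicode tables are involved, and the text survives untouched. The limits show up as soon as the code tries to truncate, reverse, or case-fold the value rather than just pass it along.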