
Do we not consider emoji a language unto itself?

I was actually quite bemused to discover that some code review software I was using allowed me to "cursor" halfway through a smiley face emoji and enter a space (typing too fast to pay attention)... causing the infamous "box characters" because I'd accidentally split the smiley down the middle.

I get the need for extreme backward compat in browsers, but... this seems like one of those things that just might be worth fixing. Maybe a "use utf8" directive? :)
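
For what it's worth, this is easy to reproduce anywhere strings are exposed as raw UTF-16 code units, like JavaScript. A rough sketch of what the editor was probably doing (not its actual code, obviously):

    // A cursor that moves by UTF-16 code units can split an emoji.
    // "😀" (U+1F600) is stored as a surrogate pair: the two code
    // units 0xD83D 0xDE00.
    const s = "😀";
    console.log(s.length); // 2: two UTF-16 code units, one visible glyph

    // Inserting a space at index 1 lands between the halves of the pair:
    const broken = s.slice(0, 1) + " " + s.slice(1);
    console.log(broken); // two lone surrogates, rendered as boxes or
                         // replacement characters
    console.log([...broken].map((c) => c.codePointAt(0)!.toString(16)));
    // [ "d83d", "20", "de00" ]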



Even with proper UTF-8, you'll get situations where you can insert a space in the middle of an emoji and split it into other emoji + unprintable characters. The encoding is irrelevant; you need proper Unicode support to avoid these problems.
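
A sketch of what that looks like (TypeScript; exactly how the leftover pieces render varies by platform):

    // Even with every code point intact, a space inside a ZWJ sequence
    // splits one emoji into several. "👨‍👩‍👧" is MAN + ZERO WIDTH JOINER
    // + WOMAN + ZWJ + GIRL: five code points, one glyph.
    const family = "👨\u200D👩\u200D👧";
    console.log([...family].length); // 5 code points

    // Insert a space after the first code point. This is a perfectly
    // valid code point boundary; no encoding rule is violated.
    const cps = [...family];
    cps.splice(1, 0, " ");
    console.log(cps.join("")); // a man, a space, and a woman+girl pair
                               // (or similar, depending on the renderer)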


Unfortunately, proper Unicode support for many operations means carrying around tons of data, since important properties of code points cannot be derived from the code points themselves.
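
For example, classifying even a single code point needs those tables. In JavaScript the engine ships them and exposes them through regex property escapes; a sketch:

    // You can't tell what a code point is from its numeric value alone;
    // that knowledge lives in Unicode's property tables. JS engines ship
    // those tables and expose them via regex property escapes:
    const isPictographic = (ch: string) => /\p{Extended_Pictographic}/u.test(ch);
    const isLetter = (ch: string) => /\p{Letter}/u.test(ch);

    console.log(isPictographic("🙂")); // true
    console.log(isPictographic("A"));  // false
    console.log(isLetter("й"));        // true: nothing about U+0439 says
                                       // "letter", a table does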


Exactly. UTF-8 doesn't and can't fix this problem; you need a full Unicode library if you want to handle human text correctly. If you don't, why bother with UTF-8 instead of something simpler, like ASCII?
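
To illustrate what that support buys you, here's a sketch using Intl.Segmenter, the segmentation API built into modern engines, which iterates whole grapheme clusters instead of code units:

    // Grapheme-cluster segmentation: the cursor moves over whole
    // user-perceived characters instead of code units.
    const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
    const text = "a😀👨\u200D👩\u200D👧b";

    const graphemes = [...seg.segment(text)].map((s) => s.segment);
    console.log(graphemes);        // [ "a", "😀", "👨‍👩‍👧", "b" ]
    console.log(text.length);      // 12 UTF-16 code units
    console.log(graphemes.length); // 4 user-perceived characters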


> If you don't, why bother with UTF-8 instead of something simpler, like ASCII?

Because Unicode support isn't binary. Being able to pass along and not mangle blobs of Unicode is already a lot better than ASCII-only.
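
A sketch of the difference, assuming nothing more than a byte-transparent UTF-8 round trip:

    // A program with zero grapheme awareness still preserves text it
    // doesn't understand, as long as it treats strings as opaque blobs.
    const input = "résumé 🙂";

    // UTF-8 round trip: bytes in, identical bytes out.
    const bytes = new TextEncoder().encode(input);
    const roundTripped = new TextDecoder("utf-8").decode(bytes);
    console.log(roundTripped === input); // true, nothing mangled

    // An ASCII-only pipeline loses everything outside 0x00-0x7F:
    const asciiOnly = [...input]
      .map((c) => (c.codePointAt(0)! < 0x80 ? c : "?"))
      .join("");
    console.log(asciiOnly); // "r?sum? ?"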

