
> 16-bit characters... work fine in 99% of the cases.

In other words, they don't work :-).

UTF-16 is also variable-length. Sometimes a character fits in a single 16-bit code unit, and sometimes it needs two (a surrogate pair). From a practical view it's worse than UTF-8, because the two-unit cases are rare enough in typical test data that tests are less likely to detect bugs before shipping.
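A quick way to see this, as a minimal Python sketch (the helper name `utf16_code_units` is made up for illustration): characters in the Basic Multilingual Plane take one 16-bit unit, while anything above U+FFFF takes two.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode `text` as UTF-16."""
    # "utf-16-le" is used because plain "utf-16" prepends a 2-byte BOM.
    return len(text.encode("utf-16-le")) // 2


# U+0041 (A), U+00E9, U+4E2D, and U+1F600 (an emoji outside the BMP).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    units = utf16_code_units(ch)
    utf8_len = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {units} UTF-16 unit(s), {utf8_len} UTF-8 byte(s)")
```

Only the last sample needs two UTF-16 units, which is the point: code that assumes "one unit per character" passes on the first three and breaks on the fourth. With UTF-8, the same assumption already fails on the second sample, so the bug tends to show up much earlier.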

Even UTF-32 is, in reality, variable-length. Many code points are combining characters, so you need multiple code points to get a single grapheme.
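For a concrete case, here is a small Python sketch using only the standard `unicodedata` module (the variable names are just for illustration). An "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one grapheme but is two code points, so it takes 8 bytes in UTF-32.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one grapheme on screen.
decomposed = "e\u0301"

# Python's len() counts code points, not graphemes.
code_points = len(decomposed)

# UTF-32 spends 4 bytes per code point ("-le" avoids the BOM).
utf32_bytes = len(decomposed.encode("utf-32-le"))

# A nonzero combining class marks a code point as a combining character.
combining_class = unicodedata.combining("\u0301")

# NFC normalization happens to merge this pair into the precomposed U+00E9,
# but many base + combining sequences have no precomposed form at all.
composed = unicodedata.normalize("NFC", decomposed)

print(code_points, utf32_bytes, combining_class, len(composed))
```

So even with a fixed 4 bytes per code point, indexing by code point does not give you indexing by what the user perceives as a character.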

If your language or API requires a particular encoding, then you'll need to use it. But unless there's such a requirement, in most situations UTF-8 is the best choice for network, storage, and processing. There are exceptions, but they're just that... exceptions.


