
There are, by definition, no Unicode characters that don't fit in UTF-16: the Unicode codespace is capped at U+10FFFF precisely so that UTF-16 can encode all of it.

UTF-16 has surrogate pairs; it's an extension of UCS-2, which doesn't.
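To make the surrogate-pair mechanism concrete, here's a minimal sketch (the function name is mine, not from any standard library) of how a code point above U+FFFF is split into two 16-bit code units:

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    # Encode a supplementary code point (U+10000..U+10FFFF) as the
    # UTF-16 surrogate pair that extends plain UCS-2.
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000            # 20 bits remain
    high = 0xD800 + (v >> 10)   # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # low (trail) surrogate: bottom 10 bits
    return high, low

# U+1D11E MUSICAL SYMBOL G CLEF -> D834 DD1E
assert to_surrogate_pair(0x1D11E) == (0xD834, 0xDD1E)
```

UCS-2 simply has no representation for these code points, which is why it's only a subset of UTF-16.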

Incidentally, this is why UTF-16 is a poor choice for a character encoding: you take twice the memory (for ASCII-heavy text) but you don't actually get O(1) indexing by character — you only think you do, and then your program breaks badly when input contains a surrogate pair.
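You can see the indexing problem directly by counting UTF-16 code units (Python example; the string is illustrative):

```python
s = "a𝄞b"  # U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP
units = s.encode("utf-16-le")
# Each UTF-16 code unit is 2 bytes; the clef needs two units (a surrogate
# pair), so 3 characters become 4 code units.
num_units = len(units) // 2
assert len(s) == 3
assert num_units == 4
# Naive "O(1)" indexing by code unit lands inside the surrogate pair:
unit1 = int.from_bytes(units[2:4], "little")
assert 0xD800 <= unit1 <= 0xDBFF  # a lone high surrogate, not a character
```

As soon as one supplementary character appears, code-unit index and character index diverge for everything after it.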

See also elsewhere in the thread: https://news.ycombinator.com/item?id=8066284
