
> Although the standard does state that Strings with textual data are supposed to be UTF-16.

No, it doesn't. It states that they're UTF-16 code units, a term defined in Unicode (see D77; essentially an unsigned 16-bit integer), which is not the same as UTF-16. A sequence of 16-bit code units can therefore include lone surrogates, which something encoded in UTF-16 could not.
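To make the distinction concrete, here is a minimal sketch that models a string as a plain list of unsigned 16-bit integers and checks whether it would also be well-formed UTF-16. The function name `is_well_formed_utf16` and the sample sequences are hypothetical and chosen for illustration; nothing here is taken from the standard's own text.

```python
def is_well_formed_utf16(units):
    """Return True if a sequence of 16-bit code units is well-formed UTF-16.

    Hypothetical helper for illustration: every high surrogate
    (0xD800-0xDBFF) must be immediately followed by a low surrogate
    (0xDC00-0xDFFF), and no low surrogate may appear on its own.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if not 0 <= unit <= 0xFFFF:
            raise ValueError("not a 16-bit code unit: %r" % (unit,))
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: only valid as the first half of a pair.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


paired = [0xD83D, 0xDE00]  # U+1F600 encoded as a surrogate pair
lone = [0x0041, 0xD83D]    # 'A' followed by a lone high surrogate

print(is_well_formed_utf16(paired))
print(is_well_formed_utf16(lone))
```

Both lists are valid sequences of 16-bit code units, but only the first is valid UTF-16, which is the gap between the two terms.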




Oh, yes; I just skimmed the 'code unit' bit without actually reading it. (I've now removed the misinformation from my previous comment.)



