In ASCII, the concept of a character is overloaded with many properties. Unicode breaks "character" out to separate concepts like code unit, code point, grapheme cluster, etc.

Because of this, some of your statements are invalid. For example: "Along comes unicode which has variable bytes per character. (Yes, even for utf-32, which is why no-one uses utf-32)."

UTF-32 has a fixed number of bytes per code point. It doesn't have a fixed number of bytes per grapheme cluster.
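
To make the distinction concrete, here is a rough Python sketch (standard library only). The helper name `utf32_bytes` and the sample strings are my own illustration, not anything from the parent comment. It uses the "utf-32-le" codec so that no byte order mark is added to the count.

```python
def utf32_bytes(s: str) -> int:
    """Number of bytes needed to encode s as UTF-32 (little-endian, no BOM)."""
    return len(s.encode("utf-32-le"))

# Each sample below is normally rendered as ONE user-perceived character
# (one grapheme cluster), but the code point counts differ.
samples = {
    "precomposed e-acute": "\u00e9",          # 1 code point
    "e + combining acute": "e\u0301",         # 2 code points
    "family emoji (ZWJ sequence)": "\U0001F468\u200D\U0001F469\u200D\U0001F467",  # 5 code points
}

for name, s in samples.items():
    code_points = len(s)  # Python's len() on str counts code points
    print(f"{name}: {code_points} code point(s), {utf32_bytes(s)} bytes in UTF-32")
    # Always exactly 4 bytes per code point, whatever the string is.
    assert utf32_bytes(s) == 4 * code_points
```

So the byte count per code point is always 4, while the byte count per grapheme cluster here is 4, 8, or 20. Actually splitting arbitrary text into grapheme clusters needs the segmentation rules from Unicode UAX #29, which Python's standard library does not implement, so the sketch only uses strings whose cluster boundaries are known in advance.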




Correct. I used "character" where you used "grapheme cluster". UTF-32 is variable-length at the character (aka grapheme cluster) level.

If there are any other invalid parts please elaborate - I'm always keen to learn (seriously).



