In ASCII, the concept of a character is overloaded with many properties. Unicode... | Hacker News

nayuki on June 12, 2023 | on: UTF-21, a toy character encoding

In ASCII, the concept of a character is overloaded with many properties. Unicode breaks "character" out into separate concepts like code unit, code point, grapheme cluster, etc.
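
The split between code units and code points can be shown with a minimal Python sketch. The helper name `unit_counts` and the sample strings are illustrative assumptions, not anything from the thread. It uses only the standard `str.encode`, with the `-le` codec variants so that no byte-order mark inflates the counts.

```python
def unit_counts(s: str) -> dict:
    """Hypothetical helper: how many units of each kind the string s occupies."""
    return {
        "code_points": len(s),                            # Python str length counts code points
        "utf8_units": len(s.encode("utf-8")),             # 8-bit code units (bytes)
        "utf16_units": len(s.encode("utf-16-le")) // 2,   # 16-bit code units, no BOM
        "utf32_units": len(s.encode("utf-32-le")) // 4,   # 32-bit code units, no BOM
    }

# "a", precomposed e-acute (U+00E9), and a grinning-face emoji (U+1F600)
for sample in ["a", "\u00e9", "\U0001F600"]:
    print(repr(sample), unit_counts(sample))
```

For "a" every count is 1. U+00E9 takes 2 UTF-8 bytes but a single UTF-16 or UTF-32 unit. U+1F600 takes 4 UTF-8 bytes and 2 UTF-16 units (a surrogate pair), yet it is still one code point and one UTF-32 unit.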

Because of this, some of your statements are invalid. For example: "Along comes unicode which has variable bytes per character. (Yes, even for utf-32, which is why no-one uses utf-32)."

UTF-32 has a fixed number of bytes per code point. It doesn't have a fixed number of bytes per grapheme cluster.
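
Here is a minimal Python sketch of that point. The helper `utf32_byte_length` and the sample strings are illustrative assumptions. Python's standard library has no grapheme cluster segmentation, so each sample is one grapheme cluster by construction under the UAX #29 rules (combining marks, regional-indicator pairs, and ZWJ emoji sequences stay in one cluster) rather than by computation.

```python
import unicodedata

def utf32_byte_length(s: str) -> int:
    """Hypothetical helper: bytes needed for s in UTF-32 (big-endian, no BOM)."""
    return len(s.encode("utf-32-be"))

# Each sample is ONE grapheme cluster by construction (per UAX #29);
# the cluster count is assumed here, not computed.
ONE_CLUSTER_SAMPLES = {
    "precomposed e-acute": "\u00e9",                          # 1 code point
    "e + combining acute": "e\u0301",                         # 2 code points
    "US flag (regional indicators)": "\U0001F1FA\U0001F1F8",  # 2 code points
    "family emoji (ZWJ sequence)": "\U0001F468\u200d\U0001F469\u200d\U0001F467",  # 5 code points
}

for label, s in ONE_CLUSTER_SAMPLES.items():
    n_bytes = utf32_byte_length(s)
    # Fixed width per code point: exactly 4 bytes for each one.
    assert n_bytes == 4 * len(s)
    print(f"{label}: {len(s)} code point(s), {n_bytes} bytes")

# The two e-acute spellings are canonically equivalent, so NFC
# normalization maps the 8-byte form onto the 4-byte form.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```

Every sample is a single grapheme cluster, but they occupy 4, 8, 8, and 20 bytes respectively, while the bytes-per-code-point ratio stays at exactly 4.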

bruce511 on June 13, 2023

Correct. I used "character" where you used "grapheme cluster". UTF-32 is variable-length at the character (aka grapheme cluster) level.

If there are any other invalid parts, please elaborate; I'm always keen to learn (seriously).
