chrismorgan on June 12, 2023 | on: UTF-21, a toy character encoding

> Even if the Unicode codespace were to ever be extended again (it won't), the only encoding that would become incompatible is UTF-16. In fact, both UTF-8 and UTF-32 are trivially extensible and used to be wider encodings, but were restricted to 0x10FFFF arbitrarily to match UTF-16 limitations.

Mind you, it wouldn’t be this easy. Software should perform Unicode validation, and much of it does, so every piece of software would need to be updated for the new, enlarged versions of Unicode, UTF-8 and UTF-32. Old software that validates would baulk at anything from the new ranges, or convert it into REPLACEMENT CHARACTER.
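To make that concrete, here is a rough sketch using Python's built-in UTF-8 codec as a stand-in for "old software that validates". The byte string is what the existing 4-byte UTF-8 layout would produce for 0x110000, the first value past today's limit. The helper names `strict_decode` and `lenient_decode` are made up for illustration.

```python
# Hypothetical input: the 4-byte UTF-8 bit layout applied to 0x110000,
# one past the current Unicode ceiling of 0x10FFFF.
beyond_limit = b"\xf4\x90\x80\x80"

# For comparison: the highest code point that is valid today, U+10FFFF.
at_limit = b"\xf4\x8f\xbf\xbf"


def strict_decode(data: bytes) -> str:
    """Validating decode: raises UnicodeDecodeError on out-of-range input."""
    return data.decode("utf-8")  # errors="strict" is the default


def lenient_decode(data: bytes) -> str:
    """Validating decode that substitutes U+FFFD instead of raising."""
    return data.decode("utf-8", errors="replace")


# The strict path baulks at the out-of-range sequence.
try:
    strict_decode(beyond_limit)
    rejected = False
except UnicodeDecodeError:
    rejected = True

# The lenient path yields only REPLACEMENT CHARACTERs for it.
replaced = lenient_decode(beyond_limit)

print("strict decoder rejected input:", rejected)
print("lenient output is all U+FFFD:", all(ch == "\ufffd" for ch in replaced))
print("U+10FFFF still decodes:", strict_decode(at_limit) == "\U0010ffff")
```

Either way the data from the new range is lost or refused until the decoder itself is updated, which is the point being made above.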

But yeah, UTF-16 would be toast.
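The reason is plain arithmetic on surrogate pairs: a high surrogate and a low surrogate each carry 10 bits, so a pair can address at most 0x100000 values above the BMP. A small sketch of that calculation follows; the function name `combine_surrogates` is mine, not from any library.

```python
# Surrogate ranges as defined for UTF-16.
HIGH_MIN, HIGH_MAX = 0xD800, 0xDBFF
LOW_MIN, LOW_MAX = 0xDC00, 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a UTF-16 surrogate pair."""
    if not (HIGH_MIN <= high <= HIGH_MAX):
        raise ValueError("not a high surrogate")
    if not (LOW_MIN <= low <= LOW_MAX):
        raise ValueError("not a low surrogate")
    # 10 bits from each half, offset past the Basic Multilingual Plane.
    return 0x10000 + ((high - HIGH_MIN) << 10) + (low - LOW_MIN)


# The largest pair gives the largest code point UTF-16 can express.
max_code_point = combine_surrogates(HIGH_MAX, LOW_MAX)
print(hex(max_code_point))  # 0x10ffff
```

There is no spare bit pattern left in that scheme, so anything above 0x10FFFF would need a new mechanism rather than a wider version of the existing one.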

e4m2 on June 12, 2023

True, another extension would be disastrous, we may never recover from the fallout of malformed UCS2-esque UTF-16. Let's just hope no one fixes all the broken, accidentally forward-compatible decoders in the practically infinite amount of time it will take to completely fill the Unicode codespace.
