each time a new unicode version is introduced, you get new backward compatibilit...

dhosek · on Oct 24, 2022

In fact, no. Unicode is rigorously backwards compatible. When Unicode 15.0.0 was released this year, the only thing I needed to do with my Unicode library was update the data tables that give the categories and combining classes for the newly added characters. Once a character is added, it’s there forever.

This is part of why, for example, languages written in different descendants of the Brahmi script treat vowels differently: Unicode meant to preserve round-trip compatibility with pre-Unicode character conventions. So in Thai, most vowels are treated as separate graphemes from the consonant to which they’re attached, while in Devanagari, the corresponding vowels are treated as combining (and spacing) diacritics. The one place where Unicode chose to break backwards compatibility with pre-Unicode encodings was its most controversial choice, Han unification, where the various incompatible 16-bit character encodings of Han characters (Japanese, Korean, and THREE Chinese encodings) were replaced with a single unified set that eliminated the duplications between those sets.

Within Unicode’s own history, I think there was one breaking change in the 90s that fixed an error (I don’t care to dig up the history for a HN comment), but otherwise, any text encoded with a Unicode version prior to 15.0.0 will be interpreted identically in the current Unicode.
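To make the Thai versus Devanagari point concrete, here is a rough sketch using Python's standard `unicodedata` module (not my library). The helper name `describe` and the choice of these two characters are just for illustration. Note that this only looks at the general category and combining class from the data tables; it is not the grapheme cluster algorithm, which needs more than these two properties.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, int]:
    """Return (name, general category, canonical combining class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))


# Two vowels that both write a long "a" after a consonant.
THAI_SARA_AA = "\u0e32"        # THAI CHARACTER SARA AA
DEVANAGARI_SIGN_AA = "\u093e"  # DEVANAGARI VOWEL SIGN AA

for ch in (THAI_SARA_AA, DEVANAGARI_SIGN_AA):
    name, category, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, combining class={ccc}")

# Which Unicode version the interpreter's data tables were built from.
print("Unicode data version:", unicodedata.unidata_version)
```

The Thai vowel should come back as an ordinary letter (`Lo`) and the Devanagari one as a spacing combining mark (`Mc`), which is the encoding difference described above.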

(I had someone ask for the ability to choose older versions of Unicode in my library to handle his use case with clusters in a terminal app, but on further investigation into what he was trying to do, I discovered that it came from a misunderstanding about how grapheme clusters work and in fact would not have done what he wanted.)