Again, my original parent's statement was not about encoding or memory savings. The statement was that it is a bad idea to index into an (abstract) Unicode string (a sequence of Unicode code points -- not compositions thereof).
I didn't question that, but hoped to get some inspiration for sane Unicode handling (which I'm not sure is humanly possible except by treating it as a black box and making no promises).
Your original parent was all about encodings, and said it was a bad idea to index arbitrarily into UTF-8 strings (no mention of abstract strings of Unicode code points).
> languages such as Rust gain efficiency by working with unmodified UTF-8. All you lose is constant-time arbitrary indexing
So it's saying Rust mostly benefits from using UTF-8, but in doing so it loses the ability to index an arbitrary character in a string in constant time.
If it were abstract strings of Unicode code points there would be no problem, except you'd then be using 32 bits per code point.
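To make the trade-off concrete, here is a rough sketch in Rust. The helper names (`nth_code_point`, `to_code_points`, `storage_bytes`) are made up for illustration; only the `str`, `char` and `Vec` calls come from the standard library.

```rust
/// Finds the n-th code point of a UTF-8 string.
/// This has to walk the string from the start, so it is O(n), not O(1).
fn nth_code_point(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Decodes the string into one `char` per code point.
/// Indexing the resulting Vec is O(1), but every element takes 4 bytes.
fn to_code_points(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Storage in bytes: (as UTF-8, as a Vec<char> of the same text).
fn storage_bytes(s: &str) -> (usize, usize) {
    let utf8 = s.len(); // `len` counts bytes, not code points
    let utf32 = s.chars().count() * std::mem::size_of::<char>();
    (utf8, utf32)
}
```

For a mostly-ASCII string such as "naïve", `storage_bytes` gives 6 bytes as UTF-8 versus 20 bytes as a `Vec<char>`. Byte-offset slicing on the `&str` is still constant time, but an offset that lands inside a multi-byte sequence is rejected (`s.get(0..3)` returns `None` here). Note this only addresses code points, not compositions of them.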