
Why would that matter, if you only want some well-defined order?



Because with Unicode strings, "\u006e\u0303" (n followed by a combining tilde) is canonically equivalent to "\u00f1" (the precomposed ñ), for example. If you do a bytewise comparison, as the comment above suggested, the two compare as different, so you may not reach the same result ¯\_(ツ)_/¯
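To make that concrete, here is a minimal Java sketch of the two notions of equality. The class and method names are made up for illustration, and NFC is just one reasonable choice of normalization form; the only library calls are `String.getBytes`, `Arrays.equals`, and `java.text.Normalizer.normalize`.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the thread.
class CanonicalEquivalenceDemo {
    static final String DECOMPOSED = "\u006e\u0303"; // n + combining tilde
    static final String PRECOMPOSED = "\u00f1";      // precomposed n with tilde

    // Compares the UTF-8 encodings byte for byte.
    static boolean bytewiseEqual(String a, String b) {
        return Arrays.equals(a.getBytes(StandardCharsets.UTF_8),
                             b.getBytes(StandardCharsets.UTF_8));
    }

    // Compares after normalizing both sides to NFC (an assumed choice of form).
    static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        System.out.println(bytewiseEqual(DECOMPOSED, PRECOMPOSED));    // false
        System.out.println(canonicallyEqual(DECOMPOSED, PRECOMPOSED)); // true
    }
}
```

The decomposed form encodes to three UTF-8 bytes and the precomposed form to two, so the bytewise check cannot succeed without normalizing first.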


Whether those two strings are or are not equivalent depends on the context. If we're assuming (as the GP did) a very generic context where we simply want to store arbitrary strings in a sorted data structure, then there is no reason to assume they are supposed to be interpreted as Unicode.

For a simple example, suppose you are keeping a list of strings that still need Unicode normalization before they can be properly interpreted as human text, and you store them in a TreeMap for efficient retrieval. When you add "\u00f1" to the list, you wouldn't want the collection to say that it's already there just because it already holds "\u006e\u0303".
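A rough sketch of that TreeMap scenario, under some assumptions: the class, field, and method names are hypothetical, and Java's natural `String` ordering compares UTF-16 code units rather than bytes, so it only stands in here for a "plain, non-Unicode-aware" order. The second comparator normalizes to NFC before comparing, to show the behavior you would not want in this case.

```java
import java.text.Normalizer;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical demo class; the names are illustrative, not from the thread.
class SortedMapOrderingDemo {
    // Plain ordering: String.compareTo, which compares UTF-16 code units.
    static final Comparator<String> PLAIN_ORDER = Comparator.naturalOrder();

    // Normalizing ordering: compare the NFC forms of the keys.
    static final Comparator<String> NFC_ORDER =
        Comparator.comparing((String s) -> Normalizer.normalize(s, Normalizer.Form.NFC));

    // Inserts both spellings and reports how many distinct keys the map keeps.
    static int distinctKeys(Comparator<String> order) {
        TreeMap<String, Integer> map = new TreeMap<>(order);
        map.put("\u006e\u0303", 1); // n + combining tilde
        map.put("\u00f1", 2);       // precomposed form
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(PLAIN_ORDER)); // 2
        System.out.println(distinctKeys(NFC_ORDER));   // 1
    }
}
```

With the plain order both spellings survive as separate keys, which is what a list of not-yet-normalized strings needs. With the normalizing order, the second `put` overwrites the first entry's value instead of adding a new key.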



