
Locales are so much more than character sets. E.g. an Arabic locale changes the direction of writing; it also changes the characters used for numbers, and completely changes the way numbers and dates are formatted. This is where the C locale functions are problematic.

Character encoding is the easy and safe part.




Locales are much more than character sets, but the question was about character sets.

Also for most of those things, you want to be explicit about when to use the locale and when to not.


> Also for most of those things, you want to be explicit about when to use the locale and when to not.

Right. And that's where the POSIX C API falls down. The locale isn't named explicitly. It's not a function parameter. It's specified via process-wide global state that gets shared between all your threads.
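
To make the hidden dependency concrete, here is a minimal C sketch. The helper names `parse_plain` and `try_numeric_locale` are made up for illustration; the point is only that the parsing call never mentions a locale, yet its result depends on whatever `setlocale` was last told.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: no locale appears in the signature or in the call. */
static double parse_plain(const char *text)
{
    double value = 0.0;
    /* sscanf consults the current LC_NUMERIC locale for the decimal separator. */
    if (sscanf(text, "%lf", &value) != 1)
        return 0.0;
    return value;
}

/* Hypothetical helper: switch the process-wide numeric locale.
 * Returns false if the named locale is not available on this system. */
static bool try_numeric_locale(const char *name)
{
    return setlocale(LC_NUMERIC, name) != NULL;
}
```

In the "C" locale, `parse_plain("1,5")` stops at the comma and yields 1. After a successful `try_numeric_locale("de_DE.UTF-8")` (just an example name, and it may not be installed on your machine), the same call would typically yield 1.5, with no change at the call site.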

You might think you can use scanf to parse a number in a JSON file. It might appear to work fine on your local computer. But scanf behaves differently depending on the system locale. You can wrap scanf with a helper function which sets the locale to something sensible, calls scanf, and restores the locale. But the locale is shared with other threads, which might be depending on it in other ways, so this can introduce race conditions.
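
A rough sketch of that save/set/restore wrapper, to show where the race sits. `parse_double_c_locale` is a hypothetical name, and this is deliberately the naive version, not a recommendation.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a double using "C" locale rules.
 * NOT thread-safe: setlocale() changes state shared by the whole process. */
static int parse_double_c_locale(const char *text, double *out)
{
    /* Query the current setting. The returned string may be overwritten
     * by the next setlocale() call, so copy it before changing anything. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved == NULL)
            return 0;
        strcpy(saved, current);
    }

    setlocale(LC_NUMERIC, "C");           /* window opens: every thread now sees "C" */
    int matched = sscanf(text, "%lf", out);
    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);     /* window closes */
        free(saved);
    }
    return matched == 1;                  /* 1 on success, 0 on failure */
}
```

Between the two `setlocale` calls, any other thread that formats or parses a number sees the "C" locale instead of the one it expected. And if two threads run this helper at once, one of them can save "C" as the "original" locale and restore the wrong thing.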

The whole thing is horribly designed - and it leads to buggy, unreliable code that is hard to reason about. Even in the best case, introducing thread synchronization into a function like sscanf will lead to a dramatic decrease in performance.

It's horrible. Just horrible.



