Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
A bug that spoke Russian and crashed my app (imzaldih.com)
64 points by zaldih on May 6, 2024 | hide | past | favorite | 21 comments


This isn't fixed properly

The linked code is:

        String languageCode = Locale.getDefault().getDisplayLanguage();
        Locale locale = new Locale(languageCode);
        Locale.setDefault(locale);
But `getDisplayLanguage()` is defined as returning the display name, not the language code:

> if the locale is en_US and the default DISPLAY locale is fr_FR, getDisplayLanguage() will return "anglais"

https://developer.android.com/reference/java/util/Locale#get...()

----

This then proceeds to create a locale with a corrupt language code. The language parameter of `Locale` is not validated (besides a null check), and using a corrupt Locale in `Locale.setDefault()` will cause a number of bugs


I agree. Locale.getLanguage() returns the language code.

The specific Locale() constructor used in the code is documented as taking a parameter: "An ISO 639 alpha-2 or alpha-3 language code, or a language subtag up to 8 characters in length. See the Locale class description about valid language values."

I'd guess the app code lucked out on the string returned from getDisplayLanguage() getting mapped to the appropriate language in all the other cases that the app supports.

Now if getLanguage() had been renamed to getLanguageCode() then maybe a dev might have had the light bulb turn on, but random string returned from an API passed to another API passes the compile test and the app doesn't blow up, so we're good to go...

Searching for the offending incorrect code on the web doesn't return any matching "sample" code, so I'm not sure where the app got the bug from.


Would stronger typing help here?

If I were to design those APIs, I'd maybe suggest a more opaque type called "LanguageCode" that is NOT a String and could not be returned from or passed as a String. Then getLanguage() could return a LanguageCode type, and the Locale() constructor could take the LanguageCode type as its parameter, and then tooling could enforce that you can't mix LanguageCode with String.

Then, calling Locale's constructor with the return value from getDisplayLanguage() would produce an easy-to-find error.


The API is the problem here. The method names in this case: getLanguageCode() and getLanguageName(), or getLanguageDisplayName() even, are much better names.


Yep, having proper types instead of passing around strings would have prevented this.


>Would stronger typing help here?

While we're not programmers, we can say with certainty that hitting the keys harder, does NOT change the behavior.

-Ops Team.


For me the light bulb here is the word `display`. The method is called `getDisplayLanguage` because it's returning the language name for display to the user. Even if I didn't already know that, I'd be asking myself as I was typing the code: "wait, what does 'display' mean here? Why would they name it that? I'd better check the docs..."


Thank you very much for your comments guys! I am definitely learning a lot.

I was encouraged to write the post just in case someone happened to have the same problem, hopefully ended up on the site and could find some light here.

Now thanks to you I realise that I made the mistake of using getDisplayLanguage() instead of getLanguage(). The former returns the text of the language. The second one returns the universal code of the language.

As I said at the end, I'm not an Android developer and I guess in my haste and tiredness I settled for the first thing that seemed to work without much thought to good practice or whether I was actually getting the language code or not. A very silly bug but as it worked most of the time for some reason, it was almost impossible to find the first time. And of course, snipped was a simplification of a more complex system with persistence handling and so on, which made it not so obvious.

I certainly think all of us here won't make this mistake in the future :)


Kudos. We’ve all solved bugs with the wrong code. It was an interesting post and I found the unexpected secondary followup just as interesting. (Why did it work for, eg, Chinese, do you know?)

See, I am not an Android developer either. I could have made the same mistake. To me it comes down to — why string types?


This is always the issue. It’s why Python and JavaScript are so terrible, and the cause of so many bugs, yet so readily available in apis and cicd pipelines.

Why people insist on keep using strings instead of typesafe variants is beyond me


TypeScript is nice here because you can specify the specific strings you accept in a union. Eg. `unit: 'AC'|'DC'`


> If the device is in Russian, Locale.getDefault() returns “русский”, and it seems that SQLite doesn’t like this for some reason.

> I did more tests, and it seems that out of the over 100 languages Android supports, only Russian encoding broke SQLite. Not even Chinese, which we typically think of as the most complicated (and also I tested changing to this language much earlier before trying the other).

Is this an issue?


According to some people on the Internet [0][1], it is. Apparently, Java's (Android's?) Locale.getLanguage() is broken for Russian locale and instead of returning "ru" as the docs promise [2], it returns "русский".

[0] https://stackoverflow.com/questions/46916210/cannot-get-writ...

[1] https://github.com/google/ExoPlayer/issues/8251

[2] https://docs.oracle.com/javase/8/docs/api/java/util/Locale.h...


I checked your links. Java and Android are not broken. SO and Github examples just do it wrong, as can be seen in javadoc [2]. You are not supposed to invoke constructor with localized language name. It’s a bit silly to assume that it could work.


Except that TFA is using `getDisplayLanguage`

    public void updateAppLanguage(Context context) {
        String languageCode = Locale.getDefault().getDisplayLanguage();
        Locale locale = new Locale(languageCode);
        Locale.setDefault(locale);
... which the JDK docs say:

>Returns a name for the locale's language that is appropriate for display to the user.

So maybe `getLanguage` is broken or maybe it isn't, but it seems wrong to use `getDisplayLanguage` for this purpose regardless.

(What even is the point of doing `Locale.setDefault()` with the same Locale that `Locale.getDefault()` returned? I don't know anything about Android.)


For language:

1. `getLanguage` should return the 2-letter ISO 3166-1 language code, or an empty string if missing. -- e.g. en, ru, fr, etc.

2. `getISO3Language` should return the 3-letter ISO 3166-1 language code, or an empty string if missing. -- e.g. eng, rus, fra, etc.

1. `getDisplayLanguage` should return the language display name in the current locale, or an empty string if missing. -- e.g. English, Russian, French, etc.


Am I correct in understanding that `getDisplayLanguage` should return "Russian" if the current locale is US English but will return "Russe" if set to French?

Btw you have a formatting error. The third numbered point is listed as another '1.'.


Yes, it would return "Russe" when the current locale is French. -- It is designed for displaying the locale to the user on a UI, console output, etc.


> (What even is the point of doing `Locale.setDefault()` with the same Locale that `Locale.getDefault()` returned? I don't know anything about Android.)

The code intends to trim the country, variant and any extensions from the value returned from `Locale.getDefault()`

Example: "Chinese (Traditional) - Taiwan" to "Chinese"

Pseudocode:

`zh_Hant_TW` -> `zh`

It also normalizes to fix some Android internal mapping issues due to changes in ISO 639 (Hebrew can be `iw` or `he[b]`)


Does it make sense though to do it this way? I18n framework that loads resources should support fallback internally in case there’s no direct match.


Limiting, but it looks like a valid solution. Android's native i18n is a mess.

We went down the route of duplicating strings between `/values-heb` and `/values-iw` to fix the platform issues, even though the change from `iw` -> `heb` was enacted in ISO 639:1989.[0]

OP's change is questionable in that it may block regional dialects: `[zh-CN, zh-TW]; [pt-PT, pt-BR]` in case the system default is set incorrectly, but it will suffice for most Android apps which will realistically only be in a small subset of 'easy' languages.

[0] https://xml.coverpages.org/iso639a.html (revised 1989)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: