Does Your Code Pass the Turkey Test? (2008) (moserware.com)
132 points by hosteur 10 months ago | 120 comments


There's one more test: the Thailand test. I've been bitten by issues in Java where Calendar.getInstance() will return a Thai Buddhist calendar, which is exactly the same as the Gregorian calendar except 543 years in the future.
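
For anyone who wants to see the offset spelled out, here is a minimal Python sketch. It assumes modern dates, where the Buddhist Era year is just the Gregorian year plus 543; the helper names are made up for illustration and are not part of any library.

```python
from datetime import date

# Offset between the Thai Buddhist Era (BE) year and the Gregorian year.
# Assumption: modern dates only, where both calendars share months and days.
BUDDHIST_ERA_OFFSET = 543


def to_buddhist_year(d: date) -> int:
    """Return the Buddhist Era year for a Gregorian date (hypothetical helper)."""
    return d.year + BUDDHIST_ERA_OFFSET


def from_buddhist_year(be_year: int, month: int, day: int) -> date:
    """Build a Gregorian date from a BE year plus a Gregorian month and day."""
    return date(be_year - BUDDHIST_ERA_OFFSET, month, day)


d = date(2024, 12, 6)
print(to_buddhist_year(d))              # 2567
print(from_buddhist_year(2567, 12, 6))  # 2024-12-06
```

A year that looks 543 years too far in the future is a strong hint that some layer picked up this calendar from the user's locale.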


Exists in iOS too - once built an iPad black car dispatch app where one driver was constantly having issues. After weeks of remote debugging the car company owner just asked the driver to come in, as soon as he told me it was a Buddhist calendar I had it fixed. Now I always triple check my assumptions.


Wouldn’t issues in Java be failing “The Indonesian Test?”


Why are you using the Calendar API in Java? It is full of issues like this.


I never really understood why not simply use milliseconds since the epoch and keep track of a time zone separately if needed... This allows fairly easy conversion to most native systems I have worked with, easy sorting and diff operations, as well as final display control. You also don't need to think about daylight saving time then, or do other conversions for, say, timesheets.


It depends on what you need to do. If you just need to say when something happened (e.g. some log event), then milliseconds since the epoch is fine. If you want a user to schedule a recurring meeting at 8am on the first Thursday of every month, you can also keep a set of timestamps as milliseconds since the epoch, but you are going to need to do some work to figure out the right numbers correctly.


> If you want a user to schedule a recurring meeting at 8am on the first Thursday of every month, you can also keep a set of timestamps as milliseconds since the epoch

No, you can't really, especially if the recurring meetings go on more than a few years into the future. If the time zone rules themselves change (e.g. if the user's country abolishes or introduces DST), then the timestamps you stored will become wrong in light of the change.


Timezone rules change. If you want to be fully robust, you need to store the intent of the time
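
A rough Python sketch of what "storing the intent" could look like for the first-Thursday example. The `MonthlyRule` class and its field names are hypothetical; the idea is only that you persist the rule plus a zone name, and compute concrete instants as late as possible with whatever time zone data is current at that point.

```python
import calendar
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class MonthlyRule:
    """Hypothetical record of the user's intent, not of precomputed instants."""
    weekday: int      # calendar.MONDAY == 0 ... calendar.SUNDAY == 6
    ordinal: int      # 1 = first such weekday of the month
    local_time: time  # wall-clock time the user asked for
    zone_name: str    # IANA zone ID, resolved only when an instant is needed


def occurrence(rule: MonthlyRule, year: int, month: int) -> datetime:
    """Return the naive local datetime of the rule's occurrence in a month."""
    first = date(year, month, 1)
    # Days from the 1st to the first matching weekday (0..6).
    offset = (rule.weekday - first.weekday()) % 7
    day = 1 + offset + 7 * (rule.ordinal - 1)
    return datetime.combine(date(year, month, day), rule.local_time)


rule = MonthlyRule(calendar.THURSDAY, 1, time(8, 0), "Europe/Amsterdam")
print(occurrence(rule, 2024, 12))  # 2024-12-05 08:00:00
```

The result is deliberately a naive local datetime. Converting it to UTC is a separate step, so a later change to the zone's DST rules changes the computed instant without touching the stored rule.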


Because a calendar date and a point in time are entirely different things.


I didn't write the code originally; this was like 15 years ago. If the API still exists people will misuse it.


I don’t know anything about calendar pitfalls or Java, but I would have loved to see a better alternative suggested here.


There’s another fun thing we can call the Britain test.

The short form for September is Sept in en-GB, the only month abbreviated with four letters. It’s Sep in en-US and other en- locales. All other parsing is identical.

For example, if you’re parsing abbreviated date formats on AWS, this parsing fails only on eu-west-1 and -2 servers, and only in September.
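
One way to sidestep this, sketched in Python: keep your own month table instead of asking the platform locale, and accept both spellings. The input format ("6 Sept 2024") and the function name are assumptions for the sake of the example.

```python
from datetime import date

# Explicit table instead of the platform locale; accepts both "Sep" and "Sept".
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}


def parse_day_month_year(text: str) -> date:
    """Parse strings like '6 Sept 2024' or '06 Sep. 2024' (hypothetical format)."""
    day_s, month_s, year_s = text.split()
    # Drop a trailing period and ignore case before the table lookup.
    month = MONTHS[month_s.rstrip(".").lower()]
    return date(int(year_s), month, int(day_s))


print(parse_day_month_year("6 Sept 2024"))  # 2024-09-06
print(parse_day_month_year("06 Sep. 2024"))  # 2024-09-06
```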


Wow that is painful. I think locales are generally net negative in such use cases


A lot of these seem to be the "not America" test.


Something that drives me crazy about the big public clouds is that everything is US regional defaults with - if you’re lucky - UTC as the time zone.

We’ve spent easily several months of engineering time “fixing” apps that kept picking up these defaults directly or indirectly. E.g.: What regional options does Azure Data Factory use for parsing CSV by default for batch jobs if you deploy an instance in Australia East? Your guess is as good as mine!

It’s especially hilarious considering that all three major operating systems ask these questions up front, just as they ask for a password instead of providing a default. Vendors have had to learn the hard way that defaults for some settings are nonsense for 96% of the planet, and now the same companies have forgotten this lesson and will have to learn it all over again.


UTC timestamps, but displayed on a 12-hour clock, are among the most cursed things I have to work with every day. Always trips me up, and much more so than "fully US" conventions (i.e. local time and MM/DD/YYYY and am/pm timestamps).

Sometimes (but not always), it helps to change the locale to en_GB. But then spellcheck becomes confusing...


Sure, but there are degrees of "not-America-ness", and not all of the examples listed in TFA occur in all other regions.

UK? So far so good: Maybe those time parsing routines will need some brushing up for the 24 hour based clock, and day and month are swapped, but at least the decimal separator is the same. And it's almost the same language, and definitely the same alphabet.

France? Now you're adding accents to the mix, although in a pinch you can just drop them. (Not that it's a good idea, but there's no better solution.) But watch out: Different decimal separator!

Germany? Better rethink that "just drop the accents" solution; it changes the meaning of words, sometimes drastically/embarrassingly so. However, there is a canonical way of transliterating äöü!

Chinese: Obviously, abandon all hopes of using Latin-1. But at least you can use the US decimal separator again :)


And then tacking on to another poster's "Thai test": there are multiple semi-canonical ways to transliterate Thai to English, and I can only imagine the landscape for Thai to other languages. Added to that, times of day can be referred to in several ways (1 PM could be, literally, "afternoon o'clock") and sometimes different words for the different "sections" of the day, e.g. 7 PM is "1 evening". Depending on your audience and intent. Lots of variation.

As an American, it's really eye-opening to learn other languages and cultures and apply that context to your work in software.


It's pretty wild how different languages can handle things differently. Even basic stuff like numbers. Thank goodness we've standardized on using Arabic numerals most of the time. Writing out numbers is a mess. E.g. in Dutch the last two digits get swapped when written out:

21 -> "eenentwintig" -> "one and twenty"

French gets especially zany:

80 -> "quatre-vingts" -> "four twenties"

I'm sure there's about a million more ways languages have wildly differing concepts of numbers from what you expect. Now repeat the above exercise for numbers, dates, money, names, just about everything.


And then there’s the Fennoscandia test: do you convert äöåø with the same rules as for German? Because, if you do, it looks very silly at least in Finnish. This is often the case in international sporting events and is basically a joke among Finns how Räikkönen becomes the insanity of ”Raeikkoenen” etc. ”Raikkonen” would be much preferred.


Good point – transliteration obviously needs to be aware of the source language! Even in English, or you might end up attending a Motoerhead concert :)
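
To make the "aware of the source language" bit concrete, here is a small Python sketch. The language codes, the tiny German table and the function names are assumptions; a real system would need far more rules than this.

```python
import unicodedata

# Conventional German replacements (assumption: covers only these letters).
GERMAN = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})


def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (ä -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def transliterate(text: str, language: str) -> str:
    """Apply a per-language rule; `language` is an ISO 639-1 code."""
    # Compose first so the single-character table entries can match.
    text = unicodedata.normalize("NFC", text)
    if language == "de":
        text = text.translate(GERMAN)
    # Letters without a decomposition (e.g. ø) pass through unchanged.
    return strip_marks(text)


print(transliterate("Müller", "de"))     # Mueller
print(transliterate("Räikkönen", "fi"))  # Raikkonen
```

The same ä gets two different treatments depending on where the name comes from, which is exactly why a single global "remove accents" step tends to annoy somebody.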


Indeed. Somehow the rest of us needs a test!


As for dates and times interchange, use ISO 8601 (largest to smallest order).

The day, month, year ordering used in the EU and many other countries (smallest to largest) also makes sense, but is sadly ambiguous due to the US format of month, day, year (middle-endian order): https://9gag.com/gag/a2KEqOe

The US, not Turkey, is the outlier.


DMY dates don't make sense. The units are ordered from smallest to largest, but the digits within each unit are still written from biggest to smallest.

Furthermore, if DMY makes sense, then you could argue that SS:MM:HH or MM:HH has a valid use case. (It does not.)

Hence, ISO 8601 is the one true date format, assuming you still believe in writing numbers in big endian.

If you want, you can argue for numbers being written in little endian. For example, the year "two thousand twenty-four" could be written as 4202, and maybe pronounced as four twenty thousand-two. In that case, go ahead and write dates as SS:MM:HH DD/MM/YYYY; it is UTC 61:45:12 60/21/4202.


> DMY dates don't make sense.

This is a language issue. DMY makes perfect sense in languages where you'd normally say the date in that sequence. The "problem" is that English doesn't do that. The natural flow in English is MDY, e.g. July 4th, 2024. That's not how Germanic languages (at least those I know) work. In Danish or German you'd say "4th July, 2024"; that's just how those languages work.

I do agree that for computing ISO 8601 is the way to go as it sorts correctly.
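
The sorting point is easy to check. A quick Python sketch with three arbitrary sample dates: plain string sorting gives chronological order for ISO 8601 and something else for DD/MM/YYYY.

```python
from datetime import date

dates = [date(2024, 7, 4), date(2023, 12, 25), date(2024, 1, 31)]

iso = [d.isoformat() for d in dates]           # YYYY-MM-DD
dmy = [d.strftime("%d/%m/%Y") for d in dates]  # DD/MM/YYYY

# Plain string sorting is chronological only for the big-endian format.
iso_sorted = sorted(iso)
dmy_sorted = sorted(dmy)
chronological = [d.isoformat() for d in sorted(dates)]

print(iso_sorted)  # ['2023-12-25', '2024-01-31', '2024-07-04']
print(dmy_sorted)  # ['04/07/2024', '25/12/2023', '31/01/2024']
```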


> The natural flow in English is MDY, e.g. July 4th, 2024.

I'm sure that's true. This seems more like US custom than the whole English language. "July 4th" sounds a bit forced and not natural to me, and I am a native English speaker. Not US English though.


> I'm sure that's true.

I accidentally an important word there. It should be "I'm not sure that's true." It's an English expression that's a bit understated: it means "I'm confident that this is wrong".

You can't always argue from "what feels natural" - that's prone to mere familiarity (1), and someone else who is more used to the other way will have the opposite "feel". It's subjective, and not universal.

1) https://en.wikipedia.org/wiki/Mere-exposure_effect


In England (and Ireland), one tends to say “The 4th of July”. “July 4th” is an Americanism.


Both "the 4th of July" and "July 4th" are common in American English


Yes, and only "the 4th of July" is common in UK English.


Only because that's a holiday. The 23rd of March sounds a bit formal here compared to March 23rd, and sounds like it refers to an event or holiday. Wedding invitations might sometimes use that style, for example.


“Independence Day” is the relevant Americanism.


That’s highly specific to the US. Most other English-speaking countries say “the Fourth of July”. To me it feels weird to say July fourth.


"July fourth" kinda sounds like there are multiple Julies this year and this is the fourth one :D


> The natural flow in English is MDY

In your English maybe. That’s just not a good argument.


Both DMY and HMS are sensical notations of time. For most people the day is the most important part of the cycle in a year. We measure most of our actions and chores in days then months then years. In a meeting, in official letters, taxes etc. everything is specified in days.

Similarly hour is the most common division of a day. Generally humans have an internal tendency to measure things in half of an hour interval and hour is the closest one. One usually rents a parking slot for integer multiples of an hour. The travel distances are specified in hours or fractions of it

Putting the most important measure at first just makes sense.


> then you could argue that MM:HH has a valid use case. (It does not.)

"twenty past ten"

(It does.)


"twenty till ten" is just as valid, so I don't see either of those as a true example of MM:HH.


That's crazy. What's next, abbreviating "Coordinated Universal Time" as UTC instead of CUT?


Of course, the Turkey test itself now fails the Turkey test as Turkey has been updated to Türkiye.


That was a very heavy-handed move that overrides traditions in many languages


Exactly. Every language has its own names for other countries and cities. I once got into an argument with a Russian who insisted I was saying “Moscow” wrong, as apparently I was supposed to say it exactly like they do, “Moskva”, while speaking English. I asked him how Russians say Washington, D.C. and my point was immediately proven.


I can’t imagine getting upset if someone wants to call my country États-Unis or 米国. Only Turks want to tell me how to speak my own language.


> I can’t imagine getting upset if someone wants to call my country États-Unis or 米国.

In an international context? I remember Côte d'Ivoire asking people to please refer to them as Côte d'Ivoire for what seemed the very sensible, practical reason that their citizens were having trouble finding their country in lists when going through customs, registering at hotels etc.


And Burmese / Myanmarians.

Names are always bestowed, never picked.


Georgians don't seem to like it either if you call their country Gruzia. And Greeks didn't like it when their neighbors wanted to be Macedonians.

Names often have historical baggage, and using them may send a different message than what you intended.


An interesting one is Germany. Different people call them Allemagne, Tyksland, Saksamaa, Německo, Germany, and many more; of course none of these people ever went to Germany and asked them nicely what they'd like to be called.


I think Tyskland is pretty close, particularly given how neighboring folks actually speak there. Allemagne has reasons in history, as does Germany. I suppose the others can easily be explained likewise. Really, it could be much worse.


> Really, it could be much worse.

These are the official names for Germany. It does get worse!

My broader point is you can call yourself whatever you want, but you cannot tell others how to speak if they don’t agree.


alt.skopje.is.not.macedonia popped into my head but I'm realizing I never read any of those posts.


You're assuming that every Turk supports this decision. I'm not sure if that's in fact the case.

But if it is: People generally do get to change their mind about their preferred address, and people usually oblige, so why not places? It obviously depends a lot on the case, but sometimes the “common” name has a historical association people living there aren’t comfortable with, for example.


米 ... lol, reminds me of a Kurt Vonnegut novel that included his rendering of a human orifice.


You have the wrong orifice. That one means “rice”.



I'm 100% not used to the Türkiye spelling, but find it logically useful that it undoes the confusion between the country and the bird. The bird as well as some verbs tend to sneak into dropdowns when devs cheap out and paste language resource files into Google Translate or ChatGPT.


That only applies to English though. In Dutch the country is called Turkije and the bird is called a kalkoen. I'm sure that in the vast majority of languages there's no confusion between the bird and the country, and yet everyone across the whole world gets told to spell it Türkiye.


I've always wondered about the politics of that decision, and whether it was broadly supported in the population.

Personally, I'm fine using whatever name a country adopts for itself, but I can't help but notice that this particular change had a bit of a Streisand effect on me. (I really can't say I've experienced a single situation where it wasn't very clear from context whether somebody was referring to the bird or country.)


> can't say I've experienced a single situation where it wasn't very clear from context whether somebody was referring to the bird or country

Tapakapa's detailed video agrees. He also mentions that the diacritic character is needless friction in English. https://www.youtube.com/watch?v=xiidxd5KKw8 (10m58s) [2024-08-10]


Nothing heavy-handed about it at all. Plenty of countries have changed their names (slightly or entirely) over the years.

The more diacritics we are compelled to learn and use, the better I say.


Germany enters the chat.


Most of these problems have incorrect solutions. For instance, the actual solution to parsing portrait or landscape is to not use a string for this. It should never have been a string! Other, better solutions apply for the rest too.


The command line is defined by strings. It had to be a string to stay within end-user requirements. You may be able to make a good case that the string input is best represented as something parseable by an existing argument parser library, but even then you are dependent on that library handling the strings correctly. Which, at best, pushes the problem onto someone else. Someone else who still has to be aware of input possibilities.


Why would you accept ArBiTrArY-cased command-line arguments in the first place? I don't think I've ever seen a cli utility that accepts the wrong case. Just compare the raw bytes. `ls --ALL` doesn't work, and neither should `my_app --PORTRAIT` (no matter which "I" you type).
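
A sketch of what that could look like in Python, combining it with the "should not stay a string" idea from upthread: the argument arrives as a string, gets compared exactly, and becomes an enum right at the boundary. The `Orientation` enum and `parse_orientation` are hypothetical names.

```python
from enum import Enum


class Orientation(Enum):
    PORTRAIT = "portrait"
    LANDSCAPE = "landscape"


def parse_orientation(arg: str) -> Orientation:
    """Exact, case-sensitive match; raises ValueError for anything else."""
    # No lowercasing happens here, so no locale can influence the result.
    return Orientation(arg)


print(parse_orientation("landscape"))  # Orientation.LANDSCAPE
```

Since nothing is case-folded, "PORTRAIT" and "portraıt" (with a dotless ı) are both rejected the same way on every machine.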


Well my friend, you probably didn't use Windows before. FoRmaT /Q /f:NTFS c: is as valid as it gets.


Does this mean it is a good idea? It is a whole other subject though.


It's ironic that you are offended by another culture's habits when TFA is a post about localization; it's the exact same subject.


Who says I’m offended?


> Why would you accept ArBiTrArY-cased command-line arguments in the first place?

Human user friendliness?


Oh missed the context, my bad. That being said, in that case, I’m definitely not convinced by the need for toLower (:


You are parsing a command line parameter. It will always come in as a string.


Am I missing something? The realisation comes down to „yes, there are regional differences in formatting“, and doesn’t strike me as a particularly profound insight.


No, you're not missing anything.

Anyone programming outside of the USA is well aware of localisation and how to use it.


I wouldn't say that last bit though. So much code fails outside of its country of origin, including code from e.g. Dutch coders, even though we're well aware how small our country is and how much we rely on trade and collaboration. People from across the EU come into the country, so you need to support other languages, payment methods, currencies, special characters in names, date notations, address formatting, timezones, phone numbers, etc. Part of Belgium even speaks the same language, so the barrier is really small. And across the Atlantic Ocean there are more people "in" The Netherlands, except nobody ever realizes that a part of the country uses the US dollar as its currency and is in a very different timezone.

Now living in Germany, they even translate timezone names. Not only do you have to look in a timezone database to figure out what a MESZ is, when the timezone database says "there is no such timezone" you have to realize you need to look in a German translation of it. Very approachable for the international people that need to know which timezone that German picked for the meeting invite

Which is all to say: sure, people know other countries exist and are more likely to need to learn to use localization, but by default it's not like everyone knows how to do that


When I was learning programming in 2008 as a student and was still using my computer in Turkish, the number of Java and PHP programs that did not specify a proper locale was outrageously large. For whatever messed-up reasons, they did lowercase/uppercase conversions when importing modules or opening files. The glibc library on Linux systems was also designed by shortsighted people, so libc functions like tolower() made conversions in the current locale.

The end result was phpMyAdmin usually failed with "include not found" because its developers did this: include(tolower(MODULE NAME)) and BLABLAINTERFACE was converted into blablaınterface (there is no dot on top of i there look closely). Or Eclipse failed in similar ways since it was searching plugins using locale-dependent functions.
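
For illustration, the two behaviours side by side in Python. Python's own `str.lower()` does not look at the process locale, so the Turkish rule is written out by hand here; `turkish_lower` is a hypothetical helper, not a standard function.

```python
def turkish_lower(text: str) -> str:
    """Lowercase using Turkish rules: I -> ı (dotless), İ -> i (dotted)."""
    # Handle both capital I variants first, then lowercase everything else.
    return text.replace("I", "ı").replace("İ", "i").lower()


def invariant_lower(text: str) -> str:
    """Python's str.lower() ignores the process locale."""
    return text.lower()


name = "BLABLAINTERFACE"
print(turkish_lower(name))    # blablaınterface
print(invariant_lower(name))  # blablainterface
```

Identifiers, file names and module names want the second behaviour; text shown to a Turkish reader wants the first. Mixing the two up is what produces the "include not found" class of bugs.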


Maybe this was less widely known back in 2008 when it was published? Still doesn't explain why it's getting attention now.


It's that Turkey in particular makes a good test for a lot of these regional differences.


Some extra tests...

Central Europe has a lot of accented Latin characters. ěščřžýáíéů etc.

Cyrillic letters are also worth trying, especially if you are trying to set up an international e-shop and need to print out labels with addresses.

Spanish-speaking people tend to have very long full names, in case that a complete name is needed. Picasso was, in fact, Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso.

Hungarians write surnames first, e.g. Orbán Viktor. Doing it otherwise looks unprofessional and may lead to confusion, because some first names can also be surnames.

Most of Europe writes streetname first and house number second, e.g. Friedrichstrasse 52, so the other way round than Americans and Brits are used to.


>Hungarians write surnames first, e.g. Orbán Viktor.

It's not only Hungary - I usually write my surname first in professional context (but both versions are OK, in general).


Turkiye would like people to call it Turkiye.

Even old New York was once New Amsterdam. Why? Maybe people just liked it better that way, and that's nobody's business but the Turks.

https://www.youtube.com/watch?v=Uqnb_nU7RBE

(Istanbul is the traditional Turkic name for the city, basically a borrowed/altered pronunciation of Constantinople)


"Türkiye", not "Turkiye". I suppose there really is something to the Turkey Test!

That said, I don't know how bad ü -> u is in Turkish. At least in German, doing so often changes the meaning, and äöü have to be transliterated as ae, oe, ue instead. (Obviously a ton of software does something bad like "decompose Unicode characters and filter out non-ASCII letters" and gets it wrong, also hilariously failing the test.)

And in general, sure, every country should get to decide how it wants to be referred to (although I suppose that can raise complicated questions as to who gets to decide that too), but in any case asking a blog from 2008 to retroactively adopt that decision is asking for a bit much, I'd say.


> That said, I don't know how bad ü -> u is in Turkish. At least in German, doing so often changes the meaning, and äöü have to be transliterated as ae, oe, ue instead. (Obviously a ton of software does something bad like "decompose Unicode characters and filter out non-ASCII letters" and gets it wrong, also hilariously failing the test.)

Ü and Ö have almost the same sounds as German in Turkish. When they constructed the Latin-based script for Turkish, they looked at the European languages and their ways of writing. So no surprises there.

Writing ü as u changes the meaning a lot. "Kul" means a servant, while "kül" is ash. However, Turkish lacks the concept of digraphs. Having two vowels next to each other is extremely rare. If two vowels like oe are put together, a native reader will read them as o e (oh eh) and then maybe blend them a little.

So when technology got introduced, we didn't know what to do with the ASCII-only systems that Americans sold to us. People started to write ü,ö,ı,ş,ç,ğ as u,o,i,s,c,g. It causes people's names to be mispronounced in (usually English-speaking) media and in international environments. Many young people started to use SMS with those conversions in the 90s. So we are stuck.


No idea about ü → u, but dotless to dotted i can be lethal: https://languagelog.ldc.upenn.edu/nll/?p=73


Oh, German definitely has those too :)


tuerkiye


Realistically, the choice is going to be between Turkey and Turkiye. Most people aren't going to go through the hassle of adding the umlaut. Countries can request that people refer to them as anything within reason, and in this case reason dictates that it be constrained to ASCII characters like every other country in the world. I think people might make the exception within certain formal contexts, but not in a random Hacker News post.


Not every other country in the world only uses ASCII, Ivory Coast is officially called Republic of Côte d'Ivoire.


You're demonstrating the exact point by showing that they have an ASCII name and an official one. If you're having a regular conversation with someone you're probably going to use Ivory Coast and not the French name.


That's arguably an English name, not an "ASCII name". (If there was one, it would probably be "Cote d'Ivoire".)


...or, as some languages do, even "translate" it by pronunciation: Kotdivuāra


"Türkiye" is not a word in English. When schoolchildren learn English, they learn the alphabet, and there are no diacritics. It's as simple as that. The Ottoman rump state can request spelling changes, and we're happy to oblige, but they can't request alphabet changes and get acquiescence.

https://en.wikipedia.org/wiki/Rump_state


It's obviously up to you how you spell things in English, but why do you draw the line at alphabets? I've definitely seen diacritics used in some proper nouns in English texts for which I have no reason to suspect diplomatic/political pressure.

As I see it, you either oblige with the Turkish government's request to use the Turkish instead of English word for the country (which is what it really is), or you don't – it has nothing to do with alphabets.

Whether people will practically go through the effort of setting up their keyboard layout in a way that lets them type it is a different story; I can't really blame anyone for not doing so in a casual context. Wikipedia usually does so – e.g. they currently use "Turkey", but would likely use "Türkiye" if they were to adopt that.


This is a blog post from 2008


No, Istanbul is from the colloquial version of Constantinople, eis ti Poli (to the City). Long form in Turkish used to be Konstantiniyye until we had the Republic.


I love this song but there's a house version that takes it to another level

Japan = Nihon
Germany = Deutschland
Mexico = Meh-hee-koh

Let's keep adding to the list see how many we get


> and that's nobody's business but the Turkiys

FTFY


> Or use the RegexOptions.ECMAScript option. In ECMAScript, “\d” means [0-9] which gives us: [...]

Wow, the default in the Windows API does not do that? I would have 100% been bitten by that at some point doing any development there, coming from a Unix background where I believe the "ECMAScript" behavior is pretty commonplace (and in fact it seems to be a subset of PCREs).
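
.NET is not alone here, for what it's worth. A short Python sketch: for `str` patterns the `re` module also lets `\d` match any Unicode decimal digit unless you pass `re.ASCII` or spell out `[0-9]`. The sample string is just three Arabic-Indic digits written as escapes.

```python
import re

# U+0661 U+0662 U+0663: the Arabic-Indic digits 1, 2, 3.
arabic_indic = "\u0661\u0662\u0663"

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_match = re.fullmatch(r"\d+", arabic_indic)

# re.ASCII restricts \d to [0-9]; an explicit character class does the same.
ascii_match = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII)
class_match = re.fullmatch(r"[0-9]+", arabic_indic)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
print(class_match is not None)    # False
```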


On my phone so can’t double check, but as far as I remember neither grep nor sed supports \d in their regex implementation.


At least GNU grep supports `-P`, which makes it use PCREs (which do support \d).


This is from 2008


I think the main problem here is that the behavior of the programming language is locale-sensitive by default. What languages other than C# behave like this?


Most languages, I remember having rendering issues in javafx back in the day with commas and dots randomly switching in numbers depending on locale. Then again C# is a Java ripoff so maybe it's because of that, but I've seen cpp Qt apps also behave this way.
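
If you want number parsing that does not shift with the machine's locale, one option is to make the separators explicit arguments. A minimal Python sketch; `parse_decimal` is a hypothetical helper and it does no validation of where the grouping separators sit.

```python
from decimal import Decimal


def parse_decimal(text: str, *, decimal_sep: str, group_sep: str) -> Decimal:
    """Parse a number using caller-supplied separators, never the locale."""
    # Remove grouping first, then normalise the decimal separator to ".".
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_decimal("1,234.56", decimal_sep=".", group_sep=","))  # 1234.56
print(parse_decimal("1.234,56", decimal_sep=",", group_sep="."))  # 1234.56
```

Both arguments are keyword-only on purpose, so the call site has to state which convention the input follows.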


OT: I like how he refers to a Hanselman post about great interview questions, and how around 40% of the questions in that post are obsolete. It shows how quickly our field evolves, for better or worse.



One of my first ever public projects actually had a bug report that ended up being the Turkish i problem. Didn’t know there were more!


The normal units test


It's infuriating that most calculator apps, including the quick ones in spotlight, force me to use a comma instead of a period for a decimal delimiter.


Ah yes, ye old m/d/y - d/m/y debate. One is clearly wrong but the Americans won’t listen…


Both are wrong, one is wrong-er. y-m-d h:m:s all the way.


Oracle's 'DD-MON-YYYY' (e.g. 06-DEC-2024) default date format always annoyed me but I have to admit it's unambiguous.


1733443200 is even more unambiguous. Or is it? The date format measures how many sun-ups have passed, so it is more accurate even under time dilation. So the date format is actually just a sun position measure? Maybe we could just have a sun-ups-since-1970 ticker.

Edit: and I guess the reason we have leap seconds and stuff is that we are trying to combine both sun position and elapsed seconds, idk


That number could be anything, such as a user ID. Even among timestamp libraries that work with January 1st 1970 UTC as the zero point, they might parse it as January 21st 1970 if they were expecting milliseconds rather than seconds. It doesn't get much more ambiguous than a plain number!
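
The seconds-versus-milliseconds mixup is easy to reproduce. A small Python sketch that interprets the same number both ways, relative to 1970-01-01 UTC:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
raw = 1733443200

# Same integer, two unit assumptions.
as_seconds = EPOCH + timedelta(seconds=raw)
as_millis = EPOCH + timedelta(milliseconds=raw)

print(as_seconds.isoformat())  # 2024-12-06T00:00:00+00:00
print(as_millis.isoformat())   # 1970-01-21T01:30:43.200000+00:00
```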


Attoseconds since the Big Bang is the only datetime system worth a damn.


preach!


While it's confusing and I prefer y-m-d over both of those, the American way is based on the way dates are said in spoken (American) English. Hence things like September 11th, 2001


Kind of feel like maybe any other date would've been a more appropriate example, but I get it. 9/11's just one of those things you can never forget.


It's the most appropriate possible example here because for many non-American readers it will be the one date that they have heard spoken in American English enough times to recall how Americans say it. For people who don't have much connection to American culture, it may well be the ONLY date they have ever heard spoken the American way in their lives.

(Indeed, this one specific date gets said the American way even in British English. We Brits don't do that for any other dates - we say "fourth of July" instead of "July fourth", for instance - but "September 11th" aka "9/11", uniquely among all dates, is written and said the American way in all dialects of English on the planet due to its significance to American culture.)


OP’s backup choice is December 7, 1941.


I'm beginning to harbor resentment


Why do they say "4th of July" then?


They were in a transitional phase of independence.



