There's one more test: the Thailand test. I've been bitten by issues in Java where Calendar.getInstance() will return a Thai Buddhist calendar, which is exactly the same as the Gregorian calendar except 543 years in the future.
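For anyone who hasn't hit this: a minimal sketch of the arithmetic, in Python rather than Java. The 543-year offset is the one mentioned above; the function names and the 100-year plausibility window are made up for illustration and not taken from any library.

```python
from datetime import date

# Thai Buddhist Era year = Gregorian year + 543 (offset as described above).
BUDDHIST_ERA_OFFSET = 543


def to_buddhist_era_year(d: date) -> int:
    """Return the Buddhist Era year for a Gregorian date."""
    return d.year + BUDDHIST_ERA_OFFSET


def looks_like_buddhist_era(year: int, today: date) -> bool:
    """Heuristic (an assumption, not a standard): the year is implausibly
    far in the future as a Gregorian year, but plausible once shifted
    back by 543 years."""
    too_far_ahead = year > today.year + 100
    plausible_when_shifted = abs((year - BUDDHIST_ERA_OFFSET) - today.year) <= 100
    return too_far_ahead and plausible_when_shifted
```

A check like this would only flag suspicious input for a closer look; the real fix is to ask for the calendar system explicitly instead of taking the device default.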
Exists in iOS too - once built an iPad black car dispatch app where one driver was constantly having issues. After weeks of remote debugging, the car company owner just asked the driver to come in. As soon as he told me it was a Buddhist calendar, I had it fixed. Now I always triple-check my assumptions.
I never really understood why you wouldn't simply use milliseconds since the epoch and keep track of a time zone separately if needed... This allows fairly easy conversion to most native systems I have worked with, easy sorting and diff operations, as well as final display control. You also don't need to think about daylight saving time then, or do other conversions for, say, timesheets.
It depends on what you need to do. If you just need to say when something happened (e.g. some log event) then milliseconds since the epoch is fine. If you want a user to schedule a recurring meeting at 8am on the first Thursday of every month, you can also keep a set of timestamps as milliseconds since the epoch, but you are going to need to do some work to figure out the right numbers.
> If you want a user to schedule a recurring meeting at 8am on the first Thursday of every month, you can also keep a set of timestamps as milliseconds since the epoch
No, you can't really, especially if the recurring meetings go on more than a few years into the future. If the time zone rules themselves change (e.g. if the user's country abolishes or introduces DST), then the timestamps you stored will become wrong in light of the change.
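A rough sketch of the alternative, in Python: store the rule (local wall time plus zone name) and only compute the UTC instant when you need it, using whatever tz database is current at that moment. The function names and the choice of Europe/Amsterdam are my assumptions for illustration; zoneinfo needs Python 3.9+ and an installed tz database.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def first_thursday(year: int, month: int) -> date:
    """Return the date of the first Thursday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, so Thursday is 3.
    return first + timedelta(days=(3 - first.weekday()) % 7)


def meeting_instant_utc(year: int, month: int, zone_name: str,
                        local_time: time = time(8, 0)) -> datetime:
    """Resolve 'first Thursday at 08:00 local' to a UTC instant,
    using the time zone rules known at call time."""
    local = datetime.combine(first_thursday(year, month), local_time,
                             tzinfo=ZoneInfo(zone_name))
    return local.astimezone(timezone.utc)


# The same 08:00 wall-clock meeting lands on different UTC hours
# in winter and summer.
print(meeting_instant_utc(2024, 1, "Europe/Amsterdam"))
print(meeting_instant_utc(2024, 7, "Europe/Amsterdam"))
```

If the zone's rules change later, recomputing from the stored rule picks up the change, while precomputed epoch timestamps would keep the old offset.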
There’s another fun thing we can call the Britain test.
The short form for September is Sept in en-GB, the only month abbreviated with four letters. It’s Sep in en-US and other en- locales. All other parsing is identical.
For example, if you’re parsing abbreviated date formats on AWS, this parsing fails only on eu-west-1 and -2 servers, and only in September.
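One way to sidestep this, sketched in Python: don't let the ambient locale decide which abbreviations are valid, and parse against an explicit table that accepts both spellings. The function name and the "D Mon YYYY" input shape are assumptions for the example, not anything from AWS or a particular library.

```python
from datetime import date

# Explicit English month abbreviations, independent of the process locale.
# Both "Sep" and "Sept" are accepted on purpose.
_MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}


def parse_abbreviated_date(text: str) -> date:
    """Parse a date shaped like '5 Sept 2024' or '5 Sep. 2024'."""
    day_s, month_s, year_s = text.split()
    # str.lower() in Python uses Unicode default casing, not the locale.
    key = month_s.rstrip(".").lower()
    if key not in _MONTHS:
        raise ValueError(f"unknown month abbreviation: {month_s!r}")
    return date(int(year_s), _MONTHS[key], int(day_s))
```

With that, `parse_abbreviated_date("5 Sep 2024")` and `parse_abbreviated_date("5 Sept 2024")` give the same date no matter which region the server runs in.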
Something that drives me crazy about the big public clouds is that everything is US regional defaults with - if you’re lucky - UTC as the time zone.
We’ve spent easily several months of engineering time “fixing” apps that kept picking up these defaults directly or indirectly. E.g.: What regional options does Azure Data Factory use for parsing CSV by default for batch jobs if you deploy an instance in Australia East? Your guess is as good as mine!
It’s especially hilarious considering that all three major operating systems ask these questions up front, just as they ask for a password instead of providing a result. Vendors have had to learn the hard way that defaults for some settings are nonsense for 96% of the planet, and now the same companies have forgotten this lesson and will have to learn it all over again.
UTC timestamps, but displayed on a 12-hour clock, are among the most cursed things I have to work with every day. Always trips me up, and much more so than "fully US" conventions (i.e. local time and MM/DD/YYYY and am/pm timestamps).
Sometimes (but not always), it helps to change the locale to en_GB. But then spellcheck becomes confusing...
Sure, but there are degrees of "not-America-ness", and not all of the examples listed in TFA occur in all other regions.
UK? So far so good: Maybe those time parsing routines will need some brushing up for the 24 hour based clock, and day and month are swapped, but at least the decimal separator is the same. And it's almost the same language, and definitely the same alphabet.
France? Now you're adding accents to the mix, although in a pinch you can just drop them. (Not that it's a good idea, but there's no better solution.) But watch out: Different decimal separator!
Germany? Better rethink that "just drop the accents" solution; it changes the meaning of words, sometimes drastically/embarrassingly so. However, there is a canonical way of transliterating äöü!
Chinese: Obviously, abandon all hopes of using Latin-1. But at least you can use the US decimal separator again :)
And then tacking on to another poster's "Thai test": there are multiple semi-canonical ways to transliterate Thai to English, and I can only imagine the landscape for Thai to other languages. Added to that, times of day can be referred to in several ways (1 PM could be, literally, "afternoon o'clock") and sometimes different words for the different "sections" of the day, e.g. 7 PM is "1 evening". Depending on your audience and intent. Lots of variation.
As an American, it's really eye-opening to learn other languages and cultures and apply that context to your work in software.
It's pretty wild how different languages can handle things differently. Even basic stuff like numbers. Thank goodness we've standardized on using Arabic numerals most of the time. Writing out numbers is a mess. E.g. in Dutch the last two digits get swapped when written out:
21 -> "eenentwintig" -> "one and twenty"
French gets especially zany:
80 -> "quatre-vingts" -> "four twenties"
I'm sure there's about a million more ways languages have wildly differing concepts of numbers from what you expect. Now repeat the above exercise for numbers, dates, money, names, just about everything.
And then there’s the Fennoscandia test: do you convert äöåø with the same rules as for German? Because, if you do, it looks very silly at least in Finnish. This is often the case in international sporting events and is basically a joke among Finns how Räikkönen becomes the insanity of ”Raeikkoenen” etc. ”Raikkonen” would be much preferred.
Good point – transliteration obviously needs to be aware of the source language! Even in English, or you might end up attending a Motoerhead concert :)
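To make that concrete, a small Python sketch with one table per source language, next to the naive "decompose and strip" approach. The two tables are illustrative assumptions based on the examples in this thread (ae/oe/ue for German, plain a/o for Finnish) and are nowhere near complete.

```python
import unicodedata

# Illustrative, incomplete tables; keyed by source language code.
RULES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
    "fi": {"ä": "a", "ö": "o", "å": "a",
           "Ä": "A", "Ö": "O", "Å": "A"},
}


def transliterate(text: str, lang: str) -> str:
    """Replace characters using the table for the given source language."""
    table = RULES[lang]
    return "".join(table.get(ch, ch) for ch in text)


def strip_accents(text: str) -> str:
    """Language-blind approach: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate("Räikkönen", "fi"))  # Raikkonen
print(transliterate("Räikkönen", "de"))  # Raeikkoenen
print(strip_accents("Müller"))           # Muller
```

The language-blind version happens to give the Finnish-preferred result and the wrong German one, which is exactly why the caller has to say what language the name is in.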
DMY dates don't make sense. The units are ordered from smallest to largest, but the digits within each unit are still written from biggest to smallest.
Furthermore, if DMY makes sense, then you could argue that SS:MM:HH or MM:HH has a valid use case. (It does not.)
Hence, ISO 8601 is the one true date format, assuming you still believe in writing numbers in big endian.
If you want, you can argue for numbers being written in little endian. For example, the year "two thousand twenty-four" could be written as 4202, and maybe pronounced as four twenty thousand-two. In that case, go ahead and write dates as SS:MM:HH DD/MM/YYYY; it is UTC 61:45:12 60/21/4202.
This is a language issue. DMY makes perfect sense in languages where you'd normally say the date in that sequence. The "problem" is that English doesn't do that. The natural flow in English is MDY, e.g. July 4th, 2024. That's not how other Germanic languages (at least those I know) work. In Danish or German you'd say "4th July, 2024"; that's just how those languages work.
I do agree that for computing ISO 8601 is the way to go as it sorts correctly.
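Quick illustration of the sorting point in Python, with made-up dates: a plain string sort is chronological for ISO 8601 dates and not for DD/MM/YYYY.

```python
from datetime import datetime

iso_dates = ["2024-07-04", "2023-12-25", "2024-01-15"]
dmy_dates = ["04/07/2024", "25/12/2023", "15/01/2024"]


def is_chronological(strings: list[str], fmt: str) -> bool:
    """Check whether the given order is also the chronological order."""
    parsed = [datetime.strptime(s, fmt) for s in strings]
    return parsed == sorted(parsed)


# Lexicographic sort of ISO 8601 strings matches chronological order.
print(is_chronological(sorted(iso_dates), "%Y-%m-%d"))  # True

# Lexicographic sort of DMY strings sorts by day of month first.
print(is_chronological(sorted(dmy_dates), "%d/%m/%Y"))  # False
```

This only holds for ISO strings in the same fixed-width form and the same UTC offset; mixing offsets needs real parsing.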
> The natural flow in English is MDY, e.g. July 4th, 2024.
I'm sure that's true. This seems more like a US custom than a feature of the whole English language. "July 4th" sounds a bit forced and not natural to me, and I am a native English speaker. Not US English though.
I accidentally an important word there. It should be "I'm not sure that's true." It's an English expression that's a bit understated: it means "I'm confident that this is wrong".
You can't always argue from "what feels natural" - that's prone to mere familiarity, and someone else who is more used to the other way will have the opposite "feel". It's subjective, and not universal.
Only because that's a holiday. The 23rd of March sounds a bit formal here compared to March 23rd, and sounds like it refers to an event or holiday. Wedding invitations might sometimes use that style, for example.
Both DMY and HMS are sensical notations of time. For most people the day is the most important part of the cycle in a year. We measure most of our actions and chores in days then months then years. In a meeting, in official letters, taxes etc. everything is specified in days.
Similarly, the hour is the most common division of a day. Generally humans have an internal tendency to measure things in roughly half-hour intervals, and the hour is the closest unit. One usually rents a parking slot for integer multiples of an hour. Travel distances are specified in hours or fractions of an hour.
Putting the most important measure at first just makes sense.
Exactly. Every language has its own names for other countries and cities. I once got into an argument with a Russian who insisted I was saying “Moscow” wrong, as apparently I was supposed to say it exactly like they do, “Moskva”, while speaking English. I asked him how Russians say Washington, D.C. and my point was immediately proven.
> I can’t imagine getting upset if someone wants to call my country États-Unis or 米国.
In an international context? I remember Côte d'Ivoire asking people to please refer to them as Côte d'Ivoire for what seemed the very sensible, practical reason that their citizens were having trouble finding their country in lists when going through customs, registering at hotels etc.
An interesting one is Germany. Different people call them Allemagne, Tyksland, Saksamaa, Německo, Germany, and many more; of course none of these people ever went to Germany and asked them nicely what they'd like to be called.
I think Tyskland is pretty close, particularly given how neighboring folks actually speak there. Allemagne has reasons in history, as does Germany. I suppose the others can easily be explained likewise. Really, it could be much worse.
You're assuming that every Turk supports this decision. I'm not sure if that's in fact the case.
But if it is: People generally do get to change their mind about their preferred address, and people usually oblige, so why not places? It obviously depends a lot on the case, but sometimes the “common” name has a historical association people living there aren’t comfortable with, for example.
I'm 100% not used to the Türkiye spelling, but find it logically useful that it removes the confusion between the country and the bird. The birds as well as some verbs tend to sneak into dropdowns when devs cheap out and paste language resource files into Google Translate or ChatGPT.
That only applies to English though. In Dutch the country is called Turkije and the bird is called a kalkoen. I'm sure that in the vast majority of languages there's no confusion between the bird and the country, and yet everyone across the whole world gets told to spell it Türkiye.
I've always wondered about the politics of that decision, and whether it was broadly supported in the population.
Personally, I'm fine using whatever name a country adopts for itself, but I can't help but notice that this particular change had a bit of a Streisand effect on me. (I really can't say I've experienced a single situation where it wasn't very clear from context whether somebody was referring to the bird or country.)
> can't say I've experienced a single situation where it wasn't very clear from context whether somebody was referring to the bird or country
Tapakapa's detailed video agrees. He also mentions that the diacritic character is needless friction in English. https://www.youtube.com/watch?v=xiidxd5KKw8 (10m58s) [2024-08-10]
Most of these problems have incorrect solutions. For instance, the actual solution to parsing portrait or landscape is to not use a string for this. It should never have been a string! Other, better solutions apply for the rest too.
The command line is defined by strings. It had to be a string to stay within end-user requirements. You may be able to make a good case that the string input is best represented as something parseable by an existing argument parser library, but even then you are dependent on that library handling the strings correctly. Which, at best, pushes the problem onto someone else. Someone else who still has to be aware of input possibilities.
Why would you accept ArBiTrArY-cased command-line arguments in the first place? I don't think I've ever seen a cli utility that accepts the wrong case. Just compare the raw bytes. `ls --ALL` doesn't work, and neither should `my_app --PORTRAIT` (no matter which "I" you type).
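A minimal Python sketch that combines both points: take the string the command line gives you, compare it exactly (no case folding at all), and turn it into a real type right at the boundary. The Orientation enum and parse_orientation are hypothetical names for this example.

```python
from enum import Enum


class Orientation(Enum):
    PORTRAIT = "portrait"
    LANDSCAPE = "landscape"


def parse_orientation(arg: str) -> Orientation:
    """Convert a command-line string to an Orientation.

    Exact match only: no lower()/upper(), so no locale or Unicode
    casing rules are involved. Anything else raises ValueError.
    """
    return Orientation(arg)
```

So `parse_orientation("portrait")` works, while `"PORTRAIT"` and `"portraıt"` (dotless ı) are both rejected, and the rest of the program only ever sees the enum.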
Am I missing something? The realisation comes down to „yes, there are regional differences in formatting“, and doesn’t strike me as a particularly profound insight.
I wouldn't say that last bit though. So much code fails outside of its country of origin, including code from e.g. Dutch coders, even though we're well aware how small our country is and how much we rely on trade and collaboration. People from across the EU come into the country, so you'd need to support other languages, payment methods, currencies, special characters in names, date notations, address formatting, timezones, phone numbers, etc. Part of Belgium even speaks the same language, so the barrier is really small. The same goes across the Atlantic Ocean, where there are more people "in" The Netherlands, except nobody ever realizes that a part of the country uses the US Dollar as its currency and is in a very different timezone.
Now living in Germany, they even translate timezone names. Not only do you have to look in a timezone database to figure out what a MESZ is, when the timezone database says "there is no such timezone" you have to realize you need to look in a German translation of it. Very approachable for the international people that need to know which timezone that German picked for the meeting invite.
Which is all to say: sure, people know other countries exist and are more likely to need to learn to use localization, but by default it's not like everyone knows how to do that
When I was learning programming in 2008 as a student and was still using my computer in Turkish, the number of Java and PHP programs that did not specify a proper locale was outrageously large. For whatever messed up reasons they had lowercase / uppercase conversions for importing modules, opening files. Glibc library in Linux systems is also designed by shortsighted people so libc functions like tolower() made conversions in the local locale.
The end result was phpMyAdmin usually failed with "include not found" because its developers did this: include(tolower(MODULE NAME)) and BLABLAINTERFACE was converted into blablaınterface (there is no dot on top of i there look closely). Or Eclipse failed in similar ways since it was searching plugins using locale-dependent functions.
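To show the difference, a Python sketch. Python's str.lower() ignores the process locale, so the Turkish behaviour has to be emulated by hand; turkish_lower is a hypothetical helper I wrote for this, not a standard function.

```python
def turkish_lower(text: str) -> str:
    """Lowercase using the Turkish I/ı and İ/i pairs.

    Hand-rolled emulation for illustration only.
    """
    # Map the two special capitals first, then lowercase the rest.
    return text.replace("İ", "i").replace("I", "ı").lower()


name = "BLABLAINTERFACE"
print(name.lower())         # blablainterface  (locale-independent)
print(turkish_lower(name))  # blablaınterface  (dotless ı, file not found)
```

Identifiers, file names and protocol keywords want the first behaviour everywhere; only text shown to a Turkish user wants the second.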
Central Europe has a lot of accented Latin characters. ěščřžýáíéů etc.
Cyrillic letters are also worth trying, especially if you are trying to set up an international e-shop and need to print out labels with addresses.
Spanish-speaking people tend to have very long full names, in case that a complete name is needed. Picasso was, in fact, Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso.
Hungarians write surnames first, e.g. Orbán Viktor. Doing it otherwise looks unprofessional and may lead to confusion, because some first names can also be surnames.
Most of Europe writes streetname first and house number second, e.g. Friedrichstrasse 52, so the other way round than Americans and Brits are used to.
"Türkiye", not "Turkiye". I suppose there really is something to the Turkey Test!
That said, I don't know how bad ü -> u is in Turkish. At least in German, doing so often changes the meaning, and äöü have to be transliterated as ae, oe, ue instead. (Obviously a ton of software does something bad like "decompose Unicode characters and filter out non-ASCII letters" and gets it wrong, also hilariously failing the test.)
And in general, sure, every country should get to decide how it wants to be referred to (although I suppose that can raise complicated questions as to who gets to decide that too), but in any case asking a blog from 2008 to retroactively adopt that decision is asking for a bit much, I'd say.
> That said, I don't know how bad ü -> u is in Turkish. At least in German, doing so often changes the meaning, and äöü have to be transliterated as ae, oe, ue instead. (Obviously a ton of software does something bad like "decompose Unicode characters and filter out non-ASCII letters" and gets it wrong, also hilariously failing the test.)
Ü and Ö have almost the same sounds as German in Turkish. When they constructed the Latin-based script for Turkish, they looked at the European languages and their ways of writing. So no surprises there.
Writing ü as u changes the meaning a lot. "Kul" means a servant, while "kül" is ash. However, Turkish lacks the concept of digraphs. Having two vowels next to each other is extremely rare. If two vowels like oe are put together, a native reader will read them as o e (oh eh) and then maybe blend them a little.
So when technology got introduced, we didn't know what to do with the ASCII-only systems that Americans sold to us. People started to write ü,ö,ı,ş,ç,ğ as u,o,i,s,c,g. It causes people's names to be mispronounced in (usually English-speaking) media and in international environments. Many young people started to use SMS with those conversions in the 90s. So we are stuck.
Realistically, the choice is going to be between Turkey and Turkiye. Most people aren't going to go through the hassle of adding the umlaut. Countries can request that people refer to them as anything within reason, and in this case reason dictates that it be constrained by ASCII characters like every other country in the world. I think people might make the exception within certain formal contexts, but not in a random Hacker News post.
You're demonstrating the exact point by showing that they have an ASCII name and an official one. If you're having a regular conversation with someone you're probably going to use Ivory Coast and not the French name.
"Türkiye" is not a word in English. When schoolchildren learn English, they learn the alphabet, and there are no diacritics. It's as simple as that. The Ottoman rump state can request spelling changes, and we're happy to oblige, but they can't request alphabet changes and get acquiescence.
It's obviously up to you how you spell things in English, but why do you draw the line at alphabets? I've definitely seen diacritics used in some proper nouns in English texts for which I have no reason to suspect diplomatic/political pressure.
As I see it, you either oblige with the Turkish government's request to use the Turkish instead of English word for the country (which is what it really is), or you don't – it has nothing to do with alphabets.
Whether people will practically go through the effort of setting up their keyboard layout in a way that lets them type it is a different story; I can't really blame anyone for not doing so in a casual context. Wikipedia usually does so – e.g. they currently use "Turkey", but would likely use "Türkiye" if they were to adopt that.
No, Istanbul is from the colloquial version of Constantinople, eis ti Poli (to the City). Long form in Turkish used to be Konstantiniyye until we had the Republic.
> Or use the RegexOptions.ECMAScript option. In ECMAScript, “\d” means [0-9] which gives us: [...]
Wow, the default in the Windows API does not do that? I would have 100% been bitten by that at some point doing any development there, coming from a Unix background where I believe the "ECMAScript" behavior is pretty commonplace (and in fact it seems to be a subset of PCREs).
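For what it's worth, Python 3 has the same kind of default for str patterns: \d matches any Unicode decimal digit unless you pass re.ASCII or spell out [0-9]. A small sketch; the helper names are mine.

```python
import re

ARABIC_INDIC = "١٢٣"  # U+0661 U+0662 U+0663, i.e. "123" in Arabic-Indic digits


def is_unicode_digits(s: str) -> bool:
    """Default str-pattern behaviour: \\d matches any Unicode decimal digit."""
    return re.fullmatch(r"\d+", s) is not None


def is_ascii_digits(s: str) -> bool:
    """Explicit character class: only the ten ASCII digits."""
    return re.fullmatch(r"[0-9]+", s) is not None


print(is_unicode_digits(ARABIC_INDIC))                  # True
print(is_ascii_digits(ARABIC_INDIC))                    # False
print(re.fullmatch(r"\d+", ARABIC_INDIC, re.ASCII))     # None
```

Whether the permissive default is a bug or a feature depends on what consumes the validated string afterwards.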
I think the main problem here is that the behavior of the programming language is locale-sensitive by default. What languages other than C# behave like this?
Most languages, I remember having rendering issues in javafx back in the day with commas and dots randomly switching in numbers depending on locale. Then again C# is a Java ripoff so maybe it's because of that, but I've seen cpp Qt apps also behave this way.
OT: I like how he refers to a Hanselman post about great interview questions, and how around 40% of the questions in that post are obsolete. It shows how quickly our field evolves, for better or worse.
It's infuriating that most calculator apps, including the quick ones in spotlight, force me to use a comma instead of a period for a decimal delimiter.
1733443200 is even more unambiguous. Or is it? The date format counts sun-ups, so it stays accurate even under time dilation. So is the date format actually just a measure of the sun's position? Maybe we could just have a sun-ups-since-1970 ticker.
Edit: and I guess the reason we have leap seconds and such is that we are trying to combine both sun position and elapsed seconds, idk
That number could be anything, such as a user ID. Even among timestamp libraries that work with January 1st 1970 UTC as the zero point, they might parse it as January 21st 1970 if they were expecting milliseconds rather than seconds. It doesn't get much more ambiguous than a plain number!
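Easy to check in Python with the number from the parent comment. The two function names are just for the example; both assume the 1970-01-01 UTC epoch.

```python
from datetime import datetime, timezone

RAW = 1733443200  # the bare number from the parent comment


def as_epoch_seconds(n: int) -> datetime:
    """Interpret n as seconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(n, tz=timezone.utc)


def as_epoch_millis(n: int) -> datetime:
    """Interpret n as milliseconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(n / 1000, tz=timezone.utc)


print(as_epoch_seconds(RAW).date())  # 2024-12-06
print(as_epoch_millis(RAW).date())   # 1970-01-21
```

Same digits, two dates almost 55 years apart, and nothing in the number itself tells you which reading was intended.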
While it's confusing and I prefer y-m-d over both of those, the American way is based on the way dates are said in spoken (American) English. Hence things like September 11th, 2001.
It's the most appropriate possible example here because for many non-American readers it will be the one date that they have heard spoken in American English enough times to recall how Americans say it. For people who don't have much connection to American culture, it may well be the ONLY date they have ever heard spoken the American way in their lives.
(Indeed, this one specific date gets said the American way even in British English. We Brits don't do that for any other dates - we say "fourth of July" instead of "July fourth", for instance - but "September 11th" aka "9/11", uniquely among all dates, is written and said the American way in all dialects of English on the planet due to its significance to American culture.)