Ha. I had an intern who wrote code that blew up if the timestamp of our DB server was ever >= 1 second after the timestamp of our app server.
It couldn't fail at his desk, because he ran both servers locally while developing. And it wouldn't fail in production in the morning, since we had a cron job that synced the clocks at midnight. It took until after 5pm for the clocks to drift enough, meaning I was on call and the intern was not.
But the bug didn't crop up until his code had been in production for a couple of weeks, so it was a real pain in the neck to track down.
Personally, I really don't like code that deals with time - it's too easy to place a "get timestamp" call anywhere in the code. If it's down in the bowels of some piece of functionality, testing it becomes a headache.
This is one of those pieces of software engineering wisdom that only comes from experience. Try explaining to a newish dev why they shouldn't `const now = Date.now()` or whatever anywhere they want, but rather do it in one place for the lifecycle and pass it around, and they will look at you funny and think it's a waste of time. Classic case of simple vs easy. Time has the potential to complect everything it touches.
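The idea in miniature (the names here are just illustrative, not anyone's real API): read the clock once at the edge of the request or job, then pass that value down, so every consumer of "now" agrees and tests can supply their own instant.

function handleOrder(order, now = Date.now()) {
  // one reading for the whole lifecycle; everything below derives from it
  const receivedAt = now;
  const expiresAt = now + 15 * 60 * 1000;
  return { ...order, receivedAt, expiresAt };
}

// In a test you can pin the instant: handleOrder({ id: 1 }, Date.parse('2024-02-29T00:00:00Z'))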
Oh yes. Any time I see code that tries to do some kind of scheduling, the first thing I ask is if they have considered using a library someone else has written. Way too many things can subtly go wrong, but new folks tend to think it should be easy, because how hard can it be to work with dates?
There are seven time zones in Indiana in the tz database[1]. That doesn't count the two most-used ones, America/Chicago and America/New_York, and there's also a Kentucky tz that slightly spills into the state. Whenever anyone comments that dates and times are easy to compute, I just mention that yes, there are seven time zones in Indiana. Now the actual issues tend to arise when software makes assumptions like "every second will actually occur", "each second only happens once", "every minute has 60 seconds", "every day has 86400 seconds", amongst a plethora of other bad assumptions... but those are a lot more complicated to explain. I like leaning on Indiana.
One realises that one’s time is relative anyway and any universal time is just one’s own will to join a collective agreement that one doesn’t have to abide by. Then one puts this notion aside and prefers not to worry oneself with such things.
Uh. No. In the northern hemisphere, winter has fallen in January for centuries and centuries. Almanacs that farmers use also agree with that, so we have food to eat. Reality is much more than just rich first-world comforts. The world is deadly, and we use calendars and time-keeping as a basic survival skill. So basic, you probably don’t even realize it’s there.
There's a difference here between time as a fundamental property of the universe, and "time", which is how we relate a bunch of physical phenomena to that fundamental property.
That's the primary driver for complexity here. Time as a property is hard to deal with because we take something simple, like how many seconds have passed since some other point in time, and then we have to correlate it to the sun and the moon and how close the Earth is to the sun and what side of the Earth we're on, and so on.
It forces computers to wrap a simple understanding of the passage of time in a lot of context that makes no difference to the computer.
Getting rid of calendaring doesn't imply that no one knows when winter is; winter is still a function of time. It's a pretty simple algorithm to figure out whether a given month is winter or not. It's easy to derive that context from an absolute measurement of time. It's vastly more difficult to go from our calendaring system to any kind of absolute time, because our calendaring is kind of arbitrarily made up to keep things in sync. It bears little resemblance to the passage of time, because its purpose is more about maintaining context (e.g. the sun rises early in the morning, it's winter in January, etc.) than actually measuring the passage of time.
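A minimal sketch of "winter is still a function of time", assuming northern-hemisphere meteorological winter (the helper name is mine):

function isNorthernWinter(epochMs) {
  const month = new Date(epochMs).getUTCMonth(); // 0 = January
  return month === 11 || month === 0 || month === 1; // Dec, Jan, Feb
}

isNorthernWinter(Date.UTC(2024, 0, 15)); // true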
I take it you don’t work remotely with a global team and have never been homeless? I understand your philosophy and largely agree with it. However, reality is much more nuanced.
I've been bitten by bugs when coming to moment.js from other languages. It's the only datetime library I knew of (before you tell me about all the others) that mutates the datetime object when you do arithmetic.
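Roughly the gotcha, if I remember the API right:

const moment = require('moment');
const m = moment('2020-01-01');
const later = m.add(1, 'day'); // returns the *same* object, now advanced
m.format('YYYY-MM-DD');        // '2020-01-02' - the original moved too
m === later;                   // true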
That's a good starting point, and certainly good advice for marking current time and past times. But keep in mind that the absolute UTC time of a future event can change due to politics, e.g. a change in daylight saving rules, which is less rare than you might imagine (https://en.m.wikipedia.org/wiki/Daylight_saving_time_by_coun...). That concert scheduled a year from now at 8pm in UTC+5 might actually end up occurring at 8pm in UTC+6.
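One common workaround, sketched here with Luxon for the zone math (the field names are made up): persist the wall-clock time and the IANA zone the organiser meant, and only resolve to a UTC instant when you actually need one, so later rule changes in that zone get picked up.

const { DateTime } = require('luxon');
const event = { wallTime: '2026-07-01T20:00', zone: 'Asia/Karachi' };
// resolved at display/notification time, with whatever rules apply then
const instant = DateTime.fromISO(event.wallTime, { zone: event.zone }).toUTC();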
I once had briefly traveled into a different time zone (so briefly that I hadn't thought about it) when I received a phone call asking to reschedule a doctor appointment. I entered the new time into my calendar. Then, I was an hour late to the appointment. My phone had adopted the timezone of the cell tower, and the calendar assumed I had meant "local time where I scheduled the appointment".
If DST changes and you live in a democracy, it's your fault. Vote against DST. If you don't get what you want unleash the wrath on your congress member, not the software.
Hmm, I think you should use TAI over UTC, because UTC has leap-second corrections, which means that the current time + X seconds might not actually be X seconds from now.
I learned this the hard way. Now I wrap interactions with system time in a container class so that, if need be, I can finely control time inside my app in unit tests and debugging.
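Something like this, for the curious (class names are mine, not from any particular library):

class SystemClock {
  now() { return Date.now(); }
}

class ManualClock {
  constructor(startMs) { this.ms = startMs; }
  now() { return this.ms; }
  advance(deltaMs) { this.ms += deltaMs; } // tests drive time by hand
}

// App code depends only on the now() method:
function isExpired(expiresAtMs, clock) {
  return clock.now() >= expiresAtMs;
}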
It's also one of those things that actually is simple in restricted cases, then when you try to generalise it blows up badly, and it blows up like a time bomb rather than a landmine. When a time bomb explodes you don't quite know when it was triggered, but you have to deal with it then and there, learning a heck of a lot about time libs in general before you can proceed. What's a leap second, which years have a leap day, are time zones connected to places somehow, and so on. Just so you can fix your calendar that's part of a larger app.
Personally, I think my definition of "simple" changed over time. Much of that was influenced by the depth of my knowledge of particular paradigms, whether that be software or knowledge of the subjects I was writing code for.
You pass around a time provider module, which you can (and should) mock in tests to ensure that nothing catastrophic happens if time drifts in ways you don't really expect.
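If you're on something like Jest, the framework can play the provider for you; a rough sketch (renewIfStale is a trivial stand-in, not real code):

function renewIfStale(session, maxAgeMs = 60 * 60 * 1000) {
  if (Date.now() - session.createdAt > maxAgeMs) session.renewed = true;
}

test('renews sessions older than an hour', () => {
  jest.useFakeTimers();
  jest.setSystemTime(new Date('2021-10-31T01:30:00Z'));
  const session = { createdAt: Date.now(), renewed: false };
  jest.setSystemTime(new Date('2021-10-31T02:31:00Z')); // 61 minutes "later"
  renewIfStale(session);
  expect(session.renewed).toBe(true);
});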
A cronjob to sync the time instead of a proper NTP client doesn't sound that good either. I know that each place has its own weird things because of historical reasons, but NTP is quite ancient.
Our sysadmin was being paranoid about how many memory-resident programs we ran. He figured time syncing wasn't important enough to justify a daemon, so he just ran the ntpdate command on a daily schedule. It wasn't that bad - after all, how far can clocks drift in a single day!? ;-)
For security? Kerberos usually has a 5 minute tolerance. Are you saying that's wrong? Because if your hardware/firmware isn't literally broken you won't drift anywhere near that in a day.
NTP can't properly fix a clock like that either, since the slew rate is often capped at one part per two thousand (500 ppm). With a consistently wrong clock, that can correct at most about 43 seconds per day (86400 s / 2000). Any worse than that and you won't see much advantage over ntpdate.
The bigger the clock skew, the harder it is to correlate events from logs. The harder it is to correlate, the more information needs to come from your machine-fueled winnowing stage to the final inspection by eyeballs.
1-2 seconds is (probably) well within the manageable range. But since you now know for SURE that you have clocks running at different speeds, you need to over-estimate the skew. And hope that the daily skew is approximately constant over time.
So, yes, clock skew can have an impact on your security, because it makes event correlation (and followup on security incidents) harder.
"It makes logs more annoying, and sometimes security needs logs" is a pretty weak connection, though. And it's a far cry from precise timing being "THE most important thing on a server". Is there anything more direct at all?
It's basically a whole bunch of small "make it harder to correlate" issues. If you have a distributed system (which you probably do, if you care about time to the point of considering if daily ntpdate or continually running ntp-or-equivalent is better), you will probably end up with timestamps somewhere in a protocol.
This could be timestamps in a DB server, or similar.
You can then, if you have too-large skew, end up in the weird position of either "things that have been committed in the DB are not yet showing up on the frontends" (if the time used as a cut-off for the frontend's query lags behind the DB server and the timestamp is set by the DB server) or "things that have been committed are not showing up when you query for SELECT timestamp <= NOW()" (if the DB server is lagging and the timestamp is set by the client).
If that matters, well, that's really a business and data quality issue.
Some distributed systems will also try to figure out what skew you have across the whole system and then end up waiting N times that skew before they can consider data persisted (see, for example, Spanner, and probably CockroachDB). If your distributed system relies on timestamps for consistency, and it doesn't self-discover the skew, you're basically not guaranteed whatever consistency guarantees that the distributed system claims to have.
Again, is this important? It really depends. Is it OK if your distributed data store drops some of your data on the floor and lets you clean up the mess? Sometimes, yes, totally. Is it OK if your uniqueness guarantees get violated because two things got the same unique ID? Again, sometimes, almost-unique is enough.
For most people, log correlation is probably the biggest point, though.
after all, how far can clocks drift in a single day!?
Well, having run a Linux kernel on OpenBSD's vmm: they can drift more than a single day in that time. I did have to resort to using an ntpdate cron job because ntpd just couldn't cope with the time dilation effect. The cron job was configured at * * * * * (i.e. every minute), which roughly translated to once every 180 wall-clock seconds (+/- 30s).
Delivering real functionality in production (with code review from your mentor, of course) is absolutely standard at a Big Tech or hot startup internship.
This was the early 2000s, at a very small startup funded privately by a non-technical guy, and most of the technical staff was under 25. It's a big world, and best practices are not universal.
I once had to implement a date picker that returned the date to be sent to the backend to fetch the day's reservations. It seemed to work fine the day I implemented it. The next day it broke. Turned out the widget returned MM/DD/YYYY instead of DD/MM/YYYY.
And I implemented it on the 10th of October.
Aliens landed on this planet, found out about the date format MM/DD/YYYY, and left in a hurry. They are currently preparing a gamma ray burst out of pure compassion and pity but also a hint of disgust!
Is that date format used anywhere outside the US? We have always used YYYY-MM-DD, which is very nice: if nothing else, you can use a plain string sort if for whatever reason you need to sort them.
Yeah, me too. I had severe time constraints (heh) and just wanted to be done with it with minimal conversion. Which I then had to do anyway later. I also learned on that day that Javascript's getUTCMonth() is zero-based (January is month 0). And I had to sync time between timezones as well. I don't want to work with dates ever again.
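For anyone who hasn't hit it yet, the zero-based month gotcha concretely:

new Date('2024-01-15T00:00:00Z').getUTCMonth(); // 0, i.e. January
new Date('2024-12-15T00:00:00Z').getUTCMonth(); // 11, i.e. December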
I've learned the hard way that this is also dependent on the user's date format set on their computer in some libraries (iirc it was the jQuery UI date picker a few years back). It's liberating when you decide to throw it all out and just use integers for all things not facing an end user.
We had a situation where, if you worked late, the test suite would consistently fail.
Since developers rarely worked late and would give up when hitting the "random" test suite failure and go home, the bug persisted for months before the true cause was understood.
Our time zone was UTC-5, and the test suite contained a local-vs-UTC bug which only triggered between 7pm and midnight.
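Roughly what bites you (the instant below is arbitrary; America/Bogota just stands in for a UTC-5 zone): late evening in UTC-5 is already "tomorrow" in UTC, so date-only logic disagrees with itself between 7pm and midnight local time.

const d = new Date('2020-06-10T19:30:00-05:00'); // 7:30pm local in a UTC-5 zone
d.getUTCDate();                                  // 11 - the UTC date has rolled over
d.toLocaleDateString('en-US', { timeZone: 'America/Bogota' }); // '6/10/2020'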
It is. I'm in London, which means that our time is either UTC or UTC+1 depending on daylight saving time.
There have been a couple of projects where, if I pushed a commit between 11pm and 1am, some tests would break. I'm pretty sure the issue was inconsistent use of UTC vs local time, which during that period differ by one day.
One of the perils of working in the UK is _assuming_ that you've been using UTC timestamps this whole time, and then summer rolls around and suddenly a bunch of tests start failing, and all your production logs are an hour out because you were wrong, and actually used local timestamps everywhere. Oops.
I worked with date range in a service that expected start date to be inclusive, but end date to be exclusive. So when you wanted to get data from dates 12-15, you'd have to send query for 12-16.
That sounds like a naive date->datetime conversion rather than explicit boundary exclusion. If you convert a date to a datetime, you get midnight at the start of day, which is fine if you're only comparing dates, but fails spectacularly in the way you describe for date ranges.
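A sketch of that failure mode (the values are made up): the bare end date becomes midnight at the start of that day, so anything later that same day falls outside the range, and callers compensate by sending an exclusive end date one day further on.

const end = new Date('2024-03-15');             // parsed as 2024-03-15T00:00:00Z
const event = new Date('2024-03-15T14:00:00Z'); // afternoon of the "last" day
event <= end;                                   // false - the 15th is silently excluded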
MSSQL's long-standing lack of a DATE data type has been the bane of my existence (of course, PostgreSQL has supported it since forever, but try telling your application vendor that). And since it took so long to include it, hardly any application uses it even today, because of inertia and compatibility with existing databases.
I had a coworker working on a data-pipeline sort of thing once. For some reason they wrote their own date parser (Akamai logs) and used a regular expression to find the month. For whatever reason, the RegExp didn't work for March (I think), so at midnight on 1 March we started dropping all the logs as un-parseable (causing the NOC to think the traffic had gone to 0 or our system was buggy). We all got paged and eventually figured it out. It had been running live for several months without problems at that point.
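I'd guess the bug had roughly this shape - a hand-rolled month alternation that simply omits one entry (this is pure speculation, not the actual code):

const MONTH = /\b(Jan|Feb|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b/; // "Mar" is missing
MONTH.test('10/Mar/2016:13:55:36 -0700'); // false -> the line gets dropped as un-parseable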
Fantastic. We had a similar problem with test code that always worked on a remote team's computers during their work day but failed when we got into the office and tried to run it in a different timezone.
Reminds me of an app I worked on a while ago where both the server it was running on and the datetimes in the database had to be in a specific time zone. Change either (including changing both to an identical but different TZ) and the app just failed to run.
I've seen something similar where some automated unit tests always passed and when I tried to run it locally would always fail. After digging around I realized it was making assumptions about the timezone it ran in, which was on the east coast, whereas I was on the west.
I've had to fix any number of test suite bugs, as well as regressions that happened because the people who wrote the tests relied on date arithmetic from the current date and time, ensuring that (most of the time) the suite didn't test what they thought it did...
Many people think that test suite authoring is easy mode. It really isn't. Making a good test suite that actually does a good job of tracking invariants is a royal pain.
As far as I'm concerned, writing tests is a different skillset and it's often underdeveloped. It then snowballs into hating testing, because all you've experienced is a poor test suite that gets worse over time. It's hard to do well, as you say, and it also gets glossed over in code review as long as the tests pass. Do the tests actually verify anything useful? Who knows, build is green!
That's how you get an application with 50,000 unit and integration tests that you can't run locally, and requires massive parallelism in CI to finish in any reasonable timeframe.
My favourite is the test suite where each test relies on data from every other test, so running a test in isolation fails, and adding a new test is almost impossible without breaking a bunch of other tests.
Kind of sort of. The catch here is that because runtime is compile time with Perl, it's not exactly the same. Looking at the code, it's kind of like having something like
#if (some C preprocessor expression that's true on Fridays)
printf("Hello world!");
#else
printf;
#endif
(although it's been long enough since I've written C that I don't know if (a) there exists a C-preprocessor instruction as per the above, (2) if the compiler will notice errors in the unexecuted branch of an #if and (iii) if printf; would give a syntax error).
Another equivalent piece of code could be something along the lines of the JS
if (today_is_friday()) {
  eval("2+2")
} else {
  eval("--2abc")
}
One perhaps apocryphal story that's kicking around out there was that Draper Labs had a hard-real-time navigation system which... took too much time computing something when the moon was full.
Saw some tests that had the same pattern. Passed half the day, and failed the other half of the day. I assume it always worked during India work hours.
Fortunately we rarely use 12h clocks where I live. But that opens up other avenues: if you are not much of an early riser, a build script failing to deal with single digit hours can remain undetected for a very long time.
It could have been much worse. At least you didn't have a leap year or timezone bug. I've exterminated more than one of those in my time as an engineer, and it's never really any fun.
Unfortunately, the intern only worked mornings, so it took several days of back and forth before the bug could finally be put to rest.