There is nothing funny about a web service being down. Everyone has downtime sometimes; 100% uptime is a fairy tale. Billion-dollar businesses (like Microsoft) know this and aim for lots of nines instead. This error is probably stressing the fuck out of a bunch of people right now. Someone might even lose their job. MS may have been hit by what in legal parlance is termed "an act of God". Who knows?
What is not acceptable, in my opinion, is that this post is an hour old and I am still seeing the site down. If anybody at Microsoft knows about this, they should at least have a valid splash page in place by now. Redirect the whole domain to a static file somewhere if you have to.
It's only a blog so it's not really a big deal, but this is Microsoft. They should be completely embarrassed to look like this.
They didn't use good, standard practices for handling the exception. It's an example of them not taking their own advice. Advice I'm sure you could find on the website, if it weren't down.
Wouldn't a decent load balancer/proxy server (e.g. haproxy) in front of the application servers be a good idea, so traffic could be redirected to a graceful "Oops" page/server when something like this happens?
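For what it's worth, that's only a few lines of haproxy config: health-check the app servers and hand back a canned static page when none of them answer. A rough sketch, not anyone's actual setup — the server names, addresses and file paths here are made up:

    # /etc/haproxy/haproxy.cfg (sketch)
    frontend www
        bind *:80
        default_backend app

    backend app
        option httpchk GET /healthz        # mark a server down when it stops answering
        server app1 10.0.0.11:80 check
        server app2 10.0.0.12:80 check
        # if every server is down, haproxy answers the 503 itself;
        # sorry.http must be a complete canned HTTP response (status line, headers, HTML body)
        errorfile 503 /etc/haproxy/errors/sorry.http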
The error message could represent defective hardware. If we did our job right, that's about all it would ever indicate. There are few sane actions one could take in an exception handler to remediate the problem. I will agree, though, that a better overall system design could detect this problem and rotate the defective nodes out.
But in this case, for all we know, it could be a software defect from which there is no recovery. IMO it's bad to over-design recovery mechanisms that just end up masking design errors. Clever staged rollouts of changes are a good way to mitigate the impact of a new regression.
Hardware failure is one thing, but in my experience it's usually devs not taking the time to properly handle errors and exceptions. If something fails, restart the process and report the error quietly in a log, not in the customer's face.
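Even a dumb supervisor loop covers the restart-and-log part. A minimal sketch, assuming a hypothetical ./worker binary — in real deployments you'd lean on systemd/supervisord or the app pool's own recycling instead:

    import logging
    import subprocess
    import time

    logging.basicConfig(filename="worker.log", level=logging.ERROR)

    # Keep the worker alive; failures go to the log, never to the customer.
    while True:
        proc = subprocess.run(["./worker"])   # hypothetical app process
        if proc.returncode != 0:
            logging.error("worker exited with code %s, restarting", proc.returncode)
            time.sleep(5)                     # back off so a crash loop doesn't spin hot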
The error is coming from IIS. There are several additional layers that could go down, including CGI, application pool(s), database(s), parts of the LAN (interconnecting different servers/services), or anything else the ASP.NET site relies upon (e.g. handle depletion, disk space, etc.).
In IIS this is the "last resort" error, shown when even the custom error handler cannot respond.
On a side note, as anyone who has run .NET/IIS in production knows, when the custom 500 page itself hits an error (couldn't reach the session store, ran out of disk space, etc.), this bare page is what gets shown, with no way to get rid of it.
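The usual belt-and-braces fix is to point IIS at a plain static file for 500s, so the error page has no dependencies of its own (no session store, no database). Roughly this in web.config — a sketch, with the file name made up; /oops.htm would just be a static HTML file in the site root:

    <configuration>
      <system.webServer>
        <!-- replace whatever the failing app produced with a dependency-free static page -->
        <httpErrors errorMode="Custom" existingResponse="Replace">
          <remove statusCode="500" />
          <error statusCode="500" path="/oops.htm" responseMode="ExecuteURL" />
        </httpErrors>
      </system.webServer>
    </configuration>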
It made for so many unpleasant Saturday evenings...
msoft is huge, with tons of products; it's not surprising that something slipped by. Why this is front-page material is another question (but not surprising either, since people here seem to like it when msoft falls down).
I'm quite surprised it's still down - either they have no alerting on it at all or something went horribly wrong and there's a team out there somewhere frantically trying to bring it back...
A lot of the time MS will outsource these sites to vendors – I used to work for one. For people who don't know what it's like, it's basically an MS project manager making ridiculous demands on a very short time frame, which usually leads to shoddy products being delivered.