Status pages are basically useless if they’re public facing.
Either they automatically update based on automatic tests (like some of the Internet backbone health tests) or they’re manually updated.
If they’re automatic, they’re almost always internal and not public. If they’re manual, they’re almost always delayed and not updated until after the outage is posted to HN anyway.
The other problem with status pages is that, depending on what happened, it may not be possible to update the status page anyway. You really need a third party to have a useful status page.
I prefer fully automated tests, publicly revealed, because the main thing I want to know (as a customer) is whether I should keep trying to fix my end or give up because GitHub exploded again.
It’s most annoying in cases like one I hit recently: known maintenance work on my upstream home fiber connection was causing service degradation (not complete loss, but my fiber line was down to DSL or dialup speeds). The chat lady could see that my area was affected, but the issue lookup system couldn’t.
If the issue lookup had told me there was an issue I’d’ve gone on my merry way.
I even checked a few more times until it was resolved; the issue never appeared in the issue lookup system.
This was much, much, much easier when websites used to explode with tracebacks and other detailed error messages. Now you just get a "whoopsie doopsie we did a fuckywucky" and you can't really tell what's going on.
you can't operate at any scale at all without mechanisms in place to know perfectly well whether an issue is impacting a single customer or if your world is on fire
You'd like to think so, but a surprisingly large number of "large scale" things operate on "everything is fine" until too many people complain about the fire.
Quite often you see automated tests that only check how well your cache/in-memory data is working. But when some other customer who isn't in the hot path tries to access their data, the request times out. I've seen a lot of people build automated checking systems that miss failures like this.
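To make the hot path vs. cold path point concrete, here's a rough sketch of a check that probes a few randomly sampled non-hot keys alongside the cached ones. Everything here is made up for illustration: `fetch`, `hot_keys`, `all_keys`, and the "ok"/"degraded"/"down" labels are assumptions, not anyone's real monitoring API.

```python
import random
import time
from typing import Callable, Optional, Sequence


def probe(fetch: Callable[[str], object], key: str, timeout_s: float) -> bool:
    """Return True if fetch(key) succeeds and finishes within timeout_s.

    Note: this only measures elapsed time after the call returns; it does
    not interrupt a call that hangs. `fetch` is a hypothetical client call.
    """
    start = time.monotonic()
    try:
        fetch(key)
    except Exception:
        return False
    return (time.monotonic() - start) <= timeout_s


def check_health(
    fetch: Callable[[str], object],
    hot_keys: Sequence[str],
    all_keys: Sequence[str],
    cold_samples: int = 3,
    timeout_s: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Probe the hot keys plus a random sample of keys outside the hot set."""
    rng = rng or random.Random()
    hot = set(hot_keys)
    cold_pool = [k for k in all_keys if k not in hot]
    cold_keys = rng.sample(cold_pool, min(cold_samples, len(cold_pool)))

    hot_ok = all(probe(fetch, k, timeout_s) for k in hot_keys)
    cold_ok = all(probe(fetch, k, timeout_s) for k in cold_keys)

    if hot_ok and cold_ok:
        return "ok"
    if hot_ok:
        # Cached data looks fine, but uncached customers are failing.
        return "degraded"
    return "down"
```

A check that only ever hits `hot_keys` would report "ok" in exactly the scenario described above; sampling from the rest of the keyspace is what gives it a chance of noticing.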