Hacker News

Status pages are basically useless if they’re public facing.

Either they update automatically based on health checks (like some of the Internet backbone health tests), or they're updated manually.

If they’re automatic, they’re almost always internal and not public. If they’re manual, they’re almost always delayed and not updated until after the outage is posted to HN anyway.
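The automated flavor is simple enough to sketch. Here's a minimal, hypothetical probe (the `/healthz` endpoint, thresholds, and output file are all assumptions, not any particular provider's setup) that records status and latency so the result can be published as-is to a public status page:

```python
# Minimal sketch of an automated, public-facing status check.
# The endpoint URL, timeout, and output location are hypothetical.
import json
import time
import urllib.request

ENDPOINT = "https://example.com/healthz"  # hypothetical health endpoint
TIMEOUT_S = 5

def probe(url: str = ENDPOINT) -> dict:
    """Hit the health endpoint and record status plus latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
            ok = resp.status == 200
    except Exception:
        # Any failure (timeout, DNS, HTTP error) counts as "down".
        ok = False
    latency_ms = (time.monotonic() - start) * 1000
    return {"ok": ok, "latency_ms": round(latency_ms, 1), "checked_at": int(time.time())}

# Publish the raw result where the status page is served from, e.g.:
# json.dump(probe(), open("status.json", "w"))
```

The point is that the raw probe result goes straight to the public page with no human in the loop, which is exactly what the manual-update pages lack.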



The other problem with status pages is that, depending on what happened, it may not be possible to update the status page at all. You really need a third party to have a useful status page.


Which is pretty much what Downdetector has evolved into. And it looks like they have an enterprise offering to alert companies to their own issues.


Which is better? How do you know whether an issue is individual to a customer or a quick blip that will resolve in a few seconds?


I prefer fully automated tests publicly revealed because the main thing I want to know (as a customer) is should I keep trying to fix my end or give up because GitHub exploded again.

It’s most annoying when you have something like I did recently: known maintenance work on my upstream home fiber connection that was causing service degradation (not complete loss, but my fiber line was down to DSL or even dialup speeds). The chat lady could see that my area was affected, but the issue lookup system couldn’t.

If the issue lookup had told me there was an issue I’d’ve gone on my merry way.

I even checked a few more times until it was resolved; the issue never appeared in the issue lookup system.


> should I keep trying to fix my end or give up because GitHub exploded again

Making this decision easy is a fight I fight for my customers every day. :)


This was much much much easier when websites used to explode with tracebacks and other detailed error messages, now you just get a "whoopsie doopsie we did a fuckywucky" and you can't really tell what's going on.


You can't operate at any scale at all without mechanisms in place to know perfectly well whether an issue is impacting a single customer or your whole world is on fire.


You'd like to think so, but a surprisingly large number of "large scale" things operate on "everything is fine" until too many people complain about the fire.


Caches make problems fun too.

Quite often you see automated tests that only check how well your cache / in-memory data are working, so everything looks healthy. But when some other customer whose data isn't in the hot path makes a request, it times out. I've seen a lot of automated checking systems fail at exactly this.
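A toy illustration of that blind spot (all names here are hypothetical): a naive check that only reads a warm cache entry passes even while the backend behind every cache miss is dead, whereas a check that deliberately uses a key that can't be cached catches it.

```python
# Toy model of a cache-only health check missing a cold-path failure.
import time

CACHE = {"hot_customer": "cached-profile"}  # warm entry, always fast

def slow_backend(key: str) -> str:
    """Stand-in for a degraded database: every cache miss 'times out'."""
    raise TimeoutError(f"backend lookup for {key!r} timed out")

def get_profile(key: str) -> str:
    if key in CACHE:
        return CACHE[key]      # hot path: served from memory
    return slow_backend(key)   # cold path: hits the broken backend

def naive_check() -> bool:
    """Only exercises the warm cache entry, so it always passes."""
    return get_profile("hot_customer") == "cached-profile"

def cold_path_check() -> bool:
    """Uses a fresh synthetic key that can't be warm, so it bypasses the cache."""
    try:
        get_profile(f"synthetic-{time.time_ns()}")
        return True
    except TimeoutError:
        return False
```

Here naive_check() keeps returning True while cold_path_check() returns False, which is the gap the comment above describes.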


The phrase “the hardest parts of computer science are caching and naming things” comes to mind.


I see 2 things here but you're off by one.


Yes, but those mechanisms take time to determine this.



