Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Off the top of my head, this is the third time they've had a major outage where they've been unable to properly update the status page. First we had the S3 outage, where the yellow and red icons were hosted in S3 and unable to be accessed. Second we had the Kinesis outage, which snowballed into a Cognito outage, so they were unable to login into the status page CMS. Now this.

They "own up to it" in their postmortems, but after multiple failures they're still unwilling to implement the obvious solution and what is widely regarded as best practice: host the status page on a different platform.




Firmly agreed. I've heard AWS discuss making the status page better – but they get really quiet about actually doing it. In my experience the best/only way to check for problems is to search Twitter for your AWS region name.


Maybe AWS should host their status checks in Azure and vice versa ... Mutually Assured Monitoring :) Otherwise it becomes a problem of who will monitor the monitor


My company is quite well known for blameless post-mortems, but if someone failed to implement improvements after three subsequent outages, they would be moved to a position more appropriate for their skills.


This challenge is not specific to Amazon.

Being able to automatically detect system health is a non-trivial effort.


That’s not what’s being asked though - in all 3 events, they couldn’t manually update it. It’s clearly not a priority to fix it for even manual alerts.


They had all day to do it manually.


>Be capable of spinning up virtualized instances (including custom drive configurations, network stacks, complex routing schemes, even GPUs) with a simple API call

But,

>Be incapable of querying the status of such things

Yeah, I don't believe it.


As others mention, you can do it manually. But it’s also not that hard to do automatically: literally just spin up a “client” of your service and make sure it works.


Why automatic? Surely someone could have the responsibility to do it manually.


Or override the autogenerated values


Eh, the colored icons not loading is not really the same thing as incorrectly reporting that nothing’s wrong. Putting the status page on separate infra would be good practice, though.


The icons showed green.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: