
> a surprisingly small amount of water ingress would trip a breaker while leaving the racks in good order.

If that were the case they wouldn't be saying "There is no current ETA for recovery," and "it is expected to be an extended outage. Customers are advised to failover to other regions."




There's a lot more to a datacenter building than just the servers sitting on racks. In this case in particular, there was a fire in the power-serving infrastructure (presumably caused by the flood). So nearly all of those servers could be totally fine, just off, but if the power distribution network in the building is literally fried, that's gonna take a long time to fix.


Starting up a cloud region after a total shutdown is likely an untested procedure with no well-known timeframe, even if the hardware is OK.


If you're in the business of being a massive cloud provider, hopefully restarting a region is not an untested procedure for you.

You could always test this in a live environment before a region becomes open to customers.


“Test in a live environment before the region becomes open to customers” is a test that’s not entirely representative of “the region had an emergency shutdown with customers on it.” And the latter is obviously something you can’t reliably test, unless you decide to crash a whole region under live traffic.

I’m sure they have checklists and procedures, but an unknowable laundry list of things will go wrong.


You're right. It's not untested at all. It's just not instantaneous, unfortunately. :)


Having (for example) 6 inches of water in your 115kV switch room is a small-scale problem that can cause a large-scale outage.



