Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This outage seems to be the natural result of removing QA by a different team than the (always optimistic) dev team as a mandatory step for extremely important changes. And neglecting canary type validations. The big question is will businesses migrate away from such a visibly incompetent organization. (Note I blame the overall org; I am sure talented individuals tried their best inside a set of procedures that asked for trouble.)


So there was apparently an Azure outage prior to this big one. One thing that is a pretty common pattern in my company when there are big outages is something like this:

1. Problem A happens, it’s pretty bad

2. A fix is rushed out very quickly for problem A. It is not given the usual amount of scrutiny, because Problem A needs to be fixed urgently.

3. The fix for Problem A ends up causing Problem B, which is a much bigger problem.

tl;dr don’t rush your hotfixes through and cut corners in the process, this often leads to more pain




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: