I work on a large distributed infrastructure. I've always joked that my team's projects and people's careers are outage-driven: the only time we become important and people get opportunities for promotion is when there are big-enough outages that executives have to invest heavily in reliability or scalability. The rest of the time, we are just minions who must listen to and serve feature or product teams. Nobody listens to us when we ask a product team to implement a reliability contract in their shiny product.

Pre-outage improvements, reliability defense in depth, and eliminating scalability bottlenecks before they are hit are all ignored by leadership and the company. It is just human nature: even though people understand that they have to prepare for possible issues, if one hasn't happened yet, they won't take it seriously. I've seen this in many internal performance reviews and promotion committees. People who have never been bitten badly by an outage may call these premature optimizations.