If you are the IT team for a large, high-impact organization, you have to control updates to your organization's fleet. You cannot let vendors push updates directly. You have to stage those updates, test them, and then roll them out gradually to the whole organization.
Plus, for your critical communication systems, you must have a disaster recovery plan that actually lets you recover in minutes, not hours or days. And you have to exercise this plan regularly.
If you are CrowdStrike, shame on you for not testing your product better. You failed to meet a very low bar. You just shipped a 100% reproducible, widely impactful bug. Your customers must leave you for a more diligent vendor.
And I really hope the leadership teams in every software engineering organization learn a valuable lesson from this: listen to that lone senior engineer on your leadership team who pushes for better craft and operational rigor in your engineering culture. Take it seriously; it has real business impact.