This article is capitalizing on the Crowdstrike incident. It was costly but a mi...

lesuorac · 2024-08-26T11:23:41 1724671421

Keep in mind, an "additional processes" that would've avoided the Crowdstrike incident was a smoke test on a Windows machine.

Literally if you just checked if the binary + config didn't break a windows machine before pushing to 100%.

There is definitely software erosion. Stores didn't sell CDs of software that just literally crashed your machine when you installed it.

ryoshu · 2024-08-26T12:16:02 1724674562

I worked on AOL 5.0. It did crash machines with a specific softmodem driver. The bug was in the driver, we had to work around it after the gold master release. We didn't have that specific machine/driver in the QA lab, but the execs all had laptops that uncovered the behavior.

Lessons learned.

ornornor · 2024-08-26T15:48:58 1724687338

The way for crowd strike to avoid their incident was adding a very basic (borderline trivial) step in the merge/release pipeline to make sure machines could still boot after running the to be deployed version.

That’s really not much overhead nor is it a novel or groundbreaking process. They chose not to do it or maybe were told about it but decided not to spend any engineering time on it.

This is negligence to me.

lnenad · 2024-08-26T10:27:21 1724668041

I think we can look at the issue from two sides.

There is definite enshittification of software happening all around us with companies unable to understand that an end goal of product development could be achieved and focusing on feature bloat to protect them from up-and-coming startups taking a piece of their cake. This means that both good features and bad ones get added and things have to change constantly making the entire end user experience worse. This complicates things on the softdev side as well as tech debt grows, architecture was made without taking into account some of these features, QA is harder to do well considering the larger surface area. So this leads us to this dystopian view of how things are and when a mistake happens an echo chamber could be easily formed that makes these views (software sucks) feel like postulates.

On the other hand we've never been surrounded by so much software in history and it keeps growing, and will keep growing and so far the earth is not collapsing. There's so much that depends on people typing code into their editors it's truly amazing we've reached this point. Keeping everything afloat in this new reality is increasingly difficult as many of these systems work together and require a broad understanding of many domains (not every product/company has budgets for multiple roles so you have one person that does infra/code/qa with a multitude of tools) to enable them to work without issues. So the number of interactions people have with code is increasing and therefore when problems with software that is used by a lot of customers come up it becomes *very* visible and feels like nothing is working. But in reality several thousands of microprocessors in very close proximity of these people keep chugging along and their phones, payment cards, headphones, monitors, tvs, speakers, smart *x*s, coffee makers, thermostats etc... are as reliable as they ever were with a lot more to offer so the other view could also be very realistic (software has never had this level of quality).