
Seems to me this list needs to incorporate how easily these bugs could have been avoided, detected, or fixed, rather than just how dire the consequences were. It doesn't say much about what people did to test their code. For instance, the first one in the list is something unit testing would have caught: take the trajectory function, plug numbers in, see if the output is correct.
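A rough sketch of what "plug numbers in" could look like. The function here is a hypothetical stand-in (drag-free projectile range on flat ground), not the actual code from the list, and the names are my own:

```python
import math

G = 9.80665  # standard gravity in m/s^2


def projectile_range(speed_m_s: float, angle_deg: float) -> float:
    """Horizontal range on flat ground with no drag (hypothetical example)."""
    angle = math.radians(angle_deg)
    return speed_m_s ** 2 * math.sin(2 * angle) / G


def test_projectile_range() -> None:
    # At 45 degrees the range is v^2 / g, which can be worked out by hand.
    assert math.isclose(projectile_range(10.0, 45.0), 100.0 / G, rel_tol=1e-12)
    # Firing straight up should land (almost exactly) where it started.
    assert abs(projectile_range(10.0, 90.0)) < 1e-9
    # Complementary angles give the same range.
    assert math.isclose(projectile_range(10.0, 30.0), projectile_range(10.0, 60.0))


test_projectile_range()
```

The point is only that the expected values come from somewhere other than the code under test, e.g. a hand calculation or a physical symmetry.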

Some of these things were a lot more obvious than others.

Race conditions, for example, can be really hard to find, but as long as you know they might happen (these days that's just about every system) you can take precautions when testing. If it's important, maybe hire someone with experience.
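One such precaution, as a minimal sketch: hammer the shared state from several threads and check an invariant afterwards. `Counter` and `hammer` are made-up names for illustration, and a passing run does not prove the absence of races, it only makes some of them more likely to show up:

```python
import threading


class Counter:
    """Shared counter whose updates are guarded by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, the read-modify-write below could interleave
        # between threads and lose updates.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: Counter, threads: int = 8, increments: int = 10_000) -> int:
    """Increment the counter from many threads and return the final value."""

    def worker() -> None:
        for _ in range(increments):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


# Invariant: no increments are lost, whatever the interleaving.
assert hammer(Counter()) == 8 * 10_000
```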

The AT&T network crash thing looks pretty unobvious to me. A network graph can have a huge number of topologies, so you can't really test them all. Machines might also be using different versions of software that don't interact nicely. Sounds like they took sensible precautions and were thus able to roll back. That's why "rollback" is a word.

There's a whole class of bugs where things work and then need to be upgraded. You think it will still work, because there aren't many changes and stuff is qualitatively the same. Like the number overflow bug in the Ariane 5, or the buffer overflow in the finger daemon.
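For the overflow case, the usual description is a 64-bit float being converted to a 16-bit signed integer that could no longer hold it. A range-checked conversion would look roughly like this; it's a sketch with a made-up function name, not the flight software:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(x: float) -> int:
    """Convert a float to a 16-bit signed range, failing loudly on overflow."""
    n = int(x)  # truncates toward zero; raises on NaN or infinity
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n


print(to_int16_checked(1234.9))  # within range, truncated to 1234

try:
    to_int16_checked(40000.0)  # larger than the old assumptions allowed
except OverflowError as exc:
    print("rejected:", exc)
```

The check itself is trivial. The hard part is deciding what the system should do when it fires, which is exactly what gets skipped when "nothing much changed".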



Unit tests would be highly unlikely to catch most of those.

"a formula written on paper in pencil was improperly transcribed", "neglect to properly 'seed' the program's random number generator", a HW bug that's not close to obvious numbers to check, intentionally inserted bugs, input outside of the intended design, etc.


>"a formula written on paper in pencil was improperly transcribed"

offtopic, but a units-of-measure type would have prevented that. i had no idea how many errors i was making in my math programs before i started using F#'s type checker to make sure all the units lined up properly.


I don't know the actual transcription error, but how's it going to find a 5 being made a 6 or something?


it won't, but the vast majority of errors with math formulas are along the lines of adding velocities to positions, raising something to the wrong power, using a multiply instead of an add, putting a parenthesis in the wrong spot, performing operations in the wrong order, etc.

all of those can get caught with type checking, but it isn't perfect
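a rough illustration of the idea, in Python rather than F#. note the difference: F# checks units at compile time, while this sketch only checks at runtime, and the `Quantity` class and helper names are invented for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value tagged with dimension exponents: (length, time)."""

    value: float
    dims: tuple

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition only makes sense when the dimensions match.
        if self.dims != other.dims:
            raise TypeError(f"cannot add dims {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplication adds the dimension exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)


def meters(x: float) -> Quantity:
    return Quantity(x, (1, 0))


def seconds(x: float) -> Quantity:
    return Quantity(x, (0, 1))


def meters_per_second(x: float) -> Quantity:
    return Quantity(x, (1, -1))


position = meters(5.0)
velocity = meters_per_second(2.0)
dt = seconds(3.0)

new_position = position + velocity * dt  # fine: m + (m/s * s)

try:
    position + velocity  # adding a velocity to a position
except TypeError as exc:
    print("rejected:", exc)
```

this catches the velocity-plus-position kind of mistake. it does nothing for a 5 transcribed as a 6, since the units still line up.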


Correct. A unit test is a defect removal mechanism. What these faults needed was a defect prevention mechanism. One of those mechanisms is Design/Code reviews.

With all the emphasis on testing and TDD, etc, I get the feeling that reviews are getting the shaft. They are both important, for different reasons.


The AT&T network crash bug was caused by a well-formed message coming from a crashed system, so that would have been caught by unit testing too.


The non-hardware-based floating point bugs would not be an issue if using a variable-precision floating point format, such as the in-development Unum (previous HN discussion: https://news.ycombinator.com/item?id=9943589)



