
The hardware is likely being blamed for a lot of failures that are software related. It's almost assured there is a software problem in cases where reinstalling the machine fixes the problem. A random machine which won't boot due to disk/filesystem failures could be a hardware issue, but that is pretty much ruled out if reinstalling/reformatting isn't immediately followed by further failures. Bit rot, stuck bits, and bad links are a thing, but they generally show up as massive amounts of soft error correction long before things reach the point where a sector simply can't be read, and when that happens the OS will almost always tell you that the sector can't be read rather than giving you garbage data.

That is because, given the layers and layers of ECC on the disks, links, etc., an undetected hardware failure is really unlikely to manifest as filesystem metadata corruption rather than as garbage in the middle of video/image/document streams. The more likely case is that read retries, ECC correction, and retransmissions degrade the machine so badly that it appears to have severe performance issues long before anything manifests as silent data corruption sufficient to eat the filesystem structure. (It's a fun exercise to intentionally flip a few random bits in a hard-drive image, or in RAM, and see if/when they are detected.)
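For anyone who wants to try the bit-flipping exercise without touching a real disk, here is a minimal sketch. Everything in it is an assumption made for illustration: the "image" is just random bytes in memory, the 512-byte sector size is arbitrary, and a per-sector CRC32 stands in for whatever ECC/CRC the real drive, link, or filesystem uses. The function names are made up too.

```python
import random
import zlib

SECTOR_SIZE = 512  # assumed sector size for this toy image


def sector_checksums(image, sector_size=SECTOR_SIZE):
    """CRC32 of every sector; a stand-in for real per-sector ECC/CRC."""
    return [
        zlib.crc32(image[i:i + sector_size])
        for i in range(0, len(image), sector_size)
    ]


def flip_random_bits(image, count, rng, sector_size=SECTOR_SIZE):
    """Return a copy with `count` distinct bits flipped, plus the sectors hit."""
    damaged = bytearray(image)
    touched = set()
    for bit in rng.sample(range(len(image) * 8), count):
        damaged[bit // 8] ^= 1 << (bit % 8)      # flip one bit in place
        touched.add((bit // 8) // sector_size)   # remember which sector it was in
    return bytes(damaged), touched


def detect_bad_sectors(image, expected):
    """Indices of sectors whose checksum no longer matches the recorded one."""
    return {
        i for i, (now, then) in enumerate(zip(sector_checksums(image), expected))
        if now != then
    }


rng = random.Random(0)
image = bytes(rng.getrandbits(8) for _ in range(64 * SECTOR_SIZE))
expected = sector_checksums(image)  # recorded while the image was still good

damaged, touched = flip_random_bits(image, 3, rng)
detected = detect_bad_sectors(damaged, expected)

print("sectors hit:     ", sorted(touched))
print("sectors detected:", sorted(detected))
```

With only a few flipped bits per sector, the checksum catches every damaged sector, which is the point: the corruption gets reported, not silently handed back as data. To make it more interesting, run a real filesystem checker against a scratch image you damaged the same way and see how many flips it takes before the metadata is what gets hit.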

So, yes, the first thing I think when I hear filesystem corruption is BUG! That is what tracking down a number of incidents in a large data storage application a few years ago taught me.



