
ECC should be a requirement.

The FCC could simply not allow computers to ship without it.

CPU makers like Intel and AMD could simply have their CPUs not work with non-ECC RAM.

Microsoft could e.g. require ECC RAM for Windows 12.

It is insanity that most computers shipping today do not use ECC and are thus unreliable.

With luck they'll crash, but most likely they will fail silently, while corrupting data.



So true. Anecdotally, I've gone from 4 blue screens a month down to zero after going ECC on the few desktops where we require Windows.

Everyone must still be quoting numbers from when we had 4 MB of premium chips. Now that all PCs have 8-128 GB of the crummiest, cheapest silicon, I bet the failure rates are way more noticeable.

Sadly, I got laptops for my company that have an AMD PRO CPU and SO-DIMM sockets, only to find out that ECC SO-DIMM RAM is sold by a single manufacturer gouging the NAS market with insane prices.


I've been there, and it was a pain. All my backups were corrupted due to a faulty RAM module. Initially, I blamed the hard drives because they seemed to be failing right before my eyes. I was copying a large file; sometimes it copied okay, but occasionally it would become corrupted. Since then, I've been paying a premium for ECC.
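For anyone chasing the same symptom, here is a minimal sketch of a copy-then-verify check, assuming Python's standard library. The helper names `sha256_of` and `copy_and_verify` are made up for illustration. A mismatch only tells you that the copy differs from the source; it does not tell you whether RAM, the drive, or something else is at fault, and a recently written file may be read back from the page cache rather than from the disk.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then hash both files. Returns True if the digests match."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

Running `copy_and_verify` on the same large file several times is one way to catch the "sometimes fine, sometimes corrupted" pattern, since a single successful copy proves little when the fault is intermittent.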


Same experience. We were doing all the right things: regular backups, rotating them, verifying them. During a weekly verification test, one failed. We tested some older backups and they failed too! If the data matters, it’s hard to express the stress and dismay you feel in that moment.

Memory is different from all other resources in the system. As engineers we are conditioned to expect drives to fail more often than other components, and when memory fails it is indistinguishable from a drive failure. Some system behaviors matter too: we tend to think that page allocation is random, and on heavily loaded systems it appears to be, but on specialized systems it can be rather consistent, so the verification can fail in nearly the same place, repeatedly. Riddle me this: what is more likely? A memory failure, a drive failure, or a PostgreSQL bug that results in a corrupted row? Badblocks checks out on the server’s disks… If the data matters, it is extremely unpleasant going through that whole thing. It’s crystal clear after the fact, but it’s a bloody nightmare in the heat of it all.



