> One of the prime prerequisites of e.g. zfs actually is using ECC RAM. ECC RAM ...

diarrhea · on Jan 22, 2023

True, I phrased it too strongly. Let's put it this way: zfs (or btrfs, ...) can make for a bullet-proof system through checksumming and self-healing. However, its Achilles heel is non-ECC RAM. With ECC RAM, at least in theory and barring a total disaster, data can remain intact indefinitely. Without ECC, zfs remains at the mercy of whatever the RAM gets wrong. That's what I remember learning about it a while ago.

wizeman · on Jan 22, 2023

Yes, that's true.

However, even if you're not using ECC RAM, you're much better off using ZFS or btrfs: because they checksum and validate data so frequently, these filesystems will usually detect memory corruption much sooner than if you didn't use them.
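To make the detection part concrete, here is a toy sketch in Python. It is not how ZFS or btrfs are implemented: CRC32 just stands in for whatever checksum the filesystem really uses, and `write_block`, `read_block` and `flip_bit` are made-up names for illustration.

```python
import zlib


def write_block(data: bytes) -> tuple[bytes, int]:
    """Store a block together with a checksum computed at write time."""
    return data, zlib.crc32(data)


def read_block(stored: bytes, checksum: int) -> bytes:
    """Return the block only if it still matches its checksum."""
    if zlib.crc32(stored) != checksum:
        raise IOError("checksum mismatch: block changed after it was written")
    return stored


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single bit flip (bad RAM, bad cable, bad disk, ...)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


block, checksum = write_block(b"family photos, 2009")
damaged = flip_bit(block, 3)

# The intact block reads back fine; the damaged one is rejected.
intact_ok = read_block(block, checksum) == block
try:
    read_block(damaged, checksum)
    detected = False
except IOError:
    detected = True
```

Without the checksum, the damaged block would simply be returned as if nothing had happened, and you'd only find out much later, if ever.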

This could be immensely helpful in scenarios such as the great^4-parent poster's.

Note that bad hardware is not limited to non-ECC RAM. ZFS and btrfs help just as much in detecting other kinds of bad hardware, such as bad SATA cables, bad disks, bad disk/SATA controllers, bad CPUs, bad power supplies, etc.

But of course, once these checksum errors are flagged by ZFS/btrfs, it's a signal to test and fix/replace your hardware, not to keep using a machine with bad hardware.

And yes, ZFS and btrfs cannot fix errors that happen before the checksum is computed (e.g. due to bad RAM, bad CPUs, etc.), because the checksum then covers data that is already corrupt. They can still detect these kinds of errors in some cases, though, especially when the corruption happens after the checksum has already been computed. And they can definitely detect and, given a redundant copy, fix errors in the rest of the data-to-storage-and-back path (e.g. bad disk cables, bad disks or disk controllers, etc.).
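A rough Python sketch of the before/after distinction, assuming a two-way mirror. The `Mirror` class and its methods are hypothetical and only model the idea: SHA-256 is a stand-in checksum, and real ZFS/btrfs on-disk layouts are far more involved.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Mirror:
    """Toy two-way mirror: two copies of each block plus one checksum."""

    def __init__(self) -> None:
        self.copies: list[dict[str, bytes]] = [{}, {}]
        self.sums: dict[str, str] = {}

    def write(self, key: str, data: bytes) -> None:
        # The checksum is computed from whatever is in memory right now.
        self.sums[key] = checksum(data)
        for copy in self.copies:
            copy[key] = data

    def read(self, key: str) -> bytes:
        expected = self.sums[key]
        good = next(
            (c[key] for c in self.copies if checksum(c[key]) == expected), None
        )
        if good is None:
            raise IOError("all copies fail their checksum: detected, not fixable")
        for c in self.copies:
            if checksum(c[key]) != expected:
                c[key] = good  # self-heal the bad copy from the good one
        return good


original = b"important data"

# Case 1: corruption AFTER checksumming (e.g. a bad disk or cable).
m = Mirror()
m.write("a", original)
m.copies[0]["a"] = b"important dat\x00"  # one copy goes bad on "disk"
healed = m.read("a")                     # detected and repaired
repaired_copy = m.copies[0]["a"]

# Case 2: corruption BEFORE checksumming (e.g. a bit flip in RAM).
m2 = Mirror()
flipped_in_ram = b"impostant data"       # already wrong when handed over
m2.write("b", flipped_in_ram)
silently_wrong = m2.read("b")            # checksum matches the wrong data
```

In case 1 the read returns the original data and the bad copy gets rewritten. In case 2 nothing complains, because the checksum was computed over data that was already wrong, which is exactly the gap ECC RAM is meant to close.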