> One of the prime prerequisites of e.g. zfs actually is using ECC RAM.

ECC RAM is not a prerequisite for using ZFS.

Matt Ahrens, co-creator of ZFS and still one of the main developers, said this [1]:

"There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem. If you use UFS, EXT, NTFS, btrfs, etc without ECC RAM, you are just as much at risk as if you used ZFS without ECC RAM. Actually, ZFS can mitigate this risk to some degree if you enable the unsupported ZFS_DEBUG_MODIFY flag (zfs_flags=0x10). This will checksum the data while at rest in memory, and verify it before writing to disk, thus reducing the window of vulnerability from a memory error.

I would simply say: if you love your data, use ECC RAM. Additionally, use a filesystem that checksums your data, such as ZFS."

[1] https://arstechnica.com/civis/threads/ars-walkthrough-using-...
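To make the quoted idea concrete, here is a minimal sketch of "checksum the data while at rest in memory, and verify it before writing to disk". It is not ZFS code: `CheckedBuffer` and `write_to_disk` are hypothetical names, the "disk" is just a Python list, and CRC32 stands in for whatever checksum a real filesystem would use.

```python
import zlib


class CheckedBuffer:
    """Hypothetical in-memory buffer that carries a checksum of its contents."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # Checksum taken as soon as the data lands in the buffer.
        self.checksum = zlib.crc32(self.data)

    def verify(self) -> bool:
        # Recompute and compare; a mismatch means the bytes changed in memory.
        return zlib.crc32(self.data) == self.checksum


def write_to_disk(buf: CheckedBuffer, disk: list) -> bool:
    """Verify the buffer right before writing; refuse to write if it changed."""
    if not buf.verify():
        return False
    disk.append(bytes(buf.data))
    return True


disk = []  # stand-in for real storage (assumption for illustration)

clean = CheckedBuffer(b"important data")
wrote_clean = write_to_disk(clean, disk)

flipped = CheckedBuffer(b"important data")
flipped.data[0] ^= 0x01  # simulate a single bit flip in RAM
wrote_flipped = write_to_disk(flipped, disk)

print(wrote_clean, wrote_flipped, len(disk))
```

The window of vulnerability shrinks but does not vanish: a flip that happens before the checksum is first computed would still go unnoticed in this sketch.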



True, I phrased it too strongly. Let's put it this way: zfs (or btrfs, ...) can make for a bullet-proof system through checksumming and self-healing. Its Achilles heel, however, is non-ECC RAM. With ECC, at least in theory and barring universal disaster, data can remain intact indefinitely. Without ECC, zfs remains at the mercy of whatever the RAM gets wrong. That's what I remember learning about it a while ago.


Yes, that's true.

However, even if you're not using ECC RAM, you're much better off using ZFS or btrfs: because they checksum and validate data so frequently, these filesystems will usually detect memory corruption much sooner than a filesystem that doesn't.

This could be immensely helpful in scenarios such as the one the great^4-parent poster described.

Note that bad hardware is not limited to non-ECC RAM. ZFS and btrfs help just as much in detecting other kinds of bad hardware, such as bad SATA cables, bad disks, bad disk/SATA controllers, bad CPUs, bad power supplies, etc.

But of course, once these checksum errors are flagged by ZFS/btrfs, it's a signal to test and fix/replace your hardware, not to keep using a machine with bad hardware.

And yes, while ZFS and btrfs cannot fix errors that happen before the checksumming takes place (e.g. due to bad RAM, bad CPUs, etc), they can still detect these kinds of errors (in some cases, at least), especially when they happen after checksumming already took place. And they definitely can detect and fix errors in the rest of the data-to-storage-and-back path (e.g. bad disk cables, bad disks or disk controllers, etc).
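A toy sketch of that distinction, assuming a two-way mirror and SHA-256 as the checksum. None of this is actual ZFS or btrfs code; `checksum`, `flip_bit` and `read_mirror` are made-up helpers for illustration.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, index: int = 0) -> bytes:
    """Simulate a hardware fault by flipping one bit in the block."""
    out = bytearray(block)
    out[index] ^= 0x01
    return bytes(out)


def read_mirror(copies: list, expected: str):
    """Return (data, healed); overwrite bad copies with the first good one."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies fail their checksum")
    healed = False
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # self-heal from the good copy
            healed = True
    return good, healed


original = b"family photos"

# Case 1: corruption AFTER checksumming (e.g. bad cable or disk).
expected = checksum(original)
copies = [flip_bit(original), original]  # one mirror side got damaged
data, healed = read_mirror(copies, expected)

# Case 2: corruption BEFORE checksumming (e.g. bad RAM).
corrupted = flip_bit(original)
expected_bad = checksum(corrupted)  # the checksum faithfully covers bad data
copies_bad = [corrupted, corrupted]
data_bad, healed_bad = read_mirror(copies_bad, expected_bad)

print(data == original, healed)          # detected and repaired
print(data_bad == original, healed_bad)  # silently wrong, nothing to repair
```

In case 1 the mismatch is caught and the damaged copy is rewritten from the good one. In case 2 every checksum matches, so the filesystem has no way to know the data was already wrong when it arrived.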



