
What prevents, or tries to prevent, bit rot in standard CPU hard drives/flash drives? Is there some built-in forward error correction at some layer? If not, how could one add some?


Most SSDs only guarantee your data for 90 days when off. Even high grade, SLC, professional SSDs. Hard drives fare better, but your best bet for decade-long conservation remains tape.


That 90 days figure applies only to SSDs that have worn out their entire rated write endurance, and only to enterprise/datacenter SSDs. For client/consumer SSDs, the standard for unpowered data retention at the end of life is one year (at 30°C, compared to 40°C for enterprise drives). That longer retention requirement is one contributing factor to consumer SSDs being rated with lower write endurance.

In practice, hardly anyone uses up the entire write endurance of their SSD, especially not if they're using it to store backups, archives, or other relatively static data. Flash memory that isn't worn out has much longer retention times.


Is anyone interested in amateur digital archiving just destined to lose it all in the long run?

You'd think there'd be a market for long life HDDs.

Edit: Yes, you could just replace old hardware with new hardware, but it creates so much waste...


That's where, hopefully, Microsoft's Project Silica will make digital data storage that is permanent for all practical purposes available.

It apparently uses a similar basic process as those mall kiosks that take a volumetric picture of your face and then etch it into a cube of glass?

But in this case, they just write the data to the side of a small piece of glass over the course of days and it's basically permanent for all eternity unless something breaks it.


Wait, what. An SSD won't hold your data for more than 90 days if it's powered off?


It will probably hold it much longer. However it's not guaranteed to do any better than that.


I don't think you'd have much luck getting anything back from that "guarantee"... so is it a guarantee?


Yes, all modern disks use error correction. Wrong bit reads happen frequently in HDDs and are corrected all the time. Bits can't rot in HDD or SSD outside of firmware bugs or transmission errors. You'll get a sector read error instead of wrong data.


> Bits can't rot in HDD or SSD outside of firmware bugs or transmission errors

The entire photography community begs to differ.

Gradual degradation of stored image data across HDDs of all vendors is very well documented, mainly because the bit rot is very easy to spot when working with visual data.

Bit flips are persistent and they occur on the media itself, not in transmission. There are plenty of theories, including one that argues that bits are flipped by stray neutrinos, but nobody knows for sure.


> The entire photography community begs to differ.

Also ZFS. Part of its purpose and design goals was countering bit rot (as well as device hostility to keeping data safe), as Sun had customers affected (even with ECC). Hence the end-to-end checksumming amongst other features.

scrub exists pretty much just for bit rot: you run scrub regularly, it goes over the disk, checks that every block checksums properly, and if they don't[0] it repairs the data using the non-corrupted copy (assuming you have one).

[0] and your dataset is replicated. Note that even if you don't use a RAID configuration, you can mark important datasets as to-duplicate for this purpose (this is not equivalent to device redundancy; it's a feature which exists solely for bitrot / corruption protection).
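
A rough sketch of the scrub idea described above, not ZFS's actual implementation: each block has a stored checksum, every copy is verified against it, and a bad copy is overwritten from a copy that still verifies. The names `checksum` and `scrub`, the use of SHA-256, and the in-memory "copies" are all assumptions made for illustration.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum stored separately from the block it protects (SHA-256 here)."""
    return hashlib.sha256(block).hexdigest()


def scrub(copies: list[list[bytearray]], checksums: list[str]) -> list[int]:
    """Verify every block in every copy and repair bad ones from a good copy.

    Returns the indices of blocks for which no copy verified (unrepairable).
    """
    unrecoverable = []
    for i, expected in enumerate(checksums):
        # Find any copy of block i whose checksum still matches.
        good = next(
            (bytes(c[i]) for c in copies if checksum(bytes(c[i])) == expected),
            None,
        )
        if good is None:
            unrecoverable.append(i)
            continue
        # Overwrite every copy that fails verification with the good data.
        for c in copies:
            if checksum(bytes(c[i])) != expected:
                c[i][:] = good
    return unrecoverable


# Hypothetical two-way replicated dataset with one silently flipped bit.
blocks = [b"hello world", b"bit rot demo"]
checksums = [checksum(b) for b in blocks]
copy_a = [bytearray(b) for b in blocks]
copy_b = [bytearray(b) for b in blocks]
copy_a[1][0] ^= 0x01  # simulate a bit flip on one copy

lost = scrub([copy_a, copy_b], checksums)
```

Without a second copy, the same loop can still detect the bad block but has nothing to repair it from, which is why the footnote about replication matters.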


Ars Technica did a really good article [1] on bit rot. Although it's angled towards filesystems, it also provides good discussion on bit rot in general.

[1] https://arstechnica.com/information-technology/2014/01/bitro...

(edited to make more readable)


Can you link to any source giving details on this? I don't know of any mechanism that can protect individual bits from flipping without dedicating a serious amount of space to redundancy. And it doesn't align with my personal experience either: I had a bit flip in a JPEG image this year. Fortunately I was able to restore it from backup.


It's pretty common knowledge. You don't need a serious amount of space, for example ECC RAM uses one parity bit for 8 bits to fix one bit flip or report 2 bit flips. Also disks use checksums which require even less storage.

Your situation likely happened because of bad RAM.


> Your situation likely happened because of bad RAM.

I don't see how that could have happened, since the image wasn't ever written to. It should have been the same file on disk as the backup, since I hadn't touched it since backing up the file, but it wasn't.

> You don't need a serious amount of space, for example ECC RAM uses one parity bit for 8 bits to fix one bit flip or report 2 bit flips

The thing about parity is that for it to be useful in a data recovery scheme, you have to know what bit got flipped. That works for RAID of course, because you usually know which disk is bad and so when you replace it you can work out the missing bits using the parity disk, but I don't think it could work for an individual hard drive, since in general you don't know which bit flipped.

And the thing about checksums is that they can tell you if your data has become corrupt, but they can't be used to fix things behind the scenes.
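
For reference, a single parity bit indeed cannot say which bit flipped, but several overlapping parity bits can. Below is a minimal sketch of a textbook Hamming(7,4) code, where three parity bits over four data bits yield a "syndrome" that is the 1-based position of a single flipped bit. This only illustrates the principle; it is not a claim about what any particular drive firmware or ECC RAM module implements, and the function names are made up for the example.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits laid out as positions 1..7:
    p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit.

    Returns (data bits, position of the corrected bit or 0 if none).
    """
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if pos:
        c[pos - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # flip the bit at position 5
data, fixed_at = hamming74_decode(word)
```

Two flips in the same 7-bit word would be miscorrected by this code; practical schemes add an extra overall parity bit so that double errors are at least detected.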


Turbo codes don't add a lot of overhead if you just want to protect against 1 bit flip per byte or so via FEC (forward error correction). They're close to the Shannon limit in efficiency, but I don't know of any open source implementations, and details such as "I want to protect against at most 2 bits per byte" or "I want to protect at most 5 bytes per kilobyte" directly matter in the implementation.

I've been wanting to try to understand them to see if I could implement something myself for a while now


Turbo codes are typically soft decision codes, which don't make a ton of sense at the filesystem level, since there has already been a hard decision made. They are useful in storage at the read channel level, as in processing the analog stream from the head on the hard drive.

Reed-Solomon is often used in storage because it is an optimal erasure code - e.g., I know this block is missing or corrupt, correct it.


> Reed-Solomon is often used in storage because it is an optimal erasure code

I was initially thinking OP might be talking about something like this, but I think I would have heard if all new hard drives had it built into their firmware to store error correcting codes alongside the real data and automatically fix bit flips, since I suspect that would have pretty serious performance impacts in certain cases and people would be complaining.


>Bits can't rot in ... SSD

This is not correct. All forms of flash memory lose data (charge in cells) over time. Even glass window EPROMs are only rated at tens of years. Every new generation of SSDs uses smaller cells that lose charge faster. Wear and tear accelerates this. A Samsung 840 with 300TB of wear will lose data when unplugged for a week. https://techreport.com/review/25681/the-ssd-endurance-experi...

It's much worse in enterprise settings: https://www.extremetech.com/computing/205382-ssds-can-lose-d...


This is not true. Several times, on systems with ECC memory, I have seen MD5 values for large archives change on magnetic disks, with no read errors or SMART errors. I have an HDD (1TB Western Digital Green) in my desk drawer that does this after a couple of weeks of cold storage.


Nothing.

There's par2 (and its offshoots MultiPar and QuickPar), there's rsbep, there's pyFileFixity.

All have problems that mean you probably won't "shield" your files with them (I don't, although I worry).


Curious - what are your complaints on these programs? My main one is that they are slow to create archives and rsbep only does error decoding, not erasure. I'm working on an alternative tool that is much faster, but curious if you have any other feedback.



