Even more generally, plot the reliability of HDDs and SSDs along the price axis. Plot such charts for a few widespread sizes (say, 256 GiB to 8 TiB). For bonus points, plot SLC, MLC, TLC, and QLC flash drives separately, and conventional (CMR) and shingled (SMR) HDDs separately.
Alternatively, plot price per GB for given levels of reliability. I suspect that this chart is going to have distinct sweet spots.
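As a rough illustration of what such a chart could look like, here's a minimal matplotlib sketch. The drive categories and the price/AFR numbers are placeholders I invented purely to show the plot structure, not real data.

```python
# Hypothetical sketch: price vs. annualized failure rate (AFR) for 4 TB drives.
# All numbers below are invented placeholders; only the plotting structure matters.
import matplotlib.pyplot as plt

drives = {
    # label: (price_usd, afr_percent, marker)
    "HDD (CMR)": (80, 1.4, "o"),
    "HDD (SMR)": (70, 1.8, "s"),
    "SSD (TLC)": (220, 0.9, "^"),
    "SSD (QLC)": (180, 1.2, "v"),
}

fig, ax = plt.subplots()
for label, (price, afr, marker) in drives.items():
    ax.scatter(price, afr, marker=marker, label=label)

ax.set_xlabel("Price (USD, 4 TB class)")
ax.set_ylabel("Annualized failure rate (%)")
ax.set_title("Reliability vs. price (placeholder data)")
ax.legend()
plt.show()
```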
Yep. I'm confident the large cloud providers (Google, Amazon, etc) run enough SSDs that it's worth them having this data internally, but there's probably no motivation to share it, or maybe they see having the data as some sort of competitive advantage.
FWIW, Josef Bacik, one of the main developers of btrfs and a long-time Facebook employee, said many times that most Facebook servers use the crappiest SSDs they can buy.
I won't be delving into hour-long searches through the mailing lists to support a casual comment, but here's something I could dig out quickly:
Once you build a distributed system that's large, it has to tolerate routine node failures.
Once you have a fault tolerant system, and some idea of failure rates, you can then ask "is it worth having X units of less reliable hardware or Y units of more reliable hardware? How many will still be running after a year?"
You then find out that buying a few more cheap nodes compensates completely for the lower reliability.
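To make that concrete, a back-of-the-envelope sketch; the unit prices and failure rates are assumptions for illustration, not vendor figures:

```python
# Compare expected surviving drive counts after one year for a fixed budget.
# Prices and annualized failure rates (AFRs) below are illustrative assumptions.
budget = 100_000  # USD

cheap   = {"price": 200, "afr": 0.02}  # cheaper drive, higher failure rate
premium = {"price": 300, "afr": 0.01}  # pricier drive, lower failure rate

for name, d in [("cheap", cheap), ("premium", premium)]:
    units = budget // d["price"]
    expected_survivors = units * (1 - d["afr"])
    print(f"{name}: {units} bought, ~{expected_survivors:.0f} expected running after a year")

# With these made-up numbers the cheap option still ends the year with more
# working drives, which is the point being made above.
```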
This thread seems overly complex given that an engineer tasked with designing a RAID storage to replace an unplanned haberdashery would probably start with looking up what the requested acronym stands for.
That might be a bad idea, from personal experience. I bought a cheap SSD from a reputable vendor (Crucial) to upgrade someone's old laptop. Sometimes (10-20% of the time?), the laptop boots about as slowly (or worse) as with a mechanical HDD. The SSD is a model entirely without RAM cache, which seems to cause some really bad performance cliffs.
This is because you're missing the key tenet: massive parallelism. This works when you have 1000 drives, with significant data redundancy distributed among them. If two of them fail, and one of them is so slow that it can also be considered failed, it's not a big deal, you still have 997 adequately performing drives, and the system can easily handle the loss of 0.3% of the capacity.
If you only have one drive, try to make it as good as you can afford.
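A quick sketch of why losing a few drives out of a thousand is survivable, assuming (my assumption, for illustration) 3-way replication with each chunk's replicas placed on three distinct, randomly chosen drives:

```python
# Probability that a given chunk loses all of its replicas when f of n drives
# are down simultaneously, assuming 3 replicas on distinct random drives.
from math import comb

n = 1000  # drives in the cluster
f = 3     # drives failed (or so slow they count as failed)
r = 3     # replicas per chunk

p_chunk_lost = comb(f, r) / comb(n, r)
print(f"P(all {r} replicas of a chunk are on failed drives) = {p_chunk_lost:.2e}")
# ~6e-09 with these numbers: the capacity loss is ~0.3%, but actual data loss
# is vanishingly unlikely as long as re-replication keeps up.
```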
RAID expects non-deterministic bad behavior. A group of these drives receiving the same pattern of block accesses could exhibit the same behavior at the same time, depending on how they were designed. Similarly, I think RAID is usually built with the expectation of something worth treating as an irreversible failure: some RAIDs may perform at their worst for reads when there is a slow response from a drive that isn't known to be bad, and optimizing them to cope by always racing the parity correction against the last response would be inefficient if they weren't designed specifically for this defect.
Bear in mind that SSDs used in data centres don't (or at least didn't used to) have much in the way of low-power idle modes, unlike consumer-grade SSDs, which need them due to laptop requirements.
In enterprise settings, what you are likely to be interested in is how easy it is to recover a RAID with a failing member. Now, here's an interesting part: the larger the disks participating in the RAID, the more susceptible recovery is to failures of the disk controller. It's possible that the failure rate on the controller is so high that whenever you try to recover the data, you have a very high (>50%) chance of introducing an error... and so the RAID becomes virtually unrecoverable. SATA controllers are in that latter category (and that's partially why enterprise users preferred SAS before NVMe + PCIe became a thing).
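For scale, here's the usual back-of-the-envelope version of the "big rebuilds probably hit an error" argument. The comment above blames the controller; the more commonly quoted figure is the drive's unrecoverable read error (URE) spec, but the shape of the per-bit calculation is the same for any error source. The 1e-14 rate is a typical consumer SATA spec-sheet value, used here as an assumption:

```python
# Chance of reading an entire large drive during a rebuild without hitting a
# single unrecoverable read error, assuming a fixed per-bit error rate.
capacity_tb = 12
bits = capacity_tb * 1e12 * 8        # total bits to read
ure_per_bit = 1e-14                  # typical consumer SATA spec (assumption)

p_clean_rebuild = (1 - ure_per_bit) ** bits
print(f"P(rebuild with zero read errors) ~= {p_clean_rebuild:.2f}")
# ~0.38 with these numbers, i.e. >60% chance of at least one error somewhere,
# which is where claims like ">50% chance of an error" come from.
```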
There is an important difference in failure modes between HDDs and SSDs for consumers though. An SSD that fails is likely to go into a read-only mode that still allows you to recover your data. The only way to recover your data from a HDD failure is to find a data recovery business.
However, for a business like Backblaze that specializes in data storage, recovering data from an HDD is probably more reliable. After all, in the case of a complete failure, recovering data from the platters inside an HDD is more likely to succeed than recovering data from the memory chips inside an SSD.
Just an anecdote, but over 10 years following about 200-300 laptops, with about 90% of them using SSDs, I only had 2 drive failures, both with SSDs: one of them became completely unusable (all data lost and impossible to recover anything, even after paying a specialized company to retrieve the data), and the other cost $2k to get the data back, and the company struggled with it for a few weeks, after which the data became less useful.
Apparently the company was better equipped to handle HDDs, and would probably have recovered the data easily had that been the case. But the experience made me very dubious of SSD reliability.
In both cases, users were using their laptops just fine, and out of the blue, the laptops froze or had some erratic behavior, and the next time they booted, the drives were already unusable.
On the other hand, when I had an HDD failure (due to an electrical power fluctuation), except for some bad sectors where the head hit the platter, it remained usable for years.
So I agree with you, their failure modes are important. Of course, nowadays most laptops only have M.2 slots anyway, so HDDs are completely out of the question.
Hindsight is 20/20. They're probably doing better with backups now, and they certainly understood how beneficial backups are, when they were dealing with the data loss.
I've had the opposite experience. An SSD died without any warning resulting in complete data loss, while a dying HDD started making funny noises, which prompted the thought "Hmm, might be time for a backup again".
I've had two types of SSD failures, one with super cheap drives that I knew would fail early and gave lots of warnings in the form of unpredictability before failure, and one of the more expensive SanDisk SSDs that failed without warning catastrophically one month after the 3-year warranty (which wouldn't have covered data, anyway).
With the cheap ones I knew to be paranoid, with the expensive one I lost a few things.
Every single HDD failure I've experienced has had so many warnings long before total failure that I do feel more comfortable with them, especially for unattended backup utilities where I don't notice how slow they are. I've never tried, but have wondered if platters can be transferred or PCBs replaced at least to transfer out the data.
That's the promise. In practice your SSD might kill itself due to a bug in the firmware and drop off the bus without warning long before the flash is worn out.
I can't even remember reading any reports of read only failure due to flash exhaustion. I do remember boat loads of early SSDs, mostly sandforce based ones, spontaneously dying due to firmware bugs.
Electronic parts can fail silently and without warning in either case, e.g. due to a tin whisker producing a short somewhere.
In HDDs, the mechanical parts are less reliable than the electronic parts, and their relatively slow degradation allows for an earlier warning, at a point when the data can still be recovered but the drive should be replaced.
Backblaze can lose 3/20 hard drives and still recover the data from parity drives. They don't have to rely on recovering it from non-operative drives unless they become unbelievably unlucky.
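Rough numbers on how unlucky "unbelievably unlucky" would have to be, assuming (my numbers, for illustration) a 1.5% annualized failure rate, independent failures, and a one-week window to rebuild a failed drive:

```python
# Probability that more than 3 of a 20-drive parity group fail within the same
# one-week rebuild window, assuming independent failures and a 1.5% AFR.
from math import comb

n, tolerated = 20, 3
afr = 0.015
p_week = afr / 52  # crude per-week failure probability (assumption)

p_data_loss = sum(
    comb(n, k) * p_week**k * (1 - p_week)**(n - k)
    for k in range(tolerated + 1, n + 1)
)
print(f"P(>{tolerated} of {n} drives fail in one week) ~= {p_data_loss:.1e}")
# On the order of 1e-11 per group per week with these assumptions, hence
# "unbelievably unlucky".
```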
> An SSD that fails is likely to go into a read-only mode that still allows you to recover your data. The only way to recover your data from a HDD failure is to find a data recovery business.
SSD failure, in my experience both professional and private, is definitely not what you describe there. SSD failures just make the system no longer recognize the drive; no read-only mode to speak of,
whereas a failing hard drive has a bit more survival luck, and data recovery through a third party has a very high success rate.
SSDs might go to read only mode when they have block exhaustion, but my experience in practice is that they simply stop functioning entirely when they fail, whether the endurance is still at 100%, 80% or 5%. You could put the NAND on a different controller, but that's a very involved, difficult process, and even that isn't guaranteed to do anything.
Their unreliability has made me far more committed to ensuring that anything of value is backed up incrementally 24/7.
Sounds irrelevant in both cases. A business will use real redundancy with extra drives rather than hoping to maybe recover data from a dead drive. If the drive dies they probably just pull it and rebuild, which is likely quite a lot better (from a technical perspective, excluding storage space) on an SSD (much higher write rates restore the redundancy faster).
Electronic failures are technically possible, but tbh I think that's so unlikely nowadays it's not really worth considering. Consider the amount of electronics in everyday life: almost none of them ever break for reasons related to the electronics themselves.
Electronics don't fail because it's all still so brand new, and that's due to planned obsolescence or whatever you call it, where software will force you to throw a device away while the hardware is still perfectly functional.
Sure, stuff goes out of date, but rarely does it fail for hardware reasons. Practically the only electronic components that fail with any regularity are chemical components like electrolytic caps and batteries.
> An SSD that fails is likely to go into a read-only mode that still allows you to recover your data.
Anecdotally, every (n=5) SSD that has failed on me suffered sudden* catastrophic failure that made the drive unreadable.
I forget the exact models, but one was a high-end consumer Samsung NVMe drive, two were higher-end Samsung SATA SSDs, and a couple were M.2 SATA SSDs in laptops.
* One of the M.2 SATA SSDs was a slower failure where the kernel reported I/O errors at first, but it could still boot a few times, then it completely failed mid-backup and never mounted again.
> An SSD that fails is likely to go into a read-only mode that still allows you to recover your data.
My anecdotal experience of 2 SSD failures was that the drives were unreadable. If I remember correctly, in both cases the drive wasn't even listed by the operating system tools, as if it were unplugged (I can't remember about the BIOS/UEFI firmware).
This lines up with mine as well. My HDDs fail more often than SSDs, but when they do it’s a painful slow death that shows itself in SMART and affords me days if not weeks. Out of 5 SSDs I had only one fail, and that one worked perfectly well a moment, next second my OS froze and it wouldn’t show up in BIOS anymore.
Yep, all my SSDs that have failed so far have gone completely undetectable to all the computers I tried to use them on. I have never seen this read-only failure mode personally yet. It feels made-up to me.
Backblaze (or any large storage provider) is never going to recover data from a failed storage drive. Their whole value proposition is that they are designed so they don't need to. In fact, if I remember correctly, Backblaze doesn't even try to replace individual failed drives; they wait until a rack unit of ~60 drives degrades by a given amount, then replace the whole unit.
Fun fact, one of the surprising differences between consumer grade drives and enterprise grade drives is the firmware in the enterprise drives is designed to fail fast. The theory being a consumer drive is probably the sole unit and should probably give every effort to limp along, to give the user a chance to get the data off. While an enterprise drive is probably part of a redundant array and should die quickly so the rebuild process can start as soon as possible.
Wouldn't that failure mode be most equivalent to something like a bad blocks failure on a hard drive?
Both can have random controller/electronics failures, and I think the SSD is often designed to be impossible to recover in that situation with a replacement controller or other electronics.
You are entirely correct, however most consumers would take the loss and discard the data rather than replace a drive controller or salvage the platters. That's something they would need to hire an expensive data recovery service for.
If you're a business or a professional then certainly HDDs provide you with more recovery opportunities.
It's certainly interesting how failure modes play out for consumers, given that they more often run drives until some failure.
While I wouldn't buy an HDD as a consumer, I think they are all around better from this angle. Many consumers actually did pay to have a thesis retrieved from an HDD back when laptops still used them, and a partial media failure that still offers data retrieval is basically the same bad option often presented on either.
(The article is out of date and I would suspect SSDs have improved at a faster rate than HDDs since, but some of that improvement will have been redirected to make even cheaper consumer options.)
The following quote is from a post by CrossVR, but as my reply is long and in parts deviates into a more general discussion, I've avoided cluttering up that thread by posting it as a new comment.
"An SSD that fails is likely to go into a read-only mode that still allows you to recover your data."
Perhaps so, but from my experience that's often not what happens, most of my SSD failures have resulted in the drives being stone-dead, that is completely unresponsive and their data is irrecoverable. I'd like to see some comprehensive statistics on this.
There's much that can be said about the reliability of hard drives and SSDs (and other electronic storage media) as well as the role manufacturers have in making reliable storage. I'd argue that their collective actions aren't helpful and overall they have made present-day storage less reliable than it ought to be by virtue of the secretive and proprietary processes they've employed to manufacture their products.
First, it seems to me that all too often we do not bother to consider storage technologies in a holistic way; after all, the management, control and long-term integrity of our data ought to be our first and principal focus, so from the outset we ought to analyse how effective current storage technologies are at achieving these goals. My contention is that when we appraise modern electronic storage with that objective in mind, we find that it falls very short of the ideal. If it were not for the fact that at present there is no existing technology that is more reliable and better suited for purpose, I'd suggest that all currently-used technologies are not fit for purpose.
Of course, one cannot justifiably make such a claim if better solutions do not exist, so let me explain why I've ventured forth with such a provocative comment. Before I do, I'd add that any detailed analysis of the subject is worthy of a lengthy book, thus, with limited exceptions, involved technical discussion is essentially outside the scope of this post. Therefore, rather than spend time on the minutiae of drive failures I'll instead focus on users' data and the essential requirement to protect its integrity for as long as is necessary. Specifically, that must be for as long as users deem said data useful and/or relevant. In practice, that could be upwards of many decades, and in some instances data may have to be kept in perpetuity.
That brings me to the major concerns I have with existing data storage systems. The first is that the technologies both hard disks and SSDs employ are essentially ephemeral, in that they are intrinsically fragile; the consequence is that the overhead necessary to protect and maintain data integrity over its required lifetime is high. The second is that, in the never-ending quest to increase data densities, the development of both hard drives and SSDs is continuously being pushed to the limit. Combined, these factors not only contribute to overall system fragility but also to shorter lifecycles: they increase hardware turnover and maintenance costs.
Moreover, even if hardware does not fail within its nominal replacement schedule, its lifecycle is still of extremely short duration when compared with traditional storage. Again, ongoing upgrades and hardware replacements put a considerable burden on both professional and individual users, and maintenance and its concomitant costs must continue throughout the expected lifetime of the stored data, otherwise data integrity will suffer.
Nevertheless, for the most part, professional users (data centers, well-managed server sites, etc.) end up better off than individual users in that (a) they have strict and well-structured backup procedures that ensure the integrity and longevity of users' data, and (b) their infrastructure is such that it's easier for them to follow best practice.
On the other hand, users who manage their own data have to take full responsibility for both its integrity and longevity which is a considerable challenge given that many are unaware of the limitations of the technologies they are forced to use. Thus, it's not surprising this group often experiences difficulties in maintaining the integrity of their data across its deemed lifetime.
Such are the limitations of modern electronic storage systems. In the grand scheme of things, whilst present-day technologies offer many advantages over older, more traditional storage systems like print and paper documents, they nevertheless exhibit some serious drawbacks and disadvantages. Irrespective of whether they're used to store analog or digital data, and unlike their older counterparts, modern storage and retrieval systems cannot offer users set-and-forget long-term data storage that will remain reliable and viable over many decades or even centuries, because the technologies necessary to support this level of reliability simply do not yet exist in any practical form. (Yes, I accept some low-density long-term storage systems do exist, but for almost all modern applications they're impractical to use.)
As mentioned, all common 'electronic' storage technologies in current use are, by nature, essentially ephemeral. Restated, the information stored in modern data storage and retrieval systems has to be refreshed and regenerated at regular intervals to remain viable. Moreover, in comparison with traditional information storage, modern electronic systems require the interval between data refresh cycles to be very short indeed. For instance, there are many paper-based documents that are many centuries old and the information they contain is still fully viable whereas with digital systems refresh intervals can be as short as the life of a hard disk or SSD or even less—that is as short as several years, five or so at most.
Holistic considerations require one to again draw a comparison between the longevity of traditional information and the ephemeral nature of modern storage technologies, so despite my above point about keeping technical details out of the discussion, I have to make brief mention of the limitations of these technologies to justify my assertions.
Going on evidence it's safe to say there's no 100% guarantee (in fact it's unlikely) that data stored in any electronic storage media that's in common usage nowadays will be able to be read in say 20—30 years from now let alone in 50 or 100 years. Simply, we have no electronic technology nor any foolproof system that can guarantee that data can be stored for many decades and still be read faithfully. That this story and these posts are concerned with the short and inadequate service life of hard disks and SSDs only highlights the point.
Why is this so? Well, let's briefly look at various storage technologies currently in use and consider why none is even satisfactory let alone ideal. First, consider cloud storage. To commit one's data to the Cloud means that one must have complete faith in the vendors that offer such services. A quick examination shows that none have even reasonable form when it comes to how they treat customers, ipso facto, same goes for their data. Expecting entities such as Google, AWS and Microsoft to still be around and remain in their current corporate form in 20, 30 or so years let alone 50 or more is just fanciful. Cloud storage thus requires users to be vigilant and to constantly monitor how their data is being managed on a regular basis. For some this won't be possible.
This raises the question of the long-term management of, say, important historical data that doesn't have a specific owner or custodian to look after it on a regular basis. As sure as eggs, even if these entities last the distance (which is doubtful), they're very unlikely to protect this 'orphaned' data as a mother would protect a child. One only has to look at the ruthless tactics Big Tech has already adopted to see that one's data isn't always secure. Then there's DRM: there's not even any short-term guarantee that data one has actually paid for won't be summarily deleted at a whim.
Now consider the tech itself. Hard disks have a magnetic remanence problem: magnetic intensity drops by a significant percentage per year, so data stored on them has a definite lifespan. Even if drives are properly archived and only used for data retrieval, data rot will eventually claim their data. As stated, to avoid this the data must be refreshed periodically, and even then there are no guarantees that this will occur in a timely manner across the life of the data.
Similarly with SSDs. The way SSDs store data can, at best, only be described as unreliable and fraught with risk. In my opinion, storing electric charges in a 'glass'-like medium/substrate and expecting them to remain there indefinitely is like believing in fairyland. As it is, data on SSDs has to be refreshed periodically to ensure it doesn't disappear altogether. By nature, electronic data is already ephemeral, so storing it on SSDs and then forgetting about it has to be the riskiest of risky procedures. It's almost equivalent to erasing one's data.
In summary, with current technologies, unless great care is taken to manage users' data and to ensure it's refreshed at proper intervals then it will definitely atrophy over time. This is the perennial problem with the current tech.
Is there the possibility that we'll get almost bullet-proof storage technologies? Yes, there definitely is, and there are some good contenders, but don't hold your breath; there's no indication that they'll be around anytime soon. That said, I'm not even going to mention them here, as that'd likely double the length of this post.
It's a largely academic question. HDDs simply don't have the performance required for modern use, and NAND flash SSDs are not archival if left unpowered, so it's SSDs for all online storage and HDDs for backups.
You do have to take precautions: avoid QLC SSDs, and SMR hard drives.
HDDs are the backbone of my homelab since storage capacity is my top priority. With performance already constrained by gigabit Ethernet and WiFi, high-speed drives aren’t essential. HDDs can easily stream 8K video with bandwidth to spare while also handling tasks like running Elasticsearch without issue. In my opinion, HDDs are vastly underrated.
I run a hybrid setup which has worked well for me: HDDs in the NAS for high-capacity, decent-speed persistent storage with ZFS for redundancy, low-capacity SSDs in the VM/container hosts for speed and reliability.
Same, I run my containers and VMs off of 1 TB of internal SSD storage within a Proxmox mini PC (with an additional 512 GB internal SSD for booting Proxmox).
Booting VMs off of SSD is super quick, so it's the best of both worlds really.
Yes, those workloads are mostly sequential I/O, that HDDs can still handle. Most of my usage is heavily parallel random I/O like software development and compiles.
You also have the option of using ZFS with SSDs as an L2ARC read cache and as a SLOG device for the ZIL (which accelerates synchronous writes), to get potentially the best of both worlds, as long as your disk access patterns yield a decent cache hit rate.
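A crude sketch of why the hit rate matters so much; the latency numbers are round assumptions, not benchmarks:

```python
# Expected access time for a hybrid pool as a function of cache hit rate.
# Latencies are rough, assumed figures for illustration only.
t_ssd_ms = 0.1   # cache hit (SSD)
t_hdd_ms = 8.0   # cache miss (HDD seek + rotation)

for hit_rate in (0.5, 0.9, 0.99):
    expected_ms = hit_rate * t_ssd_ms + (1 - hit_rate) * t_hdd_ms
    print(f"hit rate {hit_rate:.0%}: ~{expected_ms:.2f} ms average access time")
# Going from a 50% to a 99% hit rate cuts average latency by roughly 20x with
# these numbers; a poor hit rate leaves you close to raw HDD latency.
```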
I do something similar for my primary storage pool appliance, with 28 TB available. It has 32 GB of system RAM, so I push as much into the ARC cache as possible without the whole thing toppling over; roughly 85%. I only need it as an NFS endpoint.
It's pretty zippy for frequently accessed files.
Even in this case, you need to be careful with how you use HDDs. I say this only because you mentioned size. If you're using big drives in a RAID setup, you'll want to consider how long it takes to replace a failed drive. With large drives, it can take quite a long time to recover an array with a failed drive, simply because copying 12+ TB of data to even a hot spare takes time.
Yes there are ways to mitigate this, particularly with ZFS DRAID, but it’s still a concern that’s more a large HDD thing. For raw storage, HDDs aren’t going anywhere anytime soon. But, there are still some barriers with efficient usage with very large drives.
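For a sense of scale, a naive lower bound on rebuild time (the throughput figure is an assumption; real resilvers under load are usually much slower):

```python
# Naive lower bound: time to sequentially copy a full drive to a hot spare.
capacity_tb = 12
throughput_mb_s = 200  # optimistic sustained rate for a large HDD (assumption)

seconds = capacity_tb * 1e12 / (throughput_mb_s * 1e6)
print(f"~{seconds / 3600:.1f} hours minimum to rewrite {capacity_tb} TB")
# ~16.7 hours even in the best case; with ongoing pool traffic and random I/O,
# multi-day rebuilds are common, which is the exposure window being described.
```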
My experience is anecdotal, but as a general consumer using these drives in a home environment, HDDs simply have not been kind to me over the years.
On the other hand, I just had my first SSD (Samsung 850 Evo) fail on me yesterday, after chugging along for about 7 years. Compare that to numerous hard drive failures over the years, to the point that I've made a rule to keep my workstation hard-drive-free for most of the last decade.
It started out as intermittent I/O errors, where the system would boot fine and run for a few hours and then lock up to the point where I could not even SSH into it. I tried replacing the cables and connecting to a different power cable but the problem remained.
Within a couple of days, the drive stopped being detected. I tried one last time using an external USB3-to-SATA enclosure but got the same outcome. I'm guessing that the controller gave out, as the wear level stats were fine last time I checked (a couple of weeks ago).
This was being used 24x7 for the last few years as a boot drive for my Proxmox server, for which I run nightly backups so I was back up and running within an hour or so after replacing the drive.
I don't understand why they aren't trying to use multiple linear regression to control for the effects of how old the SSDs vs. HDDs are, or something like survival analysis. I thought this was a largely solved problem...
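For what it's worth, a minimal sketch of what that could look like with the lifelines library, assuming you had per-drive records of age at failure (or at censoring) plus a drive-type flag; the data file and column names here are hypothetical placeholders:

```python
# Sketch: Cox proportional-hazards regression on per-drive lifetimes,
# controlling for drive age/cohort while comparing SSDs and HDDs.
# The CSV and column names are hypothetical placeholders.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("drive_lifetimes.csv")
# Assumed columns:
#   duration_days : observed lifetime, or age at end of study if still alive
#   failed        : 1 if the drive failed, 0 if censored (still running/retired)
#   is_ssd        : 1 for SSD, 0 for HDD
#   deploy_year   : cohort, to control for newer drives simply being younger

cph = CoxPHFitter()
cph.fit(df[["duration_days", "failed", "is_ssd", "deploy_year"]],
        duration_col="duration_days", event_col="failed")
cph.print_summary()  # the hazard ratio on is_ssd is the age-adjusted comparison
```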
I have just one of the early SSDs still functional, the original Intel 80 GB SSD; it's too small for anything now, so it's retired with about 60% of its life remaining. Everything else from that era of early SSDs died.
The oldest SSD I have still in use is a Crucial M4-CT512 (https://www.crucial.com/support/ssd-support/m4-25-inch-suppo...) and that was likely bought in 2013. It still works OK, but it's always had an issue with some writes taking a very long time. The younger ones like the Samsung 970 Pros still work like new, as do the cheap Chinese PCIe 4.0 drives.
I think the early SSDs weren't reliable at all but most of the recent stuff seems to be a lot better than the HDDs in the NAS which have a failure or so every 3 years.
Anecdotally, I have a similar experience in my NAS/Server.
Unlike Backblaze, I don't write much to my drives; they mostly get some writes of relatively small backups once or twice a day and that's it. I keep all of my VM boot drives on NVMe SSDs.
I've only had one, very early SSD (SATA, OCZ branded) fail on me since the beginning, and I run significantly more SSDs than I do HDDs, in my desktop, laptops, servers etc.
In the NAS/Server (It's a high-spec Proxmox server doubling as a NAS) I have had a bunch of HDD failures, WDReds (CMR) and Seagate IronWolf drives, typically around 4-5 year mark.
For use cases like mine, where they spend most of their time idling, I think SSDs are a clear winner, HDDs are wearing out just by idling.
At my next failure, I'll be replacing whichever mirror fails with SSDs.
The first time I bought a SSD it failed in about 3 months. It was a Samsung SSD and I would just be careful with manufacturer claims. I might have been selectively scammed since I didn't have a large enterprise behind me to complain.