
> I wouldn’t feel comfortable with RAID5/6 even on today’s ~10TB drives.

I fully agree with this part. De-clustered RAID really should have been mainstream a long time ago. Thankfully dRAID in OpenZFS will bring this to the "masses" in the sense of it being an open source implementation, whereas this has traditionally been a proprietary feature.

> The solution isn’t more parity disks, it’s to move away from that RAID set up entirely.

But I strongly disagree with RAID being the problem. Rather, the issue is that capacity keeps increasing while bandwidth isn't really improving. Of course, this is an inherent limitation of disk drives.




>But I strongly disagree with RAID being the problem. Rather, the issue is that capacity keeps increasing while bandwidth isn't really improving. Of course, this is an inherent limitation of disk drives.

I mean, yes, you could have SSDs in RAID and not have to worry about the drive failing before it's rebuilt, but we're talking about spinning disks here.

What are the current recommendations around RAID capacity before you have to seriously start worrying about drive failure before it can be rebuilt? This is a genuine question; I don't know.


>> What are the current recommendations around RAID capacity before you have to seriously start worrying about drive failure before it can be rebuilt?

That depends on the size of your drives and the number in the array. If you are running a small NAS with only one parity drive, you don't want drives that take more than a day to rebuild. That limits you to 4/6TB per drive. If you have two parity drives in the array you can probably risk the 8/10/12 TB sizes. If you are running commercial-scale arrays of dozens of drives, arrays that can handle multiple failures at once, then the sky is probably the limit.
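
A rough back-of-the-envelope sketch of rebuild times, assuming a sustained rebuild rate of ~150 MB/s (which is optimistic for a busy array):

  # Rebuild time is roughly capacity divided by sustained rebuild rate.
  # The 150 MB/s figure is an assumption; real-world rates are often lower.
  def rebuild_hours(capacity_tb, rate_mb_s=150):
      return capacity_tb * 1e6 / rate_mb_s / 3600

  for tb in (4, 6, 12, 100):
      print(f"{tb:>4} TB -> ~{rebuild_hours(tb):.0f} hours")
  # ~7 h, ~11 h, ~22 h and ~185 h (nearly 8 days) respectively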

But the story gets more complicated. Are all of your drives the same age? Are all the drives of the same model/manufacturer? Such things increase the risk of one failure evolving into a multi-drive failure during rebuild. If/when I build a new array I want drives of different ages. So if I had 10 drives, I would start the array on five or six, holding some back for later so that the entire array isn't the same age.


As long as you haven't hit the protocol bottleneck, areal density almost always increases bandwidth. More bits per track equals more bits per revolution. If RPM is constant then bits per second goes up, as do bits per seek operation. It's the seeks that are murder.


Please ELI5, I know what each of these words means in isolation but I know nowhere near enough about storage to understand what this means as a sentence.


Hard drives only have a handful of different rates of spinning (RPM). The diameter of the platters is also limited to a couple of options. The number of platters varies a bit, but there are numbers that are too stupid-low and ones that are too high to fit into the case, so there are maybe 3-5 varieties?

If you have two drives with the same number of platters, the same diameter, and the same rotation speed, the one with the highest storage can only do that by increasing storage density - making the bits smaller. Smaller bits means more tracks, but also more bits per track. So in a single rotation (1/120 of a second at 7200 RPM, for instance), the read head sweeps across more data. If it reads it as fast as it sees it (which they do), then that means a higher transfer rate.
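
To put some illustrative (made-up) numbers on that:

  # Sequential rate = data per track * revolutions per second. Numbers are hypothetical.
  rpm = 7200
  revs_per_sec = rpm / 60             # 120 revolutions per second
  mb_per_track = 2.0                  # assumed outer-track capacity
  print(revs_per_sec * mb_per_track)  # ~240 MB/s sequential at the outer edge
  # Pack more bits per track (higher areal density) at the same RPM and the
  # sequential rate goes up in proportion.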


Drive failure during a rebuild isn't a real concern unless

1) the rebuild takes time comparable to the characteristic lifetime of the drive, which on average is >5 years. Rebuild times are more in the realm of days, maybe coming to weeks with the proposed 100TB models. Still far from a substantial probability of failure (rough numbers sketched below).

2) the drives are very old or have bad SMART data. Then the probability of drive failure (and array loss) shoots up.
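
To sketch point 1 with rough numbers (assuming a ~2% annualized failure rate per drive and independent failures, both simplifications):

  # Chance that at least one surviving drive dies during the rebuild window.
  afr = 0.02               # assumed ~2% annualized failure rate per drive
  surviving_drives = 7     # e.g. an 8-drive array rebuilding one failed member
  rebuild_days = 1
  p_one = afr * rebuild_days / 365
  p_any = 1 - (1 - p_one) ** surviving_drives
  print(f"{p_any:.4%}")    # roughly 0.04% for a one-day rebuild
  # Even a week-long rebuild of a 100TB-class drive only pushes this to ~0.3%.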

What people often talk about regarding big drives and reliability concerns is the probability of rebuild failure for RAID5/RAID6, i.e. some bit is read or written wrong and the array becomes inconsistent. This is much more probable but isn't really a big problem in practice, because it can be detected and the rebuild can be repeated.


OK, the above isn't exactly correct in the last sentence. If there is a URE due to a medium error then we get a bad stripe, and repeating the rebuild will most often achieve nothing to remove it (although see SpinRite, which can sometimes recover data from bad sectors). However, even if the bad sector can't be read, the "RAID failure" is still just the loss of a single sector or a few sectors, which usually isn't a big deal. The array's filesystem data is then best copied to a new array on new disks and the old array discarded.


They are adding multiple actuators, which actually should help that fundamental limitation.


Not much, actually. Dual actuators only improve parallelism, not media transfer rate or latency. In practice, dual actuators don't even double external transfer rates due to occasional latency (which is unavoidable even in the best-case scenarios) and internal contention elsewhere in the drive. Even if you had four actuators working perfectly in concert, and weren't bottlenecked on the external interface speed (which you would be), a 4x improvement in transfer rate vs. an 8x increase in capacity would still mean a 2x increase in fill/empty time.
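
To make that arithmetic explicit (fill time as capacity over sustained transfer rate, with round illustrative numbers):

  # Hypothetical comparison: a ~12TB drive at ~250 MB/s vs. a 100TB-class
  # drive with four actuators delivering a perfect 4x at ~1000 MB/s.
  def fill_hours(capacity_tb, rate_mb_s):
      return capacity_tb * 1e6 / rate_mb_s / 3600

  print(fill_hours(12, 250))    # ~13 hours today
  print(fill_hours(100, 1000))  # ~28 hours even with the ideal 4x bandwidth win
  # Capacity grew ~8x but bandwidth only 4x, so fill/empty time still roughly doubles.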

As with the shift from 5.25" to 3.5" to 2.5" and even 1.8" drives, the only way out of this bind is really to take advantage of the improved areal density to make drives that are the same capacity but smaller and pack more of those into the same volume or power/heat envelope. Drive manufacturers could help this along a little bit e.g. by sharing a motor and some environmental bits between what are otherwise completely separate drives (including separate external interfaces) within a single package, but mostly we'd all better get used to higher drive counts. Dual actuators - and this is far from the first time they've been tried BTW - are mostly a red herring.

Background: I worked on exactly these problems for the latter half of a thirty-year career, most relevantly at my last job working on an exabyte-scale storage system at a FAANG.


Also what market is there for big hard drives outside of FAANG and a few other places?

Retailers in my area (e.g. Best Buy) don't stock hard drives larger than 4 TB; I was going to tell somebody who lived in the Valley that he's lucky to be able to go to Fry's and then Fry's closed down.

Those same retailers stock both budget and quality SSDs up to 2TB in size. Most upgraders and system builders are happy.

For the rest of us there is Amazon where the Seagate Exos "enterprise" drive costs half of what similar "consumer" drives cost, has a great reputation and does not seem hard to live with at home.

I would not take it for granted at all that a backup, RAID rebuild, restore, metadata scan or any full scan would work on such a disk if I hadn't tested it -- it is just that kind of technology. Synology is not crazy at all when they make you buy branded large drives to go in the enclosure.


It is common for retailers to stock low-spec parts, but that doesn't mean that there is no market for the higher-end stuff. Another example is routers: hypermarkets often only stock 10/100 Mbps routers, and a gigabit router has to be ordered online from somewhere. But of course there are plenty of people with gigabit fiber connections who do a lot of torrenting or have a whole family using the connection who would benefit from ordering a gigabit router.

FWIW, I filled an 8TB hard drive and then a 4TB follow-up hard drive in just one summer of torrenting. Full-size Blu-ray rips are large, and the canon of great cinema is vast. Sure, it'll take me many months afterwards to watch everything that I have downloaded, but these large hard-drive sizes are well within the bounds of what a cinephile with a home theater building a collection would need, and they definitely aren't just for large businesses. Of course, the hard drive discussed in the linked article is something else entirely.


>Also what market is there for big hard drives outside of FAANG and a few other places?

Video production & storage.

My company works on VR content. We currently have over 10PB stored remotely and have no less than 40TB of ACTIVE files in use every day.

These numbers grow by 175% monthly.

We are running out of rack space for our file-servers, just trying to KEEP UP with demand.


Time to start installing 100TB SSDs!


And multiply storage cost by 10x?


He did say that he was space constrained.


> what market is there for big hard drives outside of FAANG and a few other places?

I wouldn't consider myself an expert on market numbers, but I'd say you generally shouldn't be using the larger drives. As others have said, disk is the new tape. Unless you have the kind of system where you might once have used tape - i.e. one with a substantial ice-cold-data component - larger disks are likely to be an ill fit. Even where they are a good choice, that's mostly going to be in a tiered architecture with flash etc. to suck all the "heat" out of the data going into them.


Hard drives are way cheaper if you need to store a lot of data. And not just for cold backup.

It’s nice that 2.5” portable USB hard drives are cheap enough now (and USB ports provide enough power) that you don’t need the bulky power supply any longer. That means if you want to do any kind of bulk video storage, you can afford to do it. You can’t with SSD, which costs 10x as much per TB. A factor of 10 still matters to most people, ESPECIALLY if they’re not FAANG.

And I think relying on FAANG infrastructure is overrated. There’s still a very good case for local storage. USB now provides enough throughput and power that it’s easier than ever to have a significant amount of local storage for cheap. And carry it with you if you travel.


And what exactly are people going to use?

I can get a used server on eBay with 96 TB of storage, usually in 12 x 8TB configs for around $2000 - $2500.

Flash drives? I'll spend 10 times that... hell, probably more. I love having my movies, music, television, etc. on my home server, but I don't love it at the level of 25 large. Now $2500? That's a lot more reasonable.


Sounds like you have a solution that works with today's medium-sized drives, so how is this even relevant to a discussion of 60-120TB drives? Do you think you'd be well served by putting that 96TB on one drive? I doubt it. If your active set (for any useful or relevant definition of "active") is X and the number of drives you need for IOPS or MB/s is Y, then the ceiling for how much you can effectively use per drive is X/Y. That's quite likely less than current drive capacities, and almost certainly within what can be achieved by current PMR technology. How would HAMR or BPM or any other more fundamentally new disk technology actually help you other than to store cold data on that "spare" capacity?
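
A quick illustration of that X/Y ceiling, with made-up numbers:

  # Ceiling on useful capacity per drive = active set size / drives needed for throughput.
  active_set_tb = 20        # X: data the workload actually touches (assumed)
  per_drive_mb_s = 200      # sustained rate of one spindle (assumed)
  required_mb_s = 1600      # aggregate throughput the workload needs (assumed)
  drives_needed = required_mb_s / per_drive_mb_s  # Y = 8 drives just for bandwidth
  print(active_set_tb / drives_needed)            # 2.5 TB of useful capacity per drive
  # Anything beyond that per-drive ceiling is effectively cold storage.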


Yes? I mean, the sequential write speeds are fine for many video-heavy applications. You don’t want it in cold storage. Tape drives are not cheap, and neither is retrieving from tape terribly practical. There’s a large overhead for tape storage if you want it automatically accessible (i.e. a big tape jukebox). A big hard drive can store large volumes of data just a few milliseconds away, not minutes as you have to wind the tape to the right spot in a big tape jukebox... or hours or days if you have to ask someone to retrieve some tapes by hand. And you can do hard drive storage for an order of magnitude lower cost than SSD.

Spinning disk is still really useful for video. Which is not shrinking any time soon.


That's a lot of time spent trashing tape, which I never suggested as a solution. The central question remains whether that single huge disk will provide the I/O rates that you need, and it won't. If video sizes increase and frames/second doesn't, then I/O need goes up so you'll need multiple drives even more. That means the sizes we already have, without fundamental shifts in technology, are even more likely to be sufficient. You'll just need more of them. What part of this math is escaping you?


No math is escaping me. Video is sequential I/O, which hard drives do fine at. Multiple drives are needed under many workloads, but not video.

If you want to do video stuff on a laptop you have room for maybe one extra hard drive. And if you are willing to have an external drive (which isn’t too bad), it’s not going to work to plug several hard drives in and expect to RAID them together.

Secondarily, there are only so many drives you can stuff in a workstation. A lot of compact ones only have a couple slots in them, and not everyone wants a RAID or JBOD controller with a bunch of ports on it just to do a video workflow. And if you have a video surveillance small server or appliance, you might only have a few slots in it (4 is fairly common).

So again, even one or two hard drives is fine for most uses. Not everyone is gonna put in a 24-slot JBOD/RAID chassis just to provide enough storage space for their video surveillance system or whatever.


> even one or two hard drives is fine for most uses.

The scenario you describe hardly sounds like "most uses" to me.


"Video" is ambiguous use case. It would be fine for watching video on home server by single person or hoarding video files, but obviously not fine for video editing, video CDN edge, etc.


You asked, "What market is there for big hard drives outside of FAANG [etc.]?"

I answered.

The home prosumer market. And if anyone ever bothered to develop a multi-terabyte solution for "normal people", whereby they could easily have their DVD / Blu-ray / Compact Disc collection converted from disc to a media server, probably a lot more people.

100 TB hard drives will end up in consumer PCs by 2035, I'm certain of it, even if we won't really need it. 8K is just plain stupid for home use, I honestly don't know why it's being pushed, since you'd need a 120" screen or more, but I'm sure there'll be new media types beyond 4K... 4K60 FPS for instance, or high-resolution VR movies. I could easily see a world where 4K60 FPS movies take up 200 GB of space each, and high-res VR movies are that large or even larger.
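
For scale, assuming a 2-hour runtime per film (illustrative numbers only):

  # Implied average bitrate and library size for hypothetical 200 GB movies.
  file_gb = 200
  runtime_s = 2 * 3600
  print(file_gb * 1000 / runtime_s)  # ~28 MB/s average, well within one hard drive
  print(100_000 / file_gb)           # a 100 TB drive would hold ~500 such films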


> You asked, "What market is there for big hard drives outside of FAANG [etc.]?"

No, I didn't. PaulHoule did.

> 100 TB hard drives will end up in consumer PCs by 2035, I'm certain of it, even if we won't really need it.

I think you're confusing the need for more capacity with the need for more capacity within a single drive. This is the same distinction you'll find in the original RAID papers. With a big single drive ("SLED" in the papers) your performance per megabyte stored and your reliability per megabyte stored keep going down. At some point this will make your system unusable. I contend that we're already on the edge and these newer technologies that only increase capacity will push a lot of people over. Increasing video resolutions only make the problem worse, not better. How many 4K60 streams are you going to get over a single interface?

Now consider an alternative: same amount of storage, across multiple drives. In the same physical space, because that higher areal density can be used to make drives physically smaller just as well as it helps make them logically bigger. In a similar power/heat envelope, because smaller also means lighter. But with better performance across more heads and more external interfaces. And with better reliability because with multiple drives you can add some redundancy (and even if you don't at least a single failure will only cost you some of your data).

Why would you want a single big disk instead? Complexity? It's not actually that big a deal. Non-specialists build such systems every day. The only thing the drive vendors need to do is use those improvements in areal density to make drives physically smaller instead of logically larger. Stop making the capacity/bandwidth gap bigger. Let people build balanced systems that actually work, instead of systems with a striking resemblance to those used for "virtual tape" cold storage in the bad old days.


> Increasing video resolutions only make the problem worse, not better. How many 4K60 streams are you going to get over a single interface?

How many does a prosumer need? One, maybe two. Maybe 5 in a NAS. A single drive today can already handle that in 8K.

When it comes to media files, we hit storage limits all the time but we're nowhere near the bandwidth limits of a hard drive. We're not on the edge when it comes to performance per megabyte. Program files and game files are miles away on one side, and videos and photos are miles away on the other side.

There's some point where increasing the density of hard drive platters is too slow for a prosumer media library, but I'm confident it's far out there, past a petabyte.


> we're nowhere near the bandwidth limits of a hard drive

According to my calculations, SATA-3 could support almost four 4K60 streams, but only if the data for those videos was very carefully interleaved (never happens). That also assumes literally nothing else happening on the drive, no bad-block relocation, no bottlenecks elsewhere in the system, etc. Closer to reality, those files would be laid out on different parts of the disk and you'd be lucky to get even two concurrent streams without seeks between them ruining your throughput. So yes, you are at the real-world bandwidth limits of a hard drive.
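
The arithmetic behind that, assuming ~150 MB/s per lightly compressed (ProRes-class) 4K60 stream:

  # SATA-3 tops out around 600 MB/s; a single spinning disk sustains far less,
  # and concurrent streams force seeks that wreck throughput.
  sata3_mb_s = 600
  hdd_sequential_mb_s = 250   # generous single-disk sequential rate (assumed)
  stream_mb_s = 150           # assumed lightly compressed 4K60 stream

  print(sata3_mb_s / stream_mb_s)           # ~4 streams if the interface were the only limit
  print(hdd_sequential_mb_s / stream_mb_s)  # <2 streams from the platters, before any seeking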

By contrast, two physically smaller drives adding up to the same capacity in the same space could reliably deliver one stream each, plus probably a third with data stored on both if you had decent buffering (because now you have enough MB/s headroom to buffer) to cover the seeks that remain. Just as when I was building storage systems for video professionals in 1994-95, if I had to deliver such a system and it had to work before I got paid I know which way I'd go.


You're talking about raw very-lightly-compressed footage I think? But cbozeman and I are talking about final products. The high end is around 10 megabytes per second per stream.

If you have raw 4K footage, and you're editing with it, and you need multiple streams and the ability to scrub around, I would simply say not to use any hard drive.


What is a "prosumer media library" for? I suspect it's not for editing, because an HDD is too slow for that use case.


The stuff you're actively editing goes on your SSD, everything else can live on a hard drive.

But the media library would be photos you take, videos you take, DVDs you rip, etc.


So what percentage of the market would you say that is? 1% would be generous. Should that drive the technology?


I would say that the vast majority of the consumer hard drive market is in the bracket where size matters much more than performance.

I have no idea what fraction of the server market it is.

But there's also an important thing to note about product families. You talk about using density improvements to make drives smaller, and then install more of them to keep performance up. I think that's reasonable, but I also think that one of the best ways to do that is to reduce the platter count and drive height. In that world, where the main product is thin drives, it takes only a small amount of engineering effort to keep making an XL model that has lower performance but is significantly cheaper per TB.


I said I'm not an expert on the HDD market, but I have seen some stuff so here's a bit of perspective. The hyperscalers are more than 50% of the market. They're each big enough that price distortion from their own buys is a real concern that they plan for. (Note BTW that these are often left out of analysts' charts, because leaving them in makes it harder to see patterns among the rest.) More than half of what's left is sold to businesses, with most of that going to companies big enough to build and run their own data centers (including supercomputer facilities). And then, finally, all consumer drive sales account for something less than 20% of the total.

Yes, size does matter a lot for those markets. If you want performance use flash. However, again, size per drive is a red herring. What matters is the capacity you can fit into a system, whether it's a laptop or a server. Having that much capacity present as a single volume through a single interface is simply not ideal either for performance (which might not be the same goal but still has a lower limit) or for reliability. That's all I've been saying. You're better off combining multiple lower-capacity drives, even if you can get by (at least for a while) with a single larger drive. Serious video folks and even gamers have known the advantages of dual drive RAID-0 or RAID-1 for years.

> where the main product is thin drives, it takes only a small amount of engineering effort

What you're now suggesting is no more than what I suggested nearly a day and several posts ago (look for "drive manufacturers could help"), which you and "others" took issue with. Yes, drive manufacturers can and should make those thin drives, and then sell multiples packed into a single enclosure like we already have today. The fact that it's multiple physical drives could be more or less transparent. The transparent version would be cheaper and offer the system designer more flexibility. The non-transparent version, akin to existing HW RAID or even multiple platters today, would be a bit easier to conceptualize for people not used to thinking of enclosures and spindles and platters and heads as separate things, but it would be a bit more expensive (controller plus memory as part of the package) and not necessarily better.

In short, using "drive" to mean both the package with connectors on the side and the piece(s) of oxide-coated metal inside it is sloppy, and leads to wrong conclusions. Once you realize that higher density creates more options than "every limit the same except for higher capacity" then it quickly becomes clear that 120TB on a single spindle isn't the best use of that technology.


> However, again, size per drive is a red herring. What matters is the capacity you can fit into a system, whether it's a laptop or a server.

I agree, but the argument I'm making is about cost per terabyte. I'm not inappropriately clinging to terabytes per drive.

> What you're now suggesting is no more than what I suggested nearly a day and several posts ago (look for "drive manufacturers could help"), which you and "others" took issue with.

I didn't take issue with smaller or multi-component drives existing, I just don't think they are necessary for all use cases. I was referring to what you said before on purpose, but disagreeing with the conclusion that "mostly we'd all better get used to higher drive counts". I got the impression you were treating it as a temporary transition measure.


Interesting. What do you think about Seagate's claim that dual actuator decreases their costs because it takes less time to test the drive? Is that the real reason for dual actuator?


Could easily reduce reliability.


It’s possible, just as increasing the density of NAND to 4 levels per cell reduces write endurance.

But it can be countered by more careful engineering of the device.


NAND failures tend to be more localized though.


This is probably not representative because it was using low grade USB flash devices from 10-15 years ago, but I think I’ve had more NAND flash failures than hard drive failures. I don’t think doubling the number of heads would make a huge difference as you wouldn’t be quite halving the reliability.


Oh, I totally agree in that regard. Hard drive technology has been mature for a good number of decades, so a lot of the kinks have been worked out.


No, it will reduce reliability. More parts = worse reliability.


Maybe? The issue is mechanical devices can be engineered until the reliability level needed is achieved. Sure, take the same device and double the numbers, you’ll have half the reliability, but you’ll also have more parts being made and more parts to amortize the reliability improvement engineering over (and more parts to get reliability statistics from... which also means you can test the rest of the hard drive faster, which can provide other reliability enhancements). NAND devices have similar constraints, actually, but they’re reaching more fundamental limits due to quantum effects and they’re generally addressed with software/firmware tricks like wear leveling.


> The issue is mechanical devices can be engineered until the reliability level needed is achieved.

And if wishes were horses...

No, you can not simply state some arbitrary level of reliability and design to that level. At any given level of reliability, more parts = less reliability, and more time or resources spent on design are not going to remediate that unless you are willing to accept much higher costs.

Engineering is a trade-off, and physics determines the sweet spot for the balance of that trade-off. Once that sweet spot has been determined, all other things being equal, more parts will cause your reliability to go down, unless you also accept that your costs will go up and/or other parameters will be affected in a negative way.

Software people in general have a very hard time understanding this because to them 'parts' are free, but even in software, assuming zero cost for parts (adding a library, a function, a line of code) has an effect on reliability. That's why computers used to be much more reliable than they are today, and that's before we get into details such as cognitive load while trying to understand complex systems.
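
The underlying arithmetic, treating parts as independent and equally reliable (a simplification):

  # Series reliability: the system works only if every part works.
  p = 0.01                  # hypothetical annual failure probability per part
  for n in (1, 2, 4):
      system_failure = 1 - (1 - p) ** n
      print(n, f"{system_failure:.2%}")
  # 1 part: 1.00%, 2 parts: 1.99%, 4 parts: 3.94%.
  # Roughly n*p for small p: doubling the parts roughly doubles the failure rate,
  # unless each part is made correspondingly more reliable, at extra cost.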


Where do those fit, in the opposite corner?


De-clustered RAID also has disk replacement times that are proportional to usage instead of capacity, right? Whereas RAID5 and RAID6 require replicating empty blocks?

HDDs are much faster when they're only half full.


RAID 1 still has benefits when coupled with proper backups. You can still replace a drive and keep working, which is faster than doing a restore from backup for large drives.


With current failure rates, isn't one running the risk of hitting a URE while creating the first backup?


A RAID does not increase the risk of a drive failure. Via redundancy it reduces the risk, but not to zero. Probability of Single Drive Failure > Probability of Double Drive Failure. RAID is not a backup, so you still need that backup.

1. Without RAID, drive fails and you're stuck waiting for restore from backup.

2. With RAID, the risk that a double drive failure leaves you stuck waiting for a restore from backup is < the risk in #1.

3. With RAID, the risk of a single drive failing is > 0 (and greater than #2), but you continue working while waiting on the drive clone, while still having the backup from #1 and #2 to fall back on (rough numbers sketched below).
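
Putting rough, assumed numbers on those three cases:

  # Illustrative one-year probabilities for a two-drive mirror vs. a single drive.
  afr = 0.03                  # assumed annual failure rate per drive
  rebuild_days = 1            # time to clone onto a replacement

  p_no_raid_restore = afr                                   # case 1: any failure means a restore
  p_second_during_rebuild = afr * rebuild_days / 365
  p_raid1_array_loss = 2 * afr * p_second_during_rebuild    # case 2: both copies lost
  p_raid1_keep_working = 2 * afr - p_raid1_array_loss       # case 3: a failure you ride through

  print(p_no_raid_restore, p_raid1_array_loss, p_raid1_keep_working)
  # ~3% vs ~0.0005% vs ~6%: single failures are more common, but they cost no downtime.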


A URE is a read error, not a drive failure. So yes, a bigger disk increases the chance of silent and loud UREs, but with a URE rate of 1e-15, RAID1 disks up to 40TB are usable (you can redo the rebuild if it fails, or simply ignore the loss of data in many cases). If we ever get to 100TB drives, their URE rate will probably go down as well. This and low demand will probably make them quite expensive.
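
The back-of-the-envelope version, assuming the spec-sheet rate of one unrecoverable read error per 1e15 bits read:

  import math

  # Probability of hitting at least one URE while reading a whole 40TB drive
  # during a RAID1 rebuild, at the 1e-15 enterprise spec.
  ure_per_bit = 1e-15
  capacity_bits = 40e12 * 8
  expected_ures = capacity_bits * ure_per_bit
  print(f"{1 - math.exp(-expected_ures):.0%}")  # ~27%: a failed pass can simply be retried
  # At the older 1e-14 consumer spec, the same full read is almost certain to hit a URE.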


Since the drives are full of ECC already, do UREs result from multiple simultaneous errors? Can that be fixed with even more ECC? I.e. if the errors are reasonably independent, you could change a 1e-14 URE rate to 1e-18 or whatever you wanted by just adding more ECC, in principle with extra ECC whose size is logarithmic in the size of the disk (though you might want to cache it in RAM in such a case). That is how tape drives work, I think. They put ECC codes into blocks on the tape, then erasure codes over clusters of blocks, then more erasure codes over clusters of clusters, etc. RAID amounts to doing that over entire disks, after all.


I think UREs due to problems with the medium (magnetic layer) can be reduced to an arbitrarily small rate with more and better ECC. However, I think a (probably small) part of the total URE rate is randomly occurring errors due to intermittent mechanical and electronic failures, mostly from random environmental influences. Eliminating those would require technology improvement beyond more ECC on the platter: probably more ECC and higher quality of all components along the whole datapath.

Maybe the above is wrong because UREs are just the unrecoverable medium errors. I can't find the official definition of URE rate.


At the right cost point, have a three-way mirror?


Compared to what? The general argument for "no raid" is to have a single drive without redundancy other than backup. That's riskier than a mirror.


120 TB drives compared to 4-8-12-ish TB drives. I'm wondering if there is a chance of ever completing operations targeting the entire disk (be it an initial RAID build, resilver, restore or full backup) without hitting a speed bump.


Is that a statistic that matters?

If I have a bunch of 8TB drives, I don't care what my odds are of doing a single-drive operation without a speed bump. I'm going to back them all up at the same time, and I only care about speed bumps over the entire operation and how well they're handled.



