The more interesting data point is the "URE" spec, that is, "unrecoverable read error". All of these drives work a track at a time with full ECC to facilitate reading in the presence of noise, but a track write can have effects on adjacent tracks, temperature can affect head flying height, vibration can affect head tracking, etc. All of which, statistically, can create a non-recoverable bit error. Typically this spec is something like 1 in 10^14 bits.
What is interesting about this spec is that it is statistical, meaning that the more times you read the bits, the more likely you are to hit a URE. HDD manufacturers do all sorts of things to avoid them (retries, track mirroring, etc.) but ultimately it is a mechanical system and there are a LOT of bits.
When I was at NetApp, the company would get reports of UREs that were fixed by RAID reconstruction (so you read a disk, you get an error, and you fix it using the redundancy in the other drives). And we had determined that by 4TB it would be "stupid" to use mirroring for protection, since if a drive failed you couldn't count on being able to re-silver the mirror by getting a clean read of the other half (something like 1 in 20 attempts would fail and you'd have data loss). And of course, to reconstruct a RAID-5 group for a failed drive you need to read all the other drives; for larger groups of disks (which people used to maximize data storage) you started being at risk of not being able to reconstruct all of the stripes. That was the genesis of doing dual parity (which NetApp called 'diagonal parity' because of the design).
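A back-of-envelope sketch of that math, assuming the drive exactly meets a 1e-14 spec and that bit errors are independent (field rates are typically better than spec, which is roughly why the observed failure rate was closer to 1 in 20 than what the spec alone predicts):

    # Probability that a full read of one drive hits at least one URE,
    # assuming independent bit errors at the quoted spec rate.
    import math

    def rebuild_failure_probability(capacity_tb, ure_rate=1e-14):
        bits = capacity_tb * 1e12 * 8          # capacity in bits
        expected_errors = bits * ure_rate      # expected UREs over a full read
        return 1 - math.exp(-expected_errors)  # P(at least one URE)

    for tb in (1, 4, 12):
        print(tb, "TB:", round(rebuild_failure_probability(tb), 3))
    # 4 TB at the 1e-14 spec works out to roughly a 27% chance per full read;
    # the observed "1 in 20" suggests drives beat their spec in practice.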
The ZFS folks have also considered this and designed both dual and TRIPLE parity which makes sense for these larger drives!
It already takes a long time to reconstruct any "wide" vdev in ZFS that is raidz2. So one wonders if you could solve for the device capacity where 'reconstruction takes 3 years' (the depreciated lifespan of a drive at Google).
Kirk McKusick took this kind of scale into consideration when designing softupdates for UFS. One of his criticisms of the competing log structured filesystems at the time was that the replay process would require RAM in excess of any server's capacity once disk drives were large enough. He had designed the softupdates recovery process to be incremental and use a fixed amount of RAM that didn't scale with volume size.
This kind of holistic systems awareness is a sign of well-designed software.
Agreed on 4TB being a reasonable upper limit for RAID1 if the URE rate is 1e-14. However, why would 1 in 20 rebuilds fail? And if one did, why would there be data loss? The data is OK as long as the remaining disk is alive; you can retry the rebuild.
> I wouldn’t feel comfortable with RAID5/6 even on today’s ~10TB drives.
I fully agree with this part. De-clustered RAID really should have been mainstream a long time ago. Thankfully dRAID in OpenZFS will bring this to the "masses" in the sense of it being an open source implementation, whereas this has traditionally been a proprietary feature.
> The solution isn’t more parity disks, it’s to move away from that RAID set up entirely.
But I strongly disagree with RAID being the problem. Rather, the issue is that capacity keeps increasing while bandwidth isn't really improving. Of course this is an inherent limitation of disk drives.
>But I strongly disagree with RAID being the problem. Rather, the issue is that capacity keeps increasing while bandwidth isn't really improving. Of course this is an inherent limitation of disk drives.
I mean, yes, you could have SSDs in RAID and not have to worry about the drive failing before it's rebuilt, but we're talking about spinning disks here.
What are the current recommendations around RAID capacity before you have to seriously start worrying about drive failure before it can be rebuilt? This is a genuine question; I don't know.
>> What are the current recommendations around RAID capacity before you have to seriously start worrying about drive failure before it can be rebuilt?
That depends on the size of your drives and the number in the array. If you are running a small NAS with only one parity drive, you don't want drives that take more than a day to onboard. That limits you to 4/6TB per drive. If you have two parity drives in the array you can probably risk the 8/10/12 TB sizes. If you are running commercial-scale arrays of dozens of drives, arrays that can handle multiple failures at once, then the sky is probably the limit.
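For a rough sense of where the one-day line falls, here's a sketch assuming the rebuild runs at a steady sequential rate (both rates below are assumptions; rebuilds on a busy array are usually throttled well below the drive's maximum):

    # How big a drive can be rebuilt within a time budget at a given
    # sustained rate. Numbers are illustrative, not specs.
    def max_capacity_tb(hours, mb_per_s):
        return hours * 3600 * mb_per_s / 1e6   # MB -> TB

    print(max_capacity_tb(24, 180))   # ~15.5 TB at a steady 180 MB/s
    print(max_capacity_tb(24, 60))    # ~5.2 TB if throttled to 60 MB/s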
But the story gets more complicated. Are all of your drives the same age? Are all the drives the same model/manufacturer? Such things increase the risk of one failure evolving into a multi-drive failure during rebuild. If/when I build a new array I want drives of different ages. So if I had 10 drives, I would start the array on five or six, holding some back for later so that the entire array isn't the same age.
As long as you haven't hit the protocol bottleneck, areal density almost always increases bandwidth. More bits per track equals more bits per revolution. If RPM is constant then bits per second goes up, as do bits per seek operation. It's the seeks that are murder.
Please ELI5, I know what each of these words mean in isolation but I know nowhere near enough about storage to understand what this means as a sentence.
Hard drives only come in a handful of different spindle speeds (RPM). The diameter of the platters is also limited to a couple of options. The number of platters varies a bit, but there are counts that are too stupid-low and ones that are too high to fit into the case, so there are maybe 3-5 varieties?
If you have two drives with the same number of platters, the same diameter, and the same rotation speed, the one with the highest storage can only do that by increasing storage density - making the bits smaller. Smaller bits means more tracks, but also more bits per track. So in a single rotation (1/7200 of a second, for instance), the read head sweeps across more data. If it reads it as fast as it sees it (which they do), then that means a higher transfer rate.
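A toy version of that relationship; the bits-per-track figures here are made-up but plausible assumptions:

    # Sequential throughput scales with bits per track at a fixed RPM:
    # one revolution passes one track's worth of data under the head.
    def throughput_mb_s(rpm, kb_per_track):
        revs_per_sec = rpm / 60
        return revs_per_sec * kb_per_track / 1000

    print(throughput_mb_s(7200, 1500))   # ~180 MB/s with ~1.5 MB per track
    print(throughput_mb_s(7200, 3000))   # ~360 MB/s if density doubles the track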
Drive failure during a rebuild isn't a real concern unless
1) the rebuild takes time comparable to the characteristic lifetime of the drive, which on average is >5 years. Rebuild times are more in the realm of days, maybe coming to weeks with the proposed 100TB models. Still far from a substantial probability of failure.
2) the drives are very old or have bad SMART data. Then the probability of drive failure and (array loss) shoots up.
What people often talk about regarding big drives and reliability concerns is the probability of rebuild failure for RAID5/RAID6, i.e. some bit is read or written wrong and the array becomes inconsistent. This is much more probable but isn't really a big problem in practice, because it can be detected and the rebuild can be repeated.
OK, the above isn't exactly correct in the last sentence. If there is a URE due to a medium error then we get a bad stripe, and repeating the rebuild will most often do nothing to remove it (although see SpinRite, which can sometimes recover data from bad sectors). However, even if the bad sector can't be read, the "RAID failure" is still just the loss of a single sector or a few sectors, which usually isn't a big deal. The array's filesystem data is then best copied to a new array on new disks and the old array discarded.
Not much, actually. Dual actuators only improve parallelism, not media transfer rate or latency. In practice, dual actuators don't even double external transfer rates, due to occasional latency (which is unavoidable even in the best-case scenarios) and internal contention elsewhere in the drive. Even if you had four actuators working perfectly in concert, and weren't bottlenecked on the external interface speed (which you would be), a 4x improvement in transfer rate vs. an 8x increase in capacity would still mean a 2x increase in fill/empty time.
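The arithmetic behind that last sentence, using assumed present-day numbers just for scale:

    # Fill/empty time = capacity / transfer rate, so if capacity grows 8x
    # while transfer rate only grows 4x, the time to read or write the
    # whole drive still doubles.
    def fill_time_hours(capacity_tb, mb_per_s):
        return capacity_tb * 1e6 / mb_per_s / 3600

    base = fill_time_hours(16, 250)            # today's drive (assumed numbers)
    future = fill_time_hours(16 * 8, 250 * 4)  # 8x capacity, 4x actuators
    print(round(base, 1), round(future, 1))    # ~17.8 h vs ~35.6 h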
As with the shift from 5.25" to 3.5" to 2.5" and even 1.8" drives, the only way out of this bind is really to take advantage of the improved areal density to make drives that are the same capacity but smaller and pack more of those into the same volume or power/heat envelope. Drive manufacturers could help this along a little bit e.g. by sharing a motor and some environmental bits between what are otherwise completely separate drives (including separate external interfaces) within a single package, but mostly we'd all better get used to higher drive counts. Dual actuators - and this is far from the first time they've been tried BTW - are mostly a red herring.
Background: I worked on exactly these problems for the latter half of a thirty-year career, most relevantly at my last job working on an exabyte-scale storage system at a FAANG.
Also what market is there for big hard drives outside of FAANG and a few other places?
Retailers in my area (e.g. Best Buy) don't stock hard drives larger than 4 TB; I was going to tell somebody who lived in the Valley that he's lucky to be able to go to Fry's and then Fry's closed down.
Those same retailers stock both budget and quality SSDs up to 2TB in size. Most upgraders and system builders are happy.
For the rest of us there is Amazon where the Seagate Exos "enterprise" drive costs half of what similar "consumer" drives cost, has a great reputation and does not seem hard to live with at home.
I would not take it for granted at all that a backup, RAID rebuild, restore, metadata scan or any full scan would work on such a disk if I hadn't tested it -- it is just that kind of technology. Synology is not crazy at all when they make you buy branded large drives to go in the enclosure.
It is common for retailers to stock low-spec parts, but that doesn't mean there is no market for the higher-end stuff. Another example is routers: hypermarkets often only stock 10/100 Mbit routers, and a gigabit router has to be ordered online from somewhere. But of course there are plenty of people with gigabit fiber connections who do a lot of torrenting, or have a whole family using the connection, who would benefit from ordering a gigabit router.
FWIW, I filled an 8TB hard drive and then a 4TB follow-up hard drive in just one summer of torrenting. Full-size Blu-ray rips are large, and the canon of great cinema is vast. Sure, it'll take me many months afterwards to watch everything that I have downloaded, but these large hard-drive sizes are well within the bounds of what a cinephile with a home theater building a collection would need, and they definitely aren't just for large businesses. Of course, the hard drive discussed in the linked article is something else entirely.
> what market is there for big hard drives outside of FAANG and a few other places?
I wouldn't consider myself an expert on market numbers, but I'd say you generally shouldn't be using the larger drives. As others have said, disk is the new tape. Unless you have the kind of system where you might once have used tape - i.e. one with a substantial ice-cold-data component - larger disks are likely to be an ill fit. Even where they are a good choice, that's mostly going to be in a tiered architecture with flash etc. to suck all the "heat" out of the data going into them.
Hard drives are way cheaper if you need a lot of data. And not just cold backup.
It’s nice that 2.5” portable USB hard drives are cheap enough now (and USB ports providing enough power) that you don’t need the bulky power supply any longer. That means if you want to do any kind of bulk video storage, you can afford to do it. You can’t with SSD, which costs 10x as much per TB. A factor of 10 still matters to most people, ESPECIALLY if they’re not FAANG.
And I think relying on FAANG infrastructure is over-rated. Still a very good case for local storage. USB now provides enough throughput and power that it’s easier than ever to have a significant amount of local storage for cheap. And carry it with you if you travel.
I can get a used server on eBay with 96 TB of storage, usually in 12 x 8TB configs for around $2000 - $2500.
Flash drives? I'll spend 10 times that... hell, probably more. I love having my movies, music, television, etc. on my home server, but I don't love it at the level of 25 large. Now $2500? That's a lot more reasonable.
Sounds like you have a solution that works with today's medium-sized drives, so how is this even relevant to a discussion of 60-120TB drives? Do you think you'd be well served by putting that 96TB on one drive? I doubt it. If your active set (for any useful or relevant definition of "active") is X and the number of drives you need for IOPS or MB/s is Y, then the ceiling for how much you can effectively use per drive is X/Y. That's quite likely less than current drive capacities, and almost certainly within what can be achieved by current PMR technology. How would HAMR or BPM or any other more fundamentally new disk technology actually help you other than to store cold data on that "spare" capacity?
Yes? I mean, the sequential write speeds are fine for many video-heavy applications. You don’t want it in cold storage. Tape drives are not cheap. And neither is retrieving from tape terribly practical, either. There’s a large overhead for tape storage if you want it automatically accessible (ie a big tape jukebox). A big hard drive can store large volumes of data just a few milliseconds away, not minutes as you have to wind the tape to the right spot in a big tape jukebox... or hours or days if you have to ask someone to retrieve some tapes by hand. And can do hard drive storage for an order of magnitude lower cost than SSD.
Spinning disk is still really useful for video. Which is not shrinking any time soon.
That's a lot of time spent trashing tape, which I never suggested as a solution. The central question remains whether that single huge disk will provide the I/O rates that you need, and it won't. If video sizes increase and frames/second doesn't, then I/O need goes up so you'll need multiple drives even more. That means the sizes we already have, without fundamental shifts in technology, are even more likely to be sufficient. You'll just need more of them. What part of this math is escaping you?
No math is escaping me. Video is sequential I/O, which hard drives do fine at. Multiple drives is needed under many workloads, but not video.
If you want to do video stuff on a laptop you have room for maybe one extra hard drive. And if you are willing to have an external drive (which isn’t too bad), it’s not going to work to plug several hard drives in and expect to RAID them together.
Secondarily, there are only so many drives you can stuff in a workstation. A lot of compact ones only have a couple slots in them, and not everyone wants a RAID or JBOD controller with a bunch of ports on it just to do a video workflow. And if you have a video surveillance small server or appliance, you might only have a few slots in it (4 is fairly common).
So again, even one or two hard drives is fine for most uses. Not everyone is gonna put a 24 slot JBOD/RAID chassis in to just provide enough storage space for their video surveillance system or whatever.
"Video" is ambiguous use case. It would be fine for watching video on home server by single person or hoarding video files, but obviously not fine for video editing, video CDN edge, etc.
You asked, "What market is there for big hard drives outside of FAANG [etc.]?"
I answered.
The home prosumer market. And if anyone ever bothered to develop a multi-terabyte solution for "normal people", whereby they could easily have their DVD / Blu-ray / Compact Disc collection converted from disc to a media server, probably a lot more people.
100 TB hard drives will end up in consumer PCs by 2035, I'm certain of it, even if we won't really need them. 8K is just plain stupid for home use, I honestly don't know why it's being pushed, since you'd need a 120" screen or more, but I'm sure there'll be new media types beyond 4K... 4K60 FPS for instance, or high-resolution VR movies. I could easily see a world where 4K60 FPS movies take up 200 GB each, and high-res VR movies are that large or even larger.
> You asked, "What market is there for big hard drives outside of FAANG [etc.]?"
No, I didn't. PaulHoule did.
> 100 TB hard drives will end up in consumer PCs by 2035, I'm certain of it, even if we won't really need it.
I think you're confusing the need for more capacity with the need for more capacity within a single drive. This is the same distinction you'll find in the original RAID papers. With a big single drive ("SLED" in the papers) your performance per megabyte stored and your reliability per megabyte stored keep going down. At some point this will make your system unusable. I contend that we're already on the edge and these newer technologies that only increase capacity will push a lot of people over. Increasing video resolutions only make the problem worse, not better. How many 4K60 streams are you going to get over a single interface?
Now consider an alternative: same amount of storage, across multiple drives. In the same physical space, because that higher areal density can be used to make drives physically smaller just as well as it helps make them logically bigger. In a similar power/heat envelope, because smaller also means lighter. But with better performance across more heads and more external interfaces. And with better reliability because with multiple drives you can add some redundancy (and even if you don't at least a single failure will only cost you some of your data).
Why would you want a single big disk instead? Complexity? It's not actually that big a deal. Non-specialists build such systems every day. The only thing the drive vendors need to do is use those improvements in areal density to make drives physically smaller instead of logically larger. Stop making the capacity/bandwidth gap bigger. Let people build balanced systems that actually work, instead of on systems with a striking resemblance to those used for "virtual tape" cold storage in the bad old days.
> Increasing video resolutions only make the problem worse, not better. How many 4K60 streams are you going to get over a single interface?
How many does a prosumer need? One, maybe two. Maybe 5 in a NAS. A single drive today can already handle that in 8K.
When it comes to media files, we hit storage limits all the time but we're nowhere near the bandwidth limits of a hard drive. We're not on the edge when it comes to performance per megabyte. Program files and game files are miles away on one side, and videos and photos are miles away on the other side.
There's some point where increasing the density of hard drive platters is too slow for a prosumer media library, but I'm confident it's far out there, past a petabyte.
> we're nowhere near the bandwidth limits of a hard drive
According to my calculations, SATA-3 could support almost four 4K60 streams, but only if the data for those videos was very carefully interleaved (never happens). Also literally nothing else happening on the drive, no bad-block relocation, no bottlenecks elsewhere in the system, etc. Closer to reality, those files would be laid out on different parts of the disk and you'd be lucky to get even two concurrent streams without seeks between them ruining your throughput. So yes, you are at the real-world bandwidth limits of a hard drive.
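Spelled out, with the per-stream rate as an assumption (roughly what lightly compressed 4K60 acquisition/editing footage runs at, which is the later point of disagreement in this subthread):

    # Best-case concurrent streams over a SATA-3 link, before seeks and
    # protocol overhead eat into it. Stream bitrate is an assumption.
    SATA3_MB_S = 600          # nominal 6 Gb/s line rate ~= 600 MB/s payload
    STREAM_MB_S = 150         # assumed lightly compressed 4K60 stream

    print(SATA3_MB_S // STREAM_MB_S)   # 4 streams, with zero headroom left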
By contrast, two physically smaller drives adding up to the same capacity in the same space could reliably deliver one stream each, plus probably a third with data stored on both if you had decent buffering (because now you have enough MB/s headroom to buffer) to cover the seeks that remain. Just as when I was building storage systems for video professionals in 1994-95, if I had to deliver such a system and it had to work before I got paid I know which way I'd go.
You're talking about raw very-lightly-compressed footage I think? But cbozeman and I are talking about final products. The high end is around 10 megabytes per second per stream.
If you have raw 4K footage, and you're editing with it, and you need multiple streams and the ability to scrub around, I would simply say not to use any hard drive.
I would say that the vast majority of the consumer hard drive market is in the bracket where size matters much more than performance.
I have no idea what fraction of the server market.
But there's also an important thing to note about product families. You talk about using density improvements to make drives smaller, and then install more of them to keep performance up. I think that's reasonable, but I also think that one of the best ways to do that is to reduce the platter count and drive height. In that world, where the main product is thin drives, it takes only a small amount of engineering effort to keep making an XL model that has lower performance but is significantly cheaper per TB.
I said I'm not an expert on the HDD market, but I have seen some stuff so here's a bit of perspective. The hyperscalers are more than 50% of the market. They're each big enough that price distortion from their own buys is a real concern that they plan for. (Note BTW that these are often left out of analysts' charts, because leaving them in makes it harder to see patterns among the rest.) More than half of what's left is sold to businesses, with most of that going to companies big enough to build and run their own data centers (including supercomputer facilities). And then, finally, all consumer drive sales account for something less than 20% of the total.
Yes, size does matter a lot for those markets. If you want performance use flash. However, again, size per drive is a red herring. What matters is the capacity you can fit into a system, whether it's a laptop or a server. Having that much capacity present as a single volume through a single interface is simply not ideal either for performance (which might not be the same goal but still has a lower limit) or for reliability. That's all I've been saying. You're better off combining multiple lower-capacity drives, even if you can get by (at least for a while) with a single larger drive. Serious video folks and even gamers have known the advantages of dual drive RAID-0 or RAID-1 for years.
> where the main product is thin drives, it takes only a small amount of engineering effort
What you're now suggesting is no more than what I suggested nearly a day and several posts ago (look for "drive manufacturers could help"), which you and "others" took issue with. Yes, drive manufacturers can and should make those thin drives, and then sell multiples packed into a single enclosure like we already have today. The fact that it's multiple physical drives could be more or less transparent. The transparent version would be cheaper and offer the system designer more flexibility. The non-transparent version, akin to existing HW RAID or even multiple platters today, would be a bit easier to conceptualize for people not used to thinking of enclosures and spindles and platters and heads as separate things, but it would be a bit more expensive (controller plus memory as part of the package) and not necessarily better.
In short, using "drive" to mean both the package with connectors on the side and the piece(s) of oxide-coated metal inside it is sloppy, and leads to wrong conclusions. Once you realize that higher density creates more options than "every limit the same except for higher capacity" then it quickly becomes clear that 120TB on a single spindle isn't the best use of that technology.
> However, again, size per drive is a red herring. What matters is the capacity you can fit into a system, whether it's a laptop or a server.
I agree, but the argument I'm making is about cost per terabyte. I'm not inappropriately clinging to terabytes per drive.
> What you're now suggesting is no more than what I suggested nearly a day and several posts ago (look for "drive manufacturers could help"), which you and "others" took issue with.
I didn't take issue with smaller or multi-component drives existing, I just don't think they are necessary for all use cases. I was referring to what you said before on purpose, but disagreeing with the conclusion that "mostly we'd all better get used to higher drive counts". I got the impression you were treating it as a temporary transition measure.
Interesting. What do you think about Seagate's claim that dual actuator decreases their costs because it takes less time to test the drive? Is that the real reason for dual actuator?
This is probably not representative because it was using low grade USB flash devices from 10-15 years ago, but I think I’ve had more NAND flash failures than hard drive failures. I don’t think doubling the number of heads would make a huge difference as you wouldn’t be quite halving the reliability.
Maybe? The issue is mechanical devices can be engineered until the reliability level needed is achieved. Sure, take the same device and double the numbers, you’ll have half the reliability, but you’ll also have more parts being made and more parts to amortize the reliability improvement engineering over (and more parts to get reliability statistics from... which also means you can test the rest of the hard drive faster, which can provide other reliability enhancements). NAND devices have similar constraints, actually, but they’re reaching more fundamental limits due to quantum effects and they’re generally addressed with software/firmware tricks like wear leveling.
> The issue is mechanical devices can be engineered until the reliability level needed is achieved.
And if wishes were horses...
No, you cannot simply state some arbitrary level of reliability and design to that level. At any given level of reliability, more parts = less reliability, and more time or resources spent on design are not going to remediate that unless you are willing to accept much higher costs.
Engineering is a trade-off and physics determines the sweet spot for the balance of that trade-off. Once that sweet spot has been determined, all other things being equal, more parts will cause your reliability to go down unless you also accept that your costs will go up and/or other parameters will be affected in a negative way.
Software people in general have a very hard time understanding this, because to them 'parts' are free, but even in software, assuming zero cost for parts (adding a library, a function, a line of code) has an effect on reliability. That's why computers used to be much more reliable than they are today, and that's before we get into details such as the cognitive load of trying to understand complex systems.
De-clustered RAID also has disk replacement times that are proportional to usage instead of capacity, right? Whereas RAID5 and RAID6 require replicating empty blocks?
RAID 1 still has benefit when coupled with proper backup. Can still replace a drive and keep working faster than doing a restore from backup for large drives.
A RAID does not increase the risk of a drive failure. Via redundancy it reduces the risk, but not to zero. Probability of Single Drive Failure > Probability of Double Drive Failure. RAID is not a backup, so you still need that backup.
1. Without RAID, drive fails and you're stuck waiting for restore from backup.
2. With RAID, risk double drive fails < risk of #1 and are stuck waiting for restore from backup.
3. With RAID, risk of single drive fails > 0 and < #2 and continue working while waiting on drive clone while still having the backup in #1 and #2 to fall back on.
URE is a read error, not a drive failure. So yes bigger disk increases chance of silent and loud UREs, but with URE rate 1e-15, for RAID1 disks up to 40TB are usable (you can redo the rebuild if it fails, or simply ignore the loss of data in many cases). If we ever get to 100TB drives, their URE rate will probably go down as well. This and low demand will probably make them quite expensive.
Since the drives are full of ECC already, do UREs result from multiple simultaneous errors? Can that be fixed with even more ECC? I.e., if the errors are reasonably independent, you could change a 1e-14 URE rate to 1e-18 or whatever you wanted by just adding more ECC, in principle of size logarithmic in the size of the disk (though you might want to cache it in RAM in such a case). That is how tape drives work, I think. They put ECC codes into blocks on the tape, then erasure codes over clusters of blocks, then more erasure codes over clusters of clusters, etc. RAID amounts to doing that over entire disks, after all.
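A sketch of why stacking another layer of parity drops the residual rate so quickly, assuming independent sector errors and a code that tolerates m bad sectors per group of n (all numbers are purely illustrative):

    # Probability that a group of n sectors has more than m bad ones,
    # i.e. the outer erasure code fails, assuming independent errors.
    from math import comb

    def group_failure(n, m, p_sector):
        return sum(comb(n, k) * p_sector**k * (1 - p_sector)**(n - k)
                   for k in range(m + 1, n + 1))

    p = 1e-9                         # assumed per-sector unrecoverable rate
    print(group_failure(128, 0, p))  # no extra parity: ~1.3e-7 per group
    print(group_failure(128, 1, p))  # one parity sector: ~8e-15 per group
    print(group_failure(128, 2, p))  # two parity sectors: ~3e-22 per group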
I think UREs due to problems with the medium (the magnetic layer) can be reduced to an arbitrarily small rate with more and better ECC. However, I think a (probably small) part of the total URE rate is randomly occurring errors due to intermittent mechanical and electronic failures, mostly from random environmental influences. Eliminating those would require technology improvements beyond more ECC on the platter. Probably more ECC and higher quality of all components along the whole datapath.
Maybe the above is wrong because UREs are just the unrecoverable medium errors. I can't find the official definition of URE rate.
120 TB drives compared to 4-8-12-ish TB drives. I'm wondering if there is a chance of ever completing operations targeting the entire disk (be it an initial RAID build, resilver, restore or full backup) without hitting a speed bump.
If I have a bunch of 8TB drives, I don't care what my odds are of doing a single-drive operation without a speed bump. I'm going to back them all up at the same time, and I only care about speed bumps over the entire operation and how well they're handled.
Depends what you use them for. With a high-capacity hard drive you could store a lot of downloaded movies and you can always download them again if it fails
Yeah, these would be more for data warehousing; keep in mind that at the same time, SSDs are getting much bigger (while being more compact), so SSDs are taking over from hard drives on the 'front lines', while these new big disks can be used for big/slow storage.
It took 24 hours for me to validate/check my 16TB disk (I don't know the Linux commands; it was a tool /r/DataHoarders suggested). I've upgraded the quality of all my video files and I've still got TBs of space to go..
120TB.. Man, to those in the future laughing at this comment with your 120TB games and your 60TB super-high-definition VR feature length movies.. I'm jealous.
Either you used a very pessimistic estimate for drive speed and forgot to factor in the increase in linear density, or you used a relatively accurate calculation and mixed up megabytes per second with megabits per second.
The mainboard could be a Raspberry Pi class system but with an on-board SATA. There are a number of those on the market. The board would be smaller than the drive.
Seagate should sell such a machine capable of running arbitrary Linux distributions as a kind of mini-NAS.
Oh, you mean a home solution for a few people who do not require 99.99% uptime. Then yeah, one could try to go that way, although all such attempts I've seen (such as [1]) seem way too cumbersome; I would just build a silent PC, way simpler, with more extensible and easily replaceable components. I thought you meant to use RAIN for business servers.
I'm excited to finally see multiple actuators on the road map. I've always thought a single actuator was a waste considering you can practically fit two into a standard 3.5" drive without changing anything.
The dual-actuator drives that Seagate is currently working on still have only one head per platter surface. They're not putting two sets of arms into the drive like the old Conner Chinook drives [1]. Instead, Seagate's dual-actuator drives mean that the arms for the top 3-4 platters move as one group, and the arms for the bottom 3-4 platters move as a second group. They're basically giving you two separate hard drives that share a spindle motor, and are in a transparent RAID-0.
IOPS/TB is an important metric for hard drives these days. Capacity increases have been pushing that number down, making the performance characteristics of larger hard drives incrementally more tape-like and less suitable for live data. Splitting the actuators gets hard drives back to the IOPS/TB they had a few generations ago, which makes them suitable for a slightly broader range of applications than they otherwise would be. It also means that an older system using an array of 8TB drives can be consolidated onto half as many 16TB drives with minimal other architecture changes.
(Decreasing IOPS/TB is also why SATA SSDs haven't gone beyond 8TB while NVMe and SAS SSDs are pushing past 30TB.)
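Rough numbers for the IOPS/TB point above, assuming a ballpark ~150 random IOPS per actuator for a 7200 RPM drive (seek plus rotational latency; an assumption, not a spec):

    # IOPS per TB keeps falling as capacity grows while IOPS stays flat;
    # splitting the actuators claws some of it back.
    IOPS_PER_ACTUATOR = 150   # assumed ballpark for a 7200 RPM drive

    def iops_per_tb(capacity_tb, actuators=1):
        return actuators * IOPS_PER_ACTUATOR / capacity_tb

    print(iops_per_tb(8))                 # ~18.8 IOPS/TB (older 8 TB drive)
    print(iops_per_tb(16))                # ~9.4 IOPS/TB (single-actuator 16 TB)
    print(iops_per_tb(16, actuators=2))   # ~18.8 IOPS/TB (dual-actuator 16 TB)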
I'd want to see an overall calculation. Twice as many actuators means twice as many things can go wrong. I think I prefer KISS. On the flip side, I guess each actuator gets used half as often.
I use HDD for bulk storage and SSD for high-speed storage. I don't mind less speed, except insofar as it impacts reliability (e.g. a drive in a RAID fails, and the redundant drive fails while rebuilding the array).
It seems like competitors to HDD have become increasingly noncompetitive over time. Tape drives are insanely expensive at reasonable capacity. DVDs hold 0.1% of an HDD and are more expensive. Various newfangled optical drives (e.g. M DISC) appear more reliable than HDDs, but also cost much more per TB. Plus, you need to keep a big pile of coasters.
I don't think that will be true. By having the heads only read half the platters, the bytes per track are <areal density increase>/2. That's more seek operations for linear reads and small offsets. For completely random access that's probably the same number of seek operations as before, but queuing latency is lower.
Where it might reduce seeks is where multiple processes are trying to stream data at the same time, which sounds like is happening more and more.
Dual actuators don't actually double IOPS/TB for all sorts of reasons - queuing effects, contention for resources elsewhere in the drive, or limitations of the external interface. In practice it tends to be far short of double, so you're better off planning for larger numbers of (possibly smaller) drives with the same capacity and IOPS than for single drives getting faster by any measure.
Raw performance isn't everything to everyone. Magnetic disks are still much cheaper than SSDs on a cost/GB basis and will continue to be for a good while yet. They are also more durable for many common workloads. As long as performance can be increased without increasing the cost too much, there will be a market for them.
Seagate's presentation made a big deal out of Total Cost of Ownership (TCO) and said most data centers still store 90% of data on hard drives and not SSD.
Their goal is to retain that TCO edge by ensuring that HD performance doesn't get too much worse as HD capacity increases.
I suspect multiple heads on the same platter might lead to all kinds of calibration issues between the heads. You'll get weird issues like having to remember which head wrote each block of data so you know which set of calibration parameters to use to read back that data.
I could imagine certain resonances (eg. the height of the head oscillating) might differ between the two heads, effectively meaning data written with one set of resonances cannot be read back with another.
Excellent points. Also, the vibration issues with multiple heads moving independently are a bit of a nightmare, and ameliorating those nullifies some of the merely-theoretical performance gains. Multi-actuator drives have been tried before not once but multiple times, and they've never lived up to their makers' promises.
Modern drive heads are constantly reading the calibration data next to the user data and adjusting on the fly - an absolute necessity at current densities since minor temperature fluctuations would effectively destroy the ability to read or write data due to expansion/contraction pushing the head out of alignment.
IIRC the next step is to put the logic for that in the head itself with a tiny processor so the feedback cycle can be nearly instant. I suspect that would eliminate issues with one head reading something written by another.
Where is there a performance benefit to increasing storage density rather than using multiple drives to increase storage? A few comments here mention the time it'll take to rebuild an array and that feels like a blocking issue, at least for me.
I suppose as things like 8k video editing come up and file sizes explode there will be use cases for this kind of density, but without read/write and throughput increasing it seems like it won't be super useful for a little bit.
Data density is the single biggest driver for storage once you get towards datacentre/cloud environments. You want as many TB per rack as you can possibly get, because your dominant cost over time is not the initial upfront capital + depreciation, it's the per-rack running costs.
S3, Backblaze etc. all focus on cramming as many hard disks into a single machine as they can, without running into other bottlenecks at the machine level (CPU, memory, NIC bandwidth, controller, etc.).
You very much want to get out of the RAID business in those environments too. Backblaze mention their use of Reed-Solomon which is fairly common on large scale storage, and moves you much closer to resiliency on an individual object basis, rather than thinking in terms of the entire drive.
Throughput tends to increase with storage density, because more data is stored in the same length of physical track.
Consumer-grade HDDs barely managed 100MB/s 10 years ago. Now they can often do 200MB/s, and enterprise disks are even faster. With these much larger Seagate drives I guess SATA will be the bottleneck, not the drive's sequential read/write speed.
You're probably still correct, but what I saw from the immediate next generation was something like 9 platters and two or more independent read/write heads.
So we might be headed for what used to be a single write head (and its single throughput stream) to double, triple, or more.
Especially since the additional read heads enable the datacenters to scale shared object storage more effectively with more dense drives, which seems to be the main customer/application for HDDs at this point.
I remember a long time ago when I upgraded my motherboard and was pleasantly surprised that my SSD suddenly became twice as fast. Didn't even consider that I was switching from SATA2 to SATA3, was more interested in my CPU upgrade.
On an unrelated note, are you the same Dragontamer that I've met and played with at PDXLAN? Or do you just happen to use the same alias?
I'd like to see a graph of density versus iops over time. It definitely feels like the gap has been widening for quite some time just based on how long my ZFS arrays take to do a scrub.
To answer the OP's question, it seems to me that after around 12TB or so, it makes more sense to move away from implementations that require rebuilds such as raid 1, no raid, or jbod solutions.
Random IOPS is and always will be stuck at 240 IOPS for a 7200 RPM drive.
7200 RPM / 60 == 120 rotations per second. A "half-rotation", on average, is what it takes to bring the typical piece of data on the disk under the head (half the data is within the first half-rotation, the other half within the second half-rotation).
If you want to reach the data faster, you need to physically rotate the disk faster: such as a 10,000 RPM drive, 15k, or 20k drive. To allow for faster rotations, you shrink the drive to 2.5" or even 1.8". Alas, SSDs have taken over this niche entirely, so we only really have 3.5" and 7200 RPM drives anymore.
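The arithmetic behind those numbers, ignoring seek time (which only makes real-world figures worse):

    # Average rotational latency is half a revolution; its inverse gives
    # the rough ceiling on random IOPS for a given spindle speed.
    def rotational_iops(rpm):
        revs_per_sec = rpm / 60
        avg_latency_s = 0.5 / revs_per_sec   # half a revolution on average
        return 1 / avg_latency_s

    for rpm in (5400, 7200, 10000, 15000):
        print(rpm, "RPM ->", int(rotational_iops(rpm)), "IOPS ceiling")
    # 7200 RPM -> 240, matching the figure above; 15k RPM -> 500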
Having dual actuators (Seagate's Mach.2 branding) can increase IOPS by having 2 heads process the queue in parallel. That should bring a noticeable improvement, but it's true that it doesn't help when requests arrive one at a time (just like NCQ didn't help by reordering the queue -- you need a queue).
Not sure if there will be consumer drives with this eventually or if the cost is too prohibitive.
Except we can already achieve that kind of IOPS increase: by simply using two hard drives in parallel (be it RAID0, or even RAID1 if your driver is willing to split the reads between hard drives).
A multi-actuator drive isn't really "one hard drive" anymore, it's really just two hard drives ganged together. While more physically convenient, it doesn't seem to really offer the true 2x increase we're looking for.
Actuator#1 cannot give more IOPS over the data that Actuator#1 is assigned over. You only get more IOPS if you can split the work between the two actuators. Same problem as RAID0 or RAID1 multi-read hard drives (you gotta figure out a way to "split the work" to get RAID0 truly 2x the IOPS).
RAID0 can't give you a true 2x increase, because reads and writes are constrained to a particular device, and big reads tend to require both drives working together.
RAID1 can give you a 2x increase in reads, but suffers even more than RAID0 when it comes to writes.
Dual actuators, implemented in a straightforward way, can both access the entire drive surface which means they can give you a true 2x increase. Sometimes even better than 2x, because each arm can focus on one side of the disk. For read/write workloads it completely outclasses RAID.
> because reads and writes are constrained to a particular device
That constraint means nothing here. You can issue two parallel reads to two drives in RAID-0 just as easily in RAID-1. The only case where this doesn't work is where you're reading more than 2x the interleave size and you're issuing separate requests for each interleaved chunk. With command queuing, a smart storage system should even recognize the pattern and buffer to reduce the damage, but you'll still pay a cost in extra interrupts and request handling though so it's better to learn about scatter/gather lists.
> they can give you a true 2x increase
I already explained why this isn't actually the case, and have observed it not to be the case with multiple generations of dual-actuator drives. Stop presenting theories based on misconceptions of how disks and storage stacks work as though they were fact.
> You can issue two parallel reads to two drives in RAID-0 just as easily in RAID-1.
Under RAID 0, the odds are 50% that two independent reads are on the same drive. It's impossible to get a speed advantage in that case.
> I already explained why this isn't actually the case
You said they "improve parallelism, not media transfer rate or latency", and I'm arguing about parallelism. Plus large transfers can be rearranged into parallelism (fact, not theory).
And you said that they can face internal contention "elsewhere" but implied that could be fixed.
So that doesn't sound like what you said disagrees with what I said.
> Under RAID 0, the odds are 50% that two independent reads are on the same drive.
If you have a single sequential stream, then no. You'll either have parallel reads across the two drives, or you'll have alternating reads that the aforementioned semi-smart storage system can turn into parallel reads with buffering. If you have multiple sequential streams, then it's practically going to be like random access, which you already put out of scope. So there's no relevant case where RAID-0 is worse than RAID-1 for reads.
But you know what will be worse? Dual actuator drives. Why? Because of what dragontamer (who was right) mentioned, which you overlooked: the two actuators serve disjoint sets of blocks. They even present as separate SAS LUNs[1] just like separate disks would, so you would literally still need RAID on top to make them look like one device to most of the OS and above. But here's the kicker: they still share some resources that are subject to contention - most notably the external interface. Truly separate drives duplicate those resources, enabling both better performance and better fault isolation. Doubled performance is an absolute best case which is never achieved in practice, and I say that because I've seen it. If Seagate could cite something more realistic than IOMeter they would have, but they can't because the results weren't that good.
The only way dual actuators can really compete with separate drives is to duplicate all of the resources that change behavior based on the request stream - interfaces, controllers, etc. Basically everything but the spindle motor and some environmentals, as I already suggested now two days ago. You'd give up fault isolation, but at least you'd get the same performance. That's not what Seagate is offering, though.
Since you added a huge amount since I replied, I'll make a separate reply.
> But you know what will be worse? Dual actuator drives. Why? Because of what dragontamer (who was right) mentioned, which you overlooked: the two actuators serve disjoint sets of blocks.
They don't have to do that.
I was talking about what you can do with dual actuators, not product lines that already exist.
I didn't realize how mach.2 was designed, though. That's a shame.
> But here's the kicker: they still share some resources that are subject to contention - most notably the external interface.
Each head, even at peak transfer rate, uses less than half the bandwidth of the external interface.
So even if both of them are hitting peak rates at the same time, and the drive alternates transfers between them, things are fine. For example, let's say 128KB chunks, alternating back and forth. Those take .2 milliseconds to transfer. That makes basically no difference on a hard drive.
> Doubled performance is an absolute best case which is never achieved in practice, and I say that because I've seen it.
I completely believe you, about drives where each arm can only access half the data.
> The only way dual actuators can really compete with separate drives is to duplicate all of the resources that change behavior based on the request stream - interfaces, controllers, etc.
Or upgrade them to 1200 MB/s, which isn't a very hard thing to do.
> I was talking about what you can do with dual actuators, not product lines that already exist.
Since you didn't know they're different until a moment ago, you were talking about both. Don't gaslight.
> Each head, even at peak transfer rate, uses less than half the bandwidth of the external interface.
So two will come damn close ... today. With an expectation that internal transfer rates will increase faster than standards-bound external rates. And the fact that no interface ever meets its nominal bps for a million reasons. Requests have overhead, interface chips have their own limits, signal-quality issues cause losses and retries (or step down to lower rates), etc. Lastly, request streams are never perfectly balanced except for trivial (mostly synthetic-benchmark) cases, and the drive can't do better than the request stream allows. There are so many potential bottlenecks here that any given use case is sure to hit one ... as actually seems to be the case empirically. Your theory remains theory, but facts remain facts.
SSDs achieve their speed in part by combining multiple independent NAND channels under a single controller - each channel is more or less equivalent to an actuator. Their speed varies greatly based on workload parallelism, yet it's still very much one drive.
Using multiple drives is costly, it is much cheaper to consolidate if possible.
Rebuild time per TB will actually slightly improve because of better throughput due to higher density and higher number of disks inside the drive, so the recovery time for small arrays will actually get better.
True, rebuild time for a whole drive will get very long, which is not great, but if the array is designed with enough redundancy this won't be a problem, let alone a blocking issue. The very point of RAID is that the system is functional even while rebuilding. If enough drives are used, it does not matter that the rebuild takes 1 month.
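For a sense of where a month-long rebuild comes from, a sketch assuming the rebuild only gets a slice of the drive's sequential bandwidth so foreground I/O keeps working (both the rate and the share are assumptions):

    # Rebuild duration for a very large drive when the rebuild is only
    # allowed a fraction of the drive's bandwidth.
    def rebuild_days(capacity_tb, seq_mb_s, rebuild_share):
        seconds = capacity_tb * 1e6 / (seq_mb_s * rebuild_share)
        return seconds / 86400

    print(round(rebuild_days(120, 300, 1.0), 1))   # ~4.6 days flat out
    print(round(rebuild_days(120, 300, 0.15), 1))  # ~30.9 days at a 15% share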
In a large enough system, over a long enough time, even rare failure modes become inevitable. I was hearing about RAID-6 insufficiency at national labs ten years ago. Rebuild times were already long enough that, sooner or later, a second and then third failure would hit the same RAID group during the first rebuild. Data go poof. Since then, I've worked on even larger storage systems and seen overlapping failures cause data loss with even higher levels of redundancy. Throughout, I've seen the performance degradation from overlapping long rebuilds cause system-wide performance to drop below acceptable levels.
Higher areal density won't improve rebuild times unless internal transfer time is the bottleneck (it's not), and it very much does matter if rebuilds take a month. If that additional capacity isn't accompanied by proportional amounts of external-interface bandwidth and CPU/memory somewhere, then bigger disks will mean more risk of data loss. The math is unforgiving.
> In a large enough system, over a long enough time, even rare failure modes become inevitable.
Of course rare failures and loss of data do happen. There is no storage strategy that prevents these with certainty.
Data loss and performance degradation should be expected and designed for. Maybe RAID6 isn't cutting it for petabyte projects, but it is fine for vast majority of RAID users (small businesses, <12TB arrays).
I've noticed that the special hardware and design requirements of the few largest operators are somehow proselytized as a standard that everybody should adopt. People just like to talk about how they understand the biggest deployments in the world and how that is the best practice for everybody. But for most users of RAID, these bigboy strategies are irrelevant. Arrays below 12TB are very common and work acceptably well with RAID5/RAID6, and an occasional stripe failure very often isn't a big deal for home users or small businesses.
> Higher areal density won't improve rebuild times unless internal transfer time is the bottleneck (it's not), and it very much does matter if rebuilds take a month.
Why? It matters only if running in a degraded state poses performance/reliability problems for users. Which means the array wasn't designed with proper redundancy and performance in the first place. That is the problem, whether the rebuild takes a day or a month. Large 100TB drives will be fine if enough of them are used in the array so that it works well in a degraded state. Also, the URE rate will most probably go down due to better ECC measures in 100TB drives.
> Large drives 100TB will be fine if enough of them is used in the array
So on one hand you say that "big boy stuff" doesn't matter to anyone else, but on the other you say that "proper redundancy" requires higher scale. Seems a bit Goldilocks-ish to me, or perhaps even a bit slippery. There's a pretty well established trend, especially in storage, of things that happen in large systems becoming very relevant to smaller ones over time. RAID itself was considered a super-high-end niche once. And don't assume that my knowing about the high end means I don't know the low end as well, or make appeals to authority on that basis. Rebuild times have always been an issue worth addressing, from 1994-95 when I was working on the then-highest-density disk array (IBM 7135/110) to now, from high-end HPC to SOHO. Don't act like you occupy some magical space where what's true everywhere else is not true as well.
Regarding "bigboy stuff", it is really a simple argument, let me repeat in simpler words. Extreme data reliability beyond RAID6 is important for some specific deployments where loss of data is unacceptable, say for a unique experiment at CERN or a long supercomputer job that can't be repeated. But such strategy is also needlessly costly for other, less critical RAID users. The latter group of operators is many times bigger and this is often not reflected in these "RAID5/RAID6 is obsolete" discussions.
I agree with you that in time, the high-end tech becomes the standard tech. But that takes some time. There is quite a non-magical space of small providers who do not care for super reliable storage or super fast rebuilds and this will be the case for a long time. Yes the faster the rebuild the better, and "it is a concern" is fine. One week or month rebuild can be lived with. There is nothing magical about one day, one week or one month. They are all very short compared to typical drive lifespan.
At the same time, yes I believe 100TB drives, if they come, will be used in those extremely reliable big deployments, simply because of better TCO and expansion of data. Even if rebuild times will be longer than today, I believe it can be made to work reliably.
For long-term storage on SSDs, how does one ensure the long-term viability of the storage? Are manufacturers providing some sort of internal refresh as long as the device is powered?
My wild guess would be that they hold all data erasure encoded and run regular scrubs. So that any blocks with bit flips would be caught and corrected.
I know some drives do regular scrubs, but I don't think it's all that common. Mostly they monitor error rates when fulfilling read requests from the host, and use that to decide when data needs to be refreshed. If you write enough new data to the drive, eventually wear leveling will mean all of the old data you haven't modified has been moved and thus refreshed as a side effect. So the drive only needs to do background scrubs if it gets a very WORM-like workload but also leaves large portions of the data entirely untouched.
NAND flash data retention is related to how worn-out the flash is, in terms of program/erase cycles. A drive that's at the end of its rated write endurance is still expected to be able to retain data for one year (consumer) or three months (enterprise). Flash that isn't significantly worn out has much longer data retention.
Yes. The combination of erasure coding plus regular scrubbing is already the standard in the largest proprietary storage systems. As it happens, I worked on exactly the scrubbing piece ("anti entropy") for such a system at Facebook. There was a lot of analysis of data-loss probabilities based on encodings, placement across power/network domains, scrub rates, repair rates, etc. Since there are also performance and resource-use implications behind many of these, it's actually a very complex balancing act. That's why the knowledge of these second- or third-order problems and their solutions is slowly filtering down to storage systems you can deploy yourself.
I'm not quite sure what you mean there. Your software certainly should do regular scrubs, but there are a lot of storage systems out there that don't do this. And there most definitely are SSDs that do their own scrub akin to what host software can do.
There's no one cutoff or threshold at which hard drives die. SSDs have already made hard drives smaller than about 320GB completely uneconomical on a pure $/GB basis, and have killed off high-RPM hard drives. SSDs will continue to displace hard drives anywhere that performance or battery life matters.
You can break down drive costs somewhat into the fixed costs (SSD controller, or hard drive spindle motor and actuators) and the costs that vary with capacity (NAND or platters+heads). The fixed costs tend to be lower for SSDs (or at least SATA SSDs), and the variable costs are higher for SSDs because adding NAND is more expensive than another platter.
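A toy version of that cost model; all the dollar figures below are made-up placeholders, just to show where a crossover capacity like the ~320GB mentioned above comes from:

    # Crossover capacity below which an SSD is cheaper outright:
    # fixed_ssd + per_gb_ssd * C = fixed_hdd + per_gb_hdd * C  =>  solve for C.
    def crossover_gb(fixed_ssd, per_gb_ssd, fixed_hdd, per_gb_hdd):
        return (fixed_hdd - fixed_ssd) / (per_gb_ssd - per_gb_hdd)

    # Illustrative numbers only: SSDs have lower fixed cost, higher $/GB.
    print(crossover_gb(fixed_ssd=10, per_gb_ssd=0.08,
                       fixed_hdd=30, per_gb_hdd=0.02))   # ~333 GB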
Hard drives smaller than several TB are no longer getting new technology (eg. you won't find a 2-platter helium drive), so whenever NAND gets cheaper the threshold capacity below which hard drives don't make sense moves upward.
For the near future, hard drive manufacturers have a clear path to outrun the capacities available from cheap SSDs. NAND flash gets you more bits per mm^2 than a hard drive platter, but platters are far cheaper per mm^2—enough to also be cheaper per bit. I think that relationship will still be true by the time hard drives are using bit-patterned media and 3D NAND is at several hundred layers.
Ultimately, it might make more sense to ask when we will see SSDs having taken over the former hard drive market with hard drives having moved entirely into the market traditionally occupied by tape.
What’s interesting is we’re seeing the ending of the 2 year Moore’s Law doubling rate at the same time (as we reach near the limits of the physics of photolithography as we reach toward EUV light sources... if we go much shorter wavelength, we get inherent shot noise from the high energy of the photons... and this in addition to device-level quantum effects), so we might never quite reach the point where SSDs totally take over. Also, even tape seems to be defending its niche from hard drives.
Also, higher RPM hard drives are still a thing and are holding their own to some degree. The higher RPM can help with rebuild times and to reduce the impact of the lower random IOPS. Also, conventional hard drives do not have the write limitations that (especially cheaper) SSDs have, although that has improved over time.
So I think we’ll just see a continuation of the three-tiered system of storage for many years to come, but with hard drives increasingly disappearing into the cloud and away from consumer devices. SSDs for most things, hard drives for bulk server/cloud storage, and tape still for cold, long-term-stable storage.
I think we’ve already seen a plateau in storage cost reduction, as SSDs are still not cheaper per TB than HDDs. I think we’ll put more effort into being efficient with storage management in the future, as we can no longer simply rely on storage capacity doubling every couple of years.
Sold and used, sure. Lots of dead-end enterprise hardware stays in service and officially still available long after it stops making sense. Long validation cycles, etc.
As far as I can tell, WD's 10k RPM drives are discontinued and no longer listed on their site. Seagate lists 10k RPM drives up to 2.4TB and 266MB/s, with a 16GB flash cache. Looking on CDW, it's more expensive than a 3.84TB QLC drive. It uses more power at idle than a QLC SATA SSD under load. I can only imagine a few workloads where the 10k RPM drive would be preferable to the QLC SSD, and I'm not sure the 10k RPM drive would have better TCO than 7200 RPM drives for such uses.
Are there any situations that you think still call for 10k RPM drives to be selected, rather than merely kept around due to inertia?
> hard drives having moved entirely into the market traditionally occupied by tape
It'll be interesting to see whether the hyperscale clouds decide to self-manage more HDD functions and dumb down the devices ($), or leave that to the HDD manufacturers ($$).
I imagine there are some savings to be had in stripping memory & controllers out of drives when you're deploying in large groups anyway. Similar to what was done with networking kit.
Or maybe this already happens? Moreso than "RAID-edition" drives.
Hard drives already present a fairly minimal abstraction over the underlying media. For most drives, it can't get any simpler unless the host system software wants to get bogged down with media and vendor-specific details about things like error correction.
For drives using Shingled Magnetic Recording (SMR), the storage protocols have already been extended to present a zoned storage model, so that drives don't have to be responsible for the huge read-modify-write operations necessary to make SMR behave like a traditional block storage device. I suspect these Host-Managed SMR drives are not equipped with the larger caches found on consumer drive-managed SMR hard drives.
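As a rough illustration of what "host-managed" means here (a toy model only, not the actual ZBC/ZAC or NVMe ZNS command sets): each zone exposes a write pointer, writes must land at that pointer and advance sequentially, and rewriting anything requires resetting the whole zone, which is the host's job rather than the drive's.

    class Zone:
        """Toy host-managed SMR zone: sequential writes only, reset to rewrite."""
        def __init__(self, size: int):
            self.size = size
            self.write_pointer = 0
            self.data = bytearray(size)

        def write(self, offset: int, buf: bytes) -> None:
            if offset != self.write_pointer:
                raise IOError("zone requires sequential writes at the write pointer")
            if offset + len(buf) > self.size:
                raise IOError("write would overflow the zone")
            self.data[offset:offset + len(buf)] = buf
            self.write_pointer += len(buf)

        def reset(self) -> None:
            # The host, not the drive, decides when to reclaim and rewrite a zone,
            # so the drive never has to do large hidden read-modify-write cycles.
            self.write_pointer = 0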
Depending on what you mean by "dumb down", Seagate have their Kinetic stuff presenting an object API, which could be viewed as simpler, but arguably the drive is doing more.
I've read that unpowered SSDs do not retain data for long periods of time.
I've also been told that we have the technology to get data OFF of hard drives in cases of catastrophic failure. We don't have that capability with SSDs.
So for archiving, I think hard drives will stay around for a long time.
I think hard drives will be the "tape drives" of the future, relying on capacity more than random i/o speeds.
As someone who services client SANs, I could not disagree harder. I cannot count how many times I have seen some site suffer a poweroff and then half the spinners don't power back up even though the health checks were previously green checkmarks across the board. Most folks plan their RAID for 1 or 2 failures, not 6!
If you want to archive at rest, Blu-ray/DVD or tape in good climate-controlled storage is the only way to do it. Your HDD will spontaneously die simply because it has moving parts, well before a comparable SSD reaches its write limit.
I agree on the idea of BD/DVD/tape in a well-kept environment for good long term storage. When storing on optical media you'll want to be sure to get an archive-grade storage medium like M-DISC.
I've had DVD archives become unreadable a few years later and stopped using them. Is your argument that drives that can read them will get better faster than the media degrades?
I'm not sure it would be economical for bulk/cloud storage. It might be cheaper to just have geo-diverse redundant storage such that a failed drive can be rebuilt anew from redundant copies of the same data.
I'd be interested to see if optical technologies find a niche; they seem the most stable for long-term storage, e.g. M-DISC.
There are physical limitations. An electron trap has a minimum size, so new breakthroughs are needed for further leaps and bounds. I don't think it's reasonable to assume that both magnetic and electric storage have an effectively unlimited maximum density.
Unlike disks, flash has the nice advantage that it's not really constrained by form factor. With a server full of E1.L drives you get strong performance and large storage in a dense space. Of course that's eye-wateringly expensive today, but I think the combination of these factors will be what displaces disks, rather than purely $$$/TB.
Is that twice the power at full bandwidth? Because if so, then the SSD is still more efficient, because those reads/writes will be over a lot more than twice as fast, and the SSD can go back to idling.
I guess that is active power usage, but considering how much faster SSDs are compared to HDDs, I doubt that scales with the amount transferred (especially for random reads/writes). So what would the power usage be when totaled over the active work time?
Samsung quotes 148 MB/s per watt for the PM1643, for sequential transfers. Hard drive performance tops out around 270 MB/s and draws something less than ~9.5W, so the SSD is 3-5x more efficient for sequential transfers. For random IO, it's several orders of magnitude difference in performance and efficiency.
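Rough arithmetic behind that estimate, using only the figures quoted here (so take the exact ratios loosely):

    ssd_mb_per_watt = 148          # PM1643 figure quoted above, sequential
    hdd_mb_per_s = 270             # roughly the top sequential rate of a fast HDD
    for hdd_watts in (6.0, 9.5):   # "something less than ~9.5W" while transferring
        hdd_mb_per_watt = hdd_mb_per_s / hdd_watts
        print(f"HDD at {hdd_watts}W: {hdd_mb_per_watt:.0f} MB/s per watt, "
              f"SSD advantage ~{ssd_mb_per_watt / hdd_mb_per_watt:.1f}x")
    # -> roughly 3-5x for sequential transfers; random I/O is far more lopsided.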
The most power-efficient consumer SSDs are an order of magnitude more efficient for sequential transfers than the Samsung PM1643. Eg: https://www.anandtech.com/bench/SSD18/2460
Does anyone have a use-case for a 120TB drive for casual personal computing? I know we are in the age where 4K video is the new normal now, so I can see the use-case for massive .MP4 binary blobs sitting in these drives. But what other use-cases besides hi-def video are there? Also: if it is backups I would love to know what you're backing up! I try to keep my critical files down to 10GB so I can mirror them multiple times in many locations, but 120TB is a whole new level of strange.
I hear you, but my experience is that increases in capacity (be it storage, processing, network, etc.) bring new possibility, new technology, and new use cases.
Here are some off-the-cuff possibilities:
"Photos" that are interactive "spaces" composed of combinations of many high-resolution images and (LI/RAY)DAR point clouds.
"Videos" that are the same or far beyond 8k
Individuals using constant-capture systems to archive their day-to-day, providing the ability to "never forget." This could be supplemented with local capture and storage of extensive biometric data for personal-health analysis.
Amazing graphics / texture sets to locally provide standard libraries for very high resolution AR / VR experiences.
Local archives of reference and web content so that it can be browsed privately to ensure privacy.
A general push-back against cloud storage in favor of local privacy. This might be powered by some federated system that allows individuals to give up some local storage in exchange for storage on other's systems.
Games have started exploding in size now that basically everything is downloaded (meaning there's no hard upper limit). In a particularly comical example, the last two Call of Duty games won't both fit on the PS4's 500GB hard drive at the same time, by themselves. [1]
120TB may still be more than you need for now, but 1-2TB in a gaming PC is now "cozy".
The size of data which "must" be stored always increases to fill the storage space available.
At the moment, I'm content with 14T drives for personal use, which is about 4x bigger than what I was using just a few years ago.
But if I had 120T, I would just start hoarding every bit of data I could get my hands on. I'm not quite https://www.reddit.com/r/DataHoarder/ level yet, but I could totally see myself heading in that direction.
And at $DAYJOB we need to store data in the order of magnitude of a petabyte. This also seemed huge just a few years ago, but now feels restrictive. 30x 120T drives to shove into a rack in the data centre would totally be considered if it were possible.
I remember in the 90s I heard someone ask how you could ever fill a 1GB hard drive ...
Photos keep getting larger (raw, higher resolution, capturing additional information such as depth etc), videos keep getting higher resolution at 4k+ with higher bitrates, you can currently easily fill up a 1TB+ drive with video game assets and so on.
If you do any 3D work (engineering or artistic), the assets for these projects can be huge. If you're doing ML work, training sets are massive. If you do more conventional development/content creation, the software and assets needed keep getting larger, with a never-ending list of things to pull down from github etc. If you're just working with data/databases, the amount you're working with never seems to get smaller, and if it's time-based there's always more history to deal with as the years tick by.
Even if one does everything in 'the cloud', all you're doing is shifting the location of the storage needed rather than the quantity. So it's pretty much the same old story of 'more' when it comes to how much storage is needed.
I have saved on hard drive pretty much all the linear media I have watched in the last 15 years (recorded TV). It takes up ~25TB. Most of it is in SD, some of it is in HD, pretty much nothing is in 4K. With a 1Gbit connection I'm not bandwidth bound; I'm storage bound. 120TB for a single drive is a lot, but I could see myself being in striking range of using that.
I don't see much need for personal computing. I know with my work, we regularly produce >30TB of high-def video that we then need to sneakernet to our on-shore storage and give out to clients. Not having to split the data between drives would really help.
While it's not really "casual personal" computing, if you're dealing with video streams for archive, you can run into that much data pretty easily. A dozen HD security camera streams will eat up like 200 GB a day at high quality.
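As a sanity check on that figure, just arithmetic on the numbers above:

    cameras = 12
    total_gb_per_day = 200
    seconds_per_day = 24 * 60 * 60
    # 1 GB = 8000 megabits (decimal units)
    per_camera_mbps = total_gb_per_day * 8000 / cameras / seconds_per_day
    print(f"~{per_camera_mbps:.1f} Mbps per camera")   # ~1.5 Mbps

Around 1.5 Mbps per stream is plausible for compressed footage of mostly static scenes, and it adds up to roughly 6 TB a month before any retention policy kicks in.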
Data density is great and all, but this line got me wondering if we wouldn't be supporting more than just the 2.5" and 3.5" form factors. I suspect there would be a market for storage devices with twice the volume and 50% or more storage. Or does the 3.5" form factor have something to do with the limitations of spinning rust tech?
>>does the 3.5" form factor have something to do with the limitations of spinning rust tech?
There are issues. As the platters get bigger they get wobbly, more prone to vibration. The difference in platter speed between the inner and outer 'rings' of data also increases. There is no hard reason for 3.5 over 4.5 or 4.8 inches, but the advantages of going bigger do not outweigh the practical difficulty of pushing a new form factor for a commodity product.
>> spinning rust
This is a common refrain, but Seagate and the others are proving that HDD tech is not 'rusting'. With lower per-TB costs and phenomenal reliability numbers (<1% chance of failure per year), HDDs are going to be around for a while. Outside of the datacenter, most users are more limited by network speeds than storage speeds.
Isn't it a little hard to accept that it isn't a value judgement when we generally place a lower value on things that have rusted?
I think the original comparison to 'spinning rust' was tongue-in-cheek and not meant to be taken so-very-seriously, but it's kind of taken on a life of its own and, personally, I feel that it's gone a bit far and undeservedly undervalues hard drives.
The earliest use I could find is more inline with my observations of matter-of-fact usage than a joke taken too far:
"Logically, if you utilize a 'memory' disk you are ridding yourself of the physical limitations of disk drive technology — basically trading in the spinning rust. In a disk drive, you are battling physics to squeeze more performance out of your array; there are certain physical properties that just can’t be altered on a whim." - 2004
Datacenters in particular are not interested in increasing heat or power usage. 5.25" hard drives used to be common, but were phased out sometime around Y2K.
> I suspect there would be a market for storage devices with twice the volume and 50% or more storage. Or does the 3.5" form factor have something to do with the limitations of spinning rust tech?
There might be issues with Bigfoot-like drives with platter stability at the edges, but I have no idea if this is actually the case. What I do know is you get better performance towards the edge of the platters, and for larger drives, the variability in throughput might be a downside for some buyers. Seek times are also slower.
There was a variant of the WD Raptor that took the opposite approach: it put a 2.5" 10k RPM drive in a 3.5" enclosure, but this drive played the role SSDs do today; they weren't for mass storage.
> storage devices with twice the volume and 50% or more storage
More the opposite, really - same capacity in lower volume. The people who buy most disks are a lot less concerned with capacity per disk than capacity per rack. Sometimes capacity per kW or capacity per BTU, but mostly capacity per m^3 in my direct experience. I also suspect that a lot of consumers would prefer the same capacity in a smaller package, since they mostly boot off flash and their disk is basically a cache for much larger amounts of data elsewhere.
> I suspect there would be a market for storage devices with twice the volume and 50% or more storage.
With your extreme example, I doubt it. Already you can get two drives of 75% of the capacity for just a bit more money, and that would perform far better than one drive of twice the volume.
I wouldn't look at current drive economics to determine that a slight increase in platter size wouldn't enable more storage at a lower incremental cost.
The current form factors are baked in; how would we know what the free variables are?
I think their example of 50% more storage at 2x the volume would need to be a fair amount cheaper than the comparison drive to be considered. I'm unsure what the economics would be, but even at the same price as the comparison drive there wouldn't be a ton of interest.
I may be completely wrong, so feel free to correct me, but isn't that largely about cooling? I'm unclear how much larger hard drives would help there; I'm not sure it's obvious either way.
One thing I've always wondered, and somebody on here must know the answer :) When a HDD is manufactured, how are the disk platters initially seeded? Are they just liberally splattered with material & the tracks are created on first write, or are the tracks pre-created? If so, is it a stamping method, akin to making a vinyl record or CD?
The tracks are pre-created: the servo information that defines them is written at the factory. Each servo wedge on a track contains:
- sync (the regular lines)
- positional track number, gray encoding (irregular lines)
- track fine tune (the block trio)
- fine track ID

After the fine track ID is read correctly, the position is deemed correct, the RX (read) head is turned off, and the TX (write) head is turned on. This is the rx-tx gap. Then the track data starts.
It's amazing that storage capacity has advanced so much; it has largely kept up with Moore's Law. Put a dozen of these in an array and we're in petabyte territory. I wonder whether 64-bit file systems will be coming up against a cliff in 10 years?
Ha. It's fun to throw these estimates around. While 16EB is a lot, let's see where a back-of-envelope estimate gets us. Assume that storage mostly follows Moore's law, doubling capacity every 18 months. That is 2^6.667 ≈ 100 in 10 years; let's say about 100x capacity in 10 years.
So in 10 years we will have 100 x 120TB = 12PB per disk. Put 8 of these disks in a 2U unit and we'll have a ~100PB box. Stack 10 of these units together in a rack and we'll get a 1EB storage rack.
Remember that some file systems use a couple of bits of the 64-bit address space for other purposes. Shave 4 bits off and we get 2^60 bytes ≈ 1.15EB. That's within the 10-year storage capacity estimate.
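The same napkin math as a few lines of code, using only the assumptions above (18-month doubling, 8 disks per 2U box, 10 boxes per rack):

    # ~100x capacity growth in 10 years at an 18-month doubling rate,
    # then compare a rack of such drives to the usable 64-bit address space.
    growth = 2 ** (10 * 12 / 18)                 # ~102x
    disk_pb = 120e12 * growth / 1e15             # ~12 PB per disk
    box_pb = 8 * disk_pb                         # ~97 PB per 2U box
    rack_eb = 10 * box_pb / 1000                 # ~1 EB per rack
    addressable_eb = 2 ** 60 / 1e18              # 64-bit space minus 4 bits: ~1.15 EB
    print(f"{growth:.0f}x growth, {disk_pb:.1f} PB/disk, {box_pb:.0f} PB/box, "
          f"{rack_eb:.2f} EB/rack vs {addressable_eb:.2f} EB addressable")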
While present-day YouTube might not have that much data, I'm sure when 16K, 32K, and 64K videos come along the space will be filled up. Let's not forget the voluminous 3-D lidar scan data generated by every iPhone when people start recording 3-D VR/Holodeck scenes from their phones. Besides, wouldn't it be cool to host an entire copy of YouTube in your closet? ;)
> While present-day YouTube might not have that much data, I'm sure when 16K, 32K, and 64K videos come along the space will be filled up.
I can't imagine how long it will be until we see 16K, considering 4K caught on in around 2014/15. 8K video exists, but is still exceptionally rare, at least for consumers.
> Besides, wouldn't it be cool to host an entire copy of YouTube in your closet? ;)
As long as you're not still hamstrung by Xfinity's 1 TB data cap...
How about some napkin math on the cost of that data overage? hah
I've had way too many disk failures with Seagate. I won't buy from them ever again. A number of PC building forums have it as a sticky: Don't rely on Seagate.
Enterprise Seagates >4TB are fine now. Most problems in the past were with 3 and 4TB models. They seem to have fixed the problem and new bigger drives seem OK.
It changes as the years go by, but for the last decade I've not had problems with WD.
Of course, a failure once in a while is expected. With Seagate I had a string of failures over a few years. I tend to think that randomness was not alone in explaining it (or I was really unlucky).
More read/write heads (not necessarily more actuators). Maybe a dozen heads suspended from a radially movable bridge above the platter, so they can read and write simultaneously on different tracks.
How reliable are they going to be? I have heard RAM isn't as reliable as it used to be due to the ridiculous densities we've pushed it to. Same for HDDs.
At what point do we stop because reliability has gotten unacceptably low?
When you compare at the same price, they are not that much slower.
I have 3 1TB WD Blacks that reach 300 MB/s. When I was shopping for my drives, similar SSDs were either crazy expensive (10 times as much, for example) or they were tiny, or the ones with a "usable" size and reasonable price were SLOWER than the WD Blacks in sequential writes and reads (they were still faster in random access, for obvious physics reasons).
To be honest I am perfectly happy with my WD Blacks and never felt the need to switch to SSDs; usually my speed bottlenecks are somewhere else (often network speed! It is surprisingly hard to find good network hardware; all I find are random cables, adapters and routers whose speed limits even the shops don't know, and asking if a cable supports gigabit Ethernet just gets me confused looks).
EDIT: This post's score is fluctuating, suggesting to me that people are sometimes downvoting it. I wonder why... I mean, I posted some anecdata, but why the downvote? There isn't even anything to disagree with in my post, unless I wrote something very wrong, but nobody has replied to say what is wrong...
The WD SN850 NVMe flash drive does 7000 MB/s reads (close to maxing out a PCIe 4.0 x4 slot). That's 23 times faster than the hard drive speed you quoted. SATA SSDs have been obsolete for a decade now.
Edit: Just checked some prices on Newegg. Looks like the XPG GAMMIX S50 goes for $139, and does 3900 MB/s reads, while SAMSUNG 860 Pro does 560 MB/s reads... and costs $179.
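A crude way to put those two listings side by side (capacities weren't quoted, so this deliberately ignores $/GB and only compares sequential throughput per dollar):

    listings = {
        "XPG GAMMIX S50 (NVMe)": (3900, 139),   # MB/s and USD, as quoted above
        "Samsung 860 Pro (SATA)": (560, 179),
    }
    for name, (mb_per_s, usd) in listings.items():
        print(f"{name}: {mb_per_s / usd:.1f} MB/s per dollar")
    # The NVMe drive comes out roughly 9x ahead on this admittedly crude metric.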
It is unfortunate that the invisible hand does not seem to care about drive speeds. Absent price signals, online retailers do not do a terribly good job signposting the enormous gulf in performance between SATA and PCIe. Newegg lists 999+ 2.5" SATA SSDs, and there is no reason at all to buy any of them when building a new computer.
And because a large part of the price-sensitive market (consumer/small business) does not care about NVMe speeds, SATA SSDs aren't really obsolete. They are just older, slower tech.
SATA SSD deployments are cheaper than NVMe deployments and often fast enough.
EDIT: you can't directly compare a Samsung Pro with gaming products such as the XPG GAMMIX; they are different classes of product. You wouldn't put an ADATA XPG in a 24/7 server for important services.
I guess an even higher level of redundancy will be required at these capacities.