Aw, c'mon chaps! Can we just admire the tech for a second?
> Data on the tape is stored at a record-breaking density of 317 gigabytes per square inch...
> When tape is being read it is streamed over the head at a speed of about 15 km/h and with our new servo technologies we are still able to position the tape head with an accuracy that is about 1.5 times the width of a DNA molecule.
In a world where we hold people accountable for stupid things they did or said when they were teenagers, I can see a lot of blackmail value in retaining data for a much longer period of time.
HDDs are fine for short term storage, but they are too unreliable when you want to keep the data for many years, possibly for a lifetime.
Unfortunately, there is currently no commercially available method of archival storage other than magnetic tape. Optical storage has too low a density to compete with magnetic tapes.
That presumes you're putting the data in cold storage somewhere. For data that's being kept accessible, the reliability of a hard drive doesn't matter. It's transferred from RAID to RAID over time. And spy data is probably in warm storage.
Except for videos, that doesn't take up a lot of space. The oppressive part is tracking everywhere you go and everything you say, which fits easily into warm storage.
For example, storing your position every 20 seconds might take 10KB a day. You'll collect 15 million data points in a decade, but each one is only a few bytes.
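A quick back-of-envelope in Python to sanity-check that (the ~2.5 bytes per sample is my own assumption for a compact, delta-encoded position record):

    # one position sample every 20 seconds, ~2.5 bytes per delta-encoded sample
    samples_per_day = 24 * 60 * 60 // 20          # 4,320 samples/day
    bytes_per_sample = 2.5
    daily_kb = samples_per_day * bytes_per_sample / 1024
    decade_samples = samples_per_day * 365 * 10   # ~15.8 million points
    decade_mb = decade_samples * bytes_per_sample / 1024**2
    print(f"{daily_kb:.1f} KB/day, {decade_samples:,} points, {decade_mb:.0f} MB/decade")
    # -> 10.5 KB/day, 15,768,000 points, 38 MB/decade

A whole decade of tracking fits in well under 100 MB, which is why it sits comfortably in warm storage.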
> The Utah Data Center (UDC), also known as the Intelligence Community Comprehensive National Cybersecurity Initiative Data Center, is a data storage facility for the United States Intelligence Community that is designed to store data estimated to be on the order of exabytes or larger. Its purpose is to support the Comprehensive National Cybersecurity Initiative (CNCI), though its precise mission is classified. The National Security Agency (NSA) leads operations at the facility as the executive agent for the Director of National Intelligence.
It has never really entered the public consciousness, since it's not a consumer-facing technology, but tape development has continued apace. LTO-9 should be hitting the market very soon and supports up to 45TB per tape (compressed capacity, 18TB raw).
Not quite sure where IBM's numbers here come from; their previous numbers don't match up to the progression of the LTO tape series' capacity. Maybe they are citing "research numbers" that they can do in a lab but aren't production-ready yet. I would certainly assume they are citing "compressed" data figures there.
But certainly tape has continued to progress much faster than most people would have imagined. Big tape libraries are still a thing in certain environments and they work very well; there is no better solution for bulk cold storage.
> LTO-9 should be hitting the market very soon and supports up to 45TB per tape (compressed capacity, 18TB raw).
LTO is cool and all but is the "compressed capacity" number really something to repeat with a straight face? The tape holds 18TB, we don't need to pretend it's anything else.
> But certainly tape has continued to progress much faster than most people would have imagined.
Mostly it has. But I'm somewhat worried about the future after the sudden late-game announcement that LTO-9 would have a 50% capacity improvement instead of the usual doubling.
> LTO is cool and all but is the "compressed capacity" number really something to repeat with a straight face?
No, but it's been the standard in the tape industry for decades. It probably dates back to the first tape controllers that had built-in compression (so compression didn't tax the main CPU).
I am trying to think of something that would make compression done by the tape controller favorable. Maybe it somehow makes recovery more fault-tolerant in the long run, because the controller knows about the intricacies of the medium? Just guessing; I know nothing about tape storage.
My first LTO drive installation was on a multi user SGI CAD application server. This system did all the compute and data management for roughly 30 users. I/O streaming was easy and efficient.
IRIX allowed for live file system backups, and the drive doing compression meant all of that happened with negligible user performance impact.
Was literally set it and forget it, aside from tape rotation into off site storage.
Doing the compression in software would have had an impact.
We don't do multi user app serving much today, so maybe a smart drive has less benefit. But it mattered then. 2000's era.
The announcement [1] is of lab results, so yes "research numbers". A commercial product might be years out (if ever developed). This is not meant to belittle the achievement (which is awesome), but to clarify what has been done and what to expect.
That transport speed has to just be for rewinding and fast-forwarding. If the terabyte you want is the 580th terabyte, you need a quick way to skip past terabytes 1 through 579.
The hardware is not going to read 300 Gb/in density at bicycle speeds. :)
The density of flash memory is competitive with magnetic tape, but the retention time is too low, making flash memory unusable for archival storage even if it were as cheap as magnetic tape.
In theory, write-once memory cards, using some kind of antifuses, could be designed to have a lifetime good enough for archival storage, but nobody has attempted to develop such a technology, because it is not clear if there would be a market for them.
Most people do not think far ahead into the future, so they do not care much about archival data storage until it is too late and the information has already been lost.
> The density of flash memory is competitive with magnetic tapes, but the retention time is too low, making flash memory completely unusable for archival storage, even if it would have been as cheap as magnetic tape.
I disagree that it's unusable. You'd end up with a puck the size of a data tape that can archive a petabyte of data and needs to be plugged in to a 5 watt power supply for long term storage. That's not super onerous. Then consider that tapes need to be stored at exactly room temperature with 20-50 percent humidity, while this puck would barely care about environment at all. And you could plug it directly into a computer without a $5k drive. Honestly it sounds pretty good to me. We just need to drop the price of flash by a factor of 20 to make the scenario happen.
It probably refers to record-breaking density within the data-tape world, which is still significant: there could be other ways to achieve higher total storage, but areal density seems to be one of the major components here.
Tapes are often structured in bands, and those bands are divided into wraps, and those wraps are divided into tracks. There are many tracks on a tape, and they snake back and forth from end to end on the wrap (so you don't need to "rewind" when you get to the end of the tape—you just start reading back in the other direction). In newer tape drives, you physically can't read all of the data at once because the tape head is only a fraction of the width of a single band: it physically moves (laterally) to position itself over the right data.
Since I had to look up more info to understand this explanation, I'll try to give my own, using the numbers for LTO-8.
The drive has 32 heads, and reads/writes 32 tracks at a time. It goes from the start of the tape to the end, then aligns with the next 32 tracks and reverses direction.
Each group of 32 tracks is called a wrap. There are 208 wraps, so 6656 total data tracks. Even wraps go one direction, and odd wraps go the other direction.
That's the important part.
But also the tape is divided into 4 "bands", each one holding a quarter of the wraps/tracks. Between the bands, and at the edges of the tape, are special servo tracks that are used for alignment.
So when a source talks about "wraps per band", it's a pointless abstraction. Unless you're really in the weeds, the only thing you want to know is the total number of wraps.
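If you want to sanity-check the numbers above, a tiny sketch using the LTO-8 figures from this explanation:

    # LTO-8 geometry per the comment: 32 heads, 208 wraps, 4 bands
    heads_per_wrap = 32                    # tracks read/written at once
    wraps = 208                            # end-to-end passes over the tape
    bands = 4
    total_tracks = heads_per_wrap * wraps  # 6,656 data tracks
    wraps_per_band = wraps // bands        # 52 (the "pointless abstraction")
    print(total_tracks, wraps_per_band)    # -> 6656 52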
> When tape is being read it is streamed over the head at a speed of about 15 km/h...
It did say that. I wonder if there's some multi-track/double-layer/double-sided stuff going on that needs multiple passes to read; that does sound awfully fast!
Most recent tape standards have multiple "bands" and "wraps" placed in parallel. The head reads only one wrap at a time, so it takes many passes to read the whole tape. For example, an LTO-8 tape has 4 bands of 52 wraps each, requiring 208 passes to read completely.
It would be like downloading a file using GetRight back in the day, where you'd have one thread downloading the 0-25% chunk, one downloading the 25-50% chunk, and so on.
You could hypothetically do that, for sure, but the software basically is derived from the tape era where you'd just have one logical stream coming out of the tape.
Positioning the tape should become harder, I suppose.
Also, I suppose they limit the read bandwidth to something like what a single InfiniBand connection would support; few disk arrays support much higher speeds.
The numbers they cite for "previous generations" don't match up to the progression of the LTO tape series' capacity. Maybe they are citing "research numbers" that they can do in a lab but aren't production-ready yet. I would certainly assume they are citing "compressed" data figures there.
Also bear in mind that tapes typically store data striped across the tape in multiple tracks and multiple bands. There are four bands per tape and 12-52 wraps per band, so reading the whole tape requires up to 208 passes across the tape.
But yes, to agree with another parallel comment, tape data rates are quite high sequentially (abysmal at random access, of course, but that's not how tapes are used). LTO-8 does 750 megabytes per second compressed / 360 megabytes per second raw.
> Data on the tape is stored at a record-breaking density of 317 gigabytes per square inch...
The table from IBM[1] in the middle of the page says this number is in gigabits (not gigabytes) per square inch. This is obviously impressive nonetheless, but I wonder what else this article got wrong.
> record-breaking density of 317 gigabytes per square inch...
Yawn on the raw density figure, though.
The chip inside a 128 gigabyte micro SD card is a small fraction of a square inch and is cunning enough to provide random access.
You just can't easily and cheaply have a long tape of them.
Once upon a time, mass storage media that write to surfaces with a head, like magnetic tapes and discs, had better density than memory chips.
That is roughly 500 bits per square micrometer or 2000 square nanometers per bit or an average (assuming square lattice) distance of 50 nanometers per bit.
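Checking that arithmetic, assuming the figure is 317 gigabits (not gigabytes) per square inch as the IBM table suggests:

    bits_per_sq_inch = 317e9
    um2_per_sq_inch = 25_400.0 ** 2                    # 1 inch = 25,400 micrometers
    bits_per_um2 = bits_per_sq_inch / um2_per_sq_inch  # ~491 bits/um^2
    nm2_per_bit = 1e6 / bits_per_um2                   # ~2,000 nm^2/bit
    pitch_nm = nm2_per_bit ** 0.5                      # ~45 nm on a square lattice
    print(f"{bits_per_um2:.0f} bits/um^2, {pitch_nm:.0f} nm pitch")

So roughly 490 bits per square micrometer and a ~45 nm average spacing, in line with the figures above.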
It isn't mentioned, and I don't expect much beyond the inertia of the device itself. There is a servo mechanism ensuring the head follows the tracks on the tape, and improvements to it are mentioned. I don't think it matters whether it's the tape itself or the environment that vibrates, or whether that distinction is even meaningful.
If you're flying back and forth on that route that often, you're going to get upgraded quickly, I would imagine. I've done 6 flights a month domestic and have been upgraded to first class for free fairly often.
>Otherwise that sounds like a nightmare that would require an insanely high paycheck.
If the courier in question is 6'2" that might be the case, but I suspect if the person was much smaller and is a heavy sleeper, that it wouldn't be too bad.
As a person of 6’0” I found the flight from LAX to SYD, and return, miserable in coach. I slept a lot and watched some movies but any way you cut it that is a long time to be sitting in a very uncomfortable seat.
1.75m here and coach sucks for me too. For that reason and social distancing I just bought a car that can drive me to places I’d usually fly, in much more comfort.
It's an ideal job if all you need is hours of concentration, reading, and writing or drawing. A writer, a PhD student with a laptop, a comic book author, etc.
Or, maybe, even a Zen monk who spends time meditating.
Still sounds like an awful job to me flying that much.
Not sure why people said it was a good thing to do. You'd be bored of the process within a couple of months and hating planes and airports... unless you're a certain type of person.
A lot of traveling sales guys have talked about this.
I wonder if tapes are more prone to some form of drive-by attack, whereby instead of requiring physical or remote access to a location, a strong enough magnetic field within a certain distance of a datacentre could penetrate bricks and mortar and render them useless.
I'm envisioning a huge device in the back of a van which pulses a powerful beam, similar to typical heist movies (Ocean's Eleven?) cutting the power to a bank/casino prior to a raid.
Unattended for 14+ hours you mean. A baggage handler on the departing side could be bribed to steal it and you would have half a day of unfettered access before the courier on the plane even knew it was missing.
Of course you don’t check it if it’s that critical.
Apologies for all the questions, I'm just curious about this.
>Checksums will identify something went wrong, and then you need to redownload the file to a quarantine network and scan it. Takes time.
Surely a sensible file transfer algorithm would compute checksums on small and easy-to-retransmit chunks? Does rsync not do this? Isn't it already happening in TCP?
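Something like this sketch is what I have in mind (the 4 MiB chunk size is an arbitrary choice):

    import hashlib

    CHUNK = 4 * 1024 * 1024  # 4 MiB chunks

    def chunk_digests(path):
        """Yield (offset, sha256 hex digest) for each fixed-size chunk of a file."""
        with open(path, "rb") as f:
            offset = 0
            while block := f.read(CHUNK):
                yield offset, hashlib.sha256(block).hexdigest()
                offset += len(block)

Comparing the two sides' digest lists pins down exactly which chunks to retransmit, instead of redownloading the whole file; rsync's rolling-checksum protocol is a more sophisticated version of the same idea.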
>Not to mention most major studios will contractually prevent you from exposing anything to the web.
I understand that workstations with media on them are not going to have internet access, but do they really prohibit site to site VPNs?
>Amazon just released a device (forget the name) for exactly the same use case. We developed it in house.
Snowball and Snowmobile? IIRC these are primarily meant for one-time migration from on-prem storage to S3. Do people really use them on an ongoing basis?
I'm more interested in the lifetime of the medium. Durable backup media for consumers are still a holy grail, as I understand it. M-DISC didn't hold up to its 1000-year lifetime promise. Archival-grade DVDs are also not good enough, as I understand it. Syylex went bankrupt. I want a consumer-grade backup medium that can provide at least 100 years of lifetime.
That said, I was able to recover 90% of the voice recordings my father made between 1959 and 1963 on reel-to-reel tapes, 60 years later. Tape can be very durable, but what I recovered was analog voice, which is very tolerant of errors. I'm not so sure about gigabytes snuck into a square inch.
Brings back memories of a great remix contest in 2006 where the digital copies of analog tape tracks from Peter Gabriel’s “Shock the Monkey” were made available to remixers.
To my surprise, the pitch of the samples was a little lower (and varied ever so slightly over the duration of the song) than what you'd have expected with A440 tuning. It baffled me, since I expected some of the early digital synths used in the original sessions to have been rock-solid A440.
And that’s how I learned about “tape stretch” where analog audio tape stretches just enough to make the pitch of everything a few cents lower over long period of time.
p.s. I ended up applying digital pitch correction, so I could “jam along” with my own synths :-)
A different problem happened to me when digitizing my dad's tapes. Dad bought the player in the USA, brought it to Turkey, and made the recordings there. When I digitized them in the USA, everything sounded higher pitched. It turned out that the difference in AC mains frequency (60 Hz vs. 50 Hz) made the motor spin proportionally faster, so I slowed the recordings down to 5/6 speed, and they were perfect afterwards.
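The 50/60 Hz arithmetic, for the curious: a mains-synchronous motor runs 6/5 as fast on 60 Hz as on 50 Hz, which is a sizable pitch shift:

    import math
    speed_ratio = 60 / 50                    # playback 20% too fast
    semitones = 12 * math.log2(speed_ratio)  # ~3.16 semitones sharp
    print(f"{speed_ratio:.2f}x speed, +{semitones:.2f} semitones")

Slowing playback to 5/6 speed exactly undoes it.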
Ask HN: How can I create the most reliable and durable NAS today? I have a lot of very sentimental, very-important files, such as family photos and videos. And I simply like to hoard data.
I currently have 8TB of data stored on a Synology DS218+ with RAID1, and monthly data scrubs (verifying checksums). It is backed up remotely to Google Drive (in encrypted form), and I also maintain an infrequently-updated, once-per-quarter disk clone with an external HDD.
My biggest concern with my current setup is that the memory is non-ECC. Even though the files are checksummed, I am concerned that memory corruption / bit-flips could propagate into the checksums, and hence result in data corruption.
I am considering:
* Building my own FreeNAS box using AMD Ryzen (which semi-officially supports ECC memory). My concern here is the semi-official nature of the support: how do I know the ECC works, before a rare cosmic bit-flip?
* Purchasing a Synology DS1621+. This is AUD$1400 which is a tough pill to swallow, for the equivalent of a first-gen Ryzen quad core and 4GB of memory.
You'll know ECC works when it matters: when you encounter a flipped bit. With an 8TB RAID, that is likely to happen within the next 24 months.
Go with option 1, and RAID-Z2 or better. With RAID-Z2, you'll be able to not only detect but also correct a flipped bit, even if that flip happens when writing out the data.
Pay attention to the counters. Your ZFS scrubs will report how many resilvers they have. You’re likely to encounter 1 in a scrub. You’re unlikely to encounter more than 2. If you see that, that’s when you check for memory errors. A single bad sector is likely the hard drive. Even a single flipped bit is likely a transient error; It could be your memory, or your disk, or anything in between. It happens at scale, and 8TB, read repeatedly, is a lot of bits.
Look into rasdaemon and memtest86 - they're the tools you use to debug what's happening when you do see errors.
The other advice I can give you: Don’t be paranoid. Your photos are likely to acquire bit rot. You will have dozens, even hundreds of bit flips that will happen in your lifetime. Of the many thousands of photos you will take, the chances that you will ever notice the discoloration or bad line that a bit flip will have in a photo are pretty small. Bit rot happens. Your photos are important to you, and you should treasure them, but treasure them for what they are: Things that you protect and are under your care, not things that must be twenty nines of correct. You can realistically achieve 10, even 12 nines of correct reads on your data. You don’t need more.
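If you don't run ZFS, you can still approximate a scrub with a checksum manifest. A minimal sketch (the archive path and manifest filename are invented for the example):

    import hashlib, json, pathlib

    ROOT = pathlib.Path("photos")             # hypothetical archive root
    MANIFEST = pathlib.Path("checksums.json")

    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest()

    current = {str(p): digest(p) for p in ROOT.rglob("*") if p.is_file()}
    if MANIFEST.exists():
        for name, h in json.loads(MANIFEST.read_text()).items():
            if current.get(name) != h:
                print("changed or rotted:", name)
    MANIFEST.write_text(json.dumps(current, indent=2))

Unlike ZFS, this only detects rot; you still need a second copy to repair from.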
"Next, you read a copy of the same block – this copy might be a redundant copy, or it might be reconstructed from parity, depending on your topology. The redundant copy is easy to visualize – you literally stored another copy of the block on another disk. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally – no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted. A later scrub will attempt to read all copies of that block and validate them just as though the error had never happened, and if this time either copy passes, the error will be cleared and the block will be marked valid again (with any copies that don’t pass validation being overwritten from the one that did)."
(don't just read the quote, read the link)
I have been using ZFS since 2007/2008 and I have never had any issues (except with those damn Seagate 3TB "DeathStar" HDDs, where I was barely able to replace them fast enough - 3 died in 3 months - I will never buy Seagate again).
I have an ASUS microATX board with 16GB of non-ECC RAM, an additional SAS HBA, 3x3TB Toshiba drives in raidz, plus additional 10TB HGST He3 disks and an Ultrium 3000 (LTO-5) drive for backups. LTO-5 drives are quite cheap today and are designed for 24/7 operation, which is far more than I will ever subject them to; tapes can be restored on later LTO generations if needed, and you can get cartridges on sale for peanuts. There is no way in hell I'd trust my important data (like images) to disk only, and tape is nice: you take the cartridge and store it at your parents'/gf's/in a workspace drawer/...
Anyway, if I remember correctly, Google lost 150k user accounts in 201x and restored them from tape. So even for cloud-minded people, it still makes sense to shovel important data onto tape if you don't use it in everyday processing (and just an FYI: even shelved disks die).
Write the really important data for long-term storage to M-DISC BD-XL media (100GB each).
If it's pictures you care about, I'm sure it's far less than 8TB that needs the VIP treatment.
I would love your part list! Did you go with ECC memory? Did you have any way of verifying the ECC is working and actually detecting/correcting bitflips?
The other thing I am interested in is minimizing idle power consumption. Just to be more environmentally friendly.
I don't know if you want to build the same kind of system but at least you can get a list of parts that work together.
I use my Truenas box for storage using ZFS, VMs and NFS server for different PCs.
I bought ECC memory as I understand this is more or less a requirement for ZFS.
I found out that FreeBSD which Truenas is based on can give you info about what type of RAM is present.
The command is:
# dmidecode -t memory
According to this I have ECC RAM. :)
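dmidecode only tells you the modules are ECC-capable, though. On Linux you could also watch the kernel's EDAC corrected-error counters to see corrections actually happening. A sketch (this assumes the EDAC driver is loaded; TrueNAS Core, being FreeBSD-based, would need different tooling):

    import glob

    # corrected-error counters exposed by the Linux EDAC subsystem
    for counter in glob.glob("/sys/devices/system/edac/mc/mc*/ce_count"):
        with open(counter) as f:
            print(counter, "->", f.read().strip(), "corrected errors")

A nonzero count is proof the ECC path works end to end.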
I did a build with the following parts:
Case: SST-CS380 V2 (space for 8 3.5" drives, 2 x ).
Mainboard: ASRock X470D4U2-2T
Power supply: Seasonic Focus Plus 550W Gold 80 Plus Full Modular Power Supply
RAM: Kingston Server Premier, DDR4, 16 GB, KSM26ED8/5 M
CPU: AMD Ryzen 5 3600X Wraith Spire CPU
Cooler: Arctic Liquid Freezer II
NVMe to PCI bridge: ASUS Hyper M.2 x16 Gen 4 (PCIe 4.0/3.0) supports 4X NVMe M.2 devices (2242/2260/2280/22110) up to 256 Gbps for AMD TRX40 / X570 PCIe 4.0 NVMe RAID and Intel® RAID platform - CPU features
I bought all this from Amazon.de. The main board is a server board with 10 GbE Ethernet and a management console for flashing the BIOS and remoting into the machine - no graphics card needed. This was expensive and you can most likely save a lot using a consumer-grade board.
Be careful not to buy an AMD APU - these don't support ECC RAM for some insane reason. An APU has graphics built into the CPU.
I use both SATA hard drives (long-term storage) and SSDs (for speed).
I created two ZFS pools (I don't remember the proper terms): one for rotating discs (which can sleep most of the time) and one using SSDs for fast storage, which doesn't use much power.
I have 6 kW of solar cells with a battery, so I don't really care if the box uses a lot of power. During daylight it's more or less free when the sun is shining. I get next to nothing when selling the electricity that I generate and would like to use as much as possible locally.
There really isn't much market for it. You can pay Google or Apple or one of the large cloud providers a very reasonable and decreasing rate for a literal guarantee that your data is accessible. The only risk is the company goes under, which is extraordinarily unlikely for someone like Google / Apple and the shutdown would have advance notice.
I realize for the hacker news audience there are multitudes of reasons the solution above doesn't fit your needs, but realize the consumer market is near nonexistent.
If nobody is "inheriting" your data (or rather—nobody cares enough to keep your data around), it seems kind of moot to ensure it hangs around. That is, if I put stuff in an S3 bucket and pre-pay for 100 years, if nobody is around to download it in 100 years then why bother?
If you wanted to make a sort of digital time capsule and didn't care who discovered it, your next best bet would probably be the Internet Archive or some other archival community.
If your data isn't appropriate for archival (i.e., can't be publicly consumed) and isn't interesting enough for your friends/family/etc. to keep around on your behalf, keeping the data is purposeless.
I absolutely take inheritance into account when making backups meant to last a hundred years, but regardless of how uninteresting my data looks, we don't know whether today's boring data would be invaluable to science in the future. We show slippers from 5000 years ago in museums today and they're invaluable. Consider the person who owned them, walking around on a national treasure, unaware. Maybe they didn't even like the slippers, found them boring. :)
I was thinking that DNA is a pretty robust storage medium. Perhaps we could use it in coming years to store data for long term survival.
Though considering these comments and the advent of mRNA/CRISPR, perhaps we could store data for future generations in our own DNA. That'd be fascinating if you could read journals or even audio/video of your ancestors from your biological inheritance from them. What if we could engineer an extra chromosome to do just that, then let them remix and recombine segments of memories so everyone's would be unique.
Just store your diaries in a line of yeast that produces tasty beer or wine. That could work. I wonder what the oldest yeast lines in use today are, and how stable their genomes are.
Or if you really want your data to survive, engineer it into a virus for your local species of cockroaches! Getting the data back could be gross, but it'll survive nuclear holocaust. ;)
The importance of those slippers is tied strongly to their rarity. So little survived from 5000 years ago that almost anything from that time is valuable.
By comparison, we'll create more data in the next ten minutes than entire centuries from our relatively recent history. Lots of stuff is getting preserved in lots of places with substantive redundancy for virtually nothing. Your slippers today are likely to be more valuable than the near-infinite troves of documents and photos and whatever else.
I agree the consumer market doesn't exist because everyone is seduced by the cloud nowadays. Having said that, the supposed guarantees provided by these companies should not translate into blind and complete trust.
A single bad bug or security issue can make data inaccessible or corrupt. Tapes, on the other hand, do not have that issue. IMHO, trusting all your important data to a single vendor or technology is a recipe for disaster.
The lifetime of the medium is half the equation. The biggest problem imho is the lifetime of the device. Say I give you an old floppy drive for 8" floppies from the seventies. Where would you connect it?
You're right, but not all media are the same. The Arctic Vault used an optical film with QR codes; you can theoretically take a photo of it and decode it by hand if you want. They even added a Rosetta Stone to the entrance so that even if all the knowledge is lost, one can hypothetically decode the data stored there. For magnetic media, you need more specialized equipment for sure.
Floppies are an interesting case because the protocols and physical specifications are all documented publicly, which means that one could literally build a drive from scratch today --- the trickiest part being the heads, but considering that they are many orders of magnitude lower density than HDDs, it would not be a big obstacle in the future.
(I believe 8" floppy drives have the same interface as 5.25" ones --- and there's no shortage of adapters from the retrocomputing community for those, some of them even open-source.)
Tape is far more closed, AFAIK most of the common formats are proprietary and the specs are behind NDAs and other walls.
I have hobby books from 30 years ago that teach you how to build magnetic heads for cassette tapes. No pictures or anything, but with enough patience you could definitely build one today, at home (even then). Mind you, the size of the gap is not that big of a problem if you put your mind to polishing.
Tapes are uniquely terrible at this. I'd argue it's the Achilles' heel of the medium, even more so than the actual limitations of linear tape.
First off, the drives themselves are going to be expensive brand-new. You're going to be paying thousands of dollars for a drive, and it's probably going to have some weird interconnect, meaning you'll have to spend even more money and waste a PCIe slot on an adapter to use it. Most common are SAS and Fibre Channel, although there's at least one company selling drives in Thunderbolt enclosures for the Mac market.
(Aside from all that, SAS is actually pretty cool for things like hard drive enclosures, since it has SATA compatibility. I have a jury-rigged disk shelf built out of an old HP SAS expander, a slightly-modified CD duplicator case, and some 5 1/4" hard drive bays.)
Second, tape formats are constantly moving. LTO and IBM 3592 come out with a new format every 2-3 years and backwards compatibility is limited. Generally speaking you can only write one generation behind on LTO and read two generations back. So, if you want a format that's got drives still being made for it, you'll need to migrate every 5-7 years. Sure, the actual tape is shelf-stable for longer, but you're going to be buying spare drives or jumping on eBay if you want to keep old tapes around that long.
(eBay is actually not a bad place to buy used tape drives, but the pricing varies wildly. It's perfect for hobbyists and small-fry IT outfits looking for cheap backup media. Absolutely terrible if you're a large outfit with reliability guarantees and support agreements to maintain.)
Third, actually using a tape drive is a nightmare. First off, Windows hasn't shipped tape software since 2003 (I think?), so you'll be in the market for proprietary backup solutions. Second, if you're just writing data directly to the tape, you will shoe-shine for days. Common IT practice is to actually have a second disk array sitting in front of the tapes as a write cache and custom software to copy data to the tapes at full speed once all the slow nonsequential IO is done. Reading from tape doesn't have to worry about this as much, but the fact that you had to use custom software just to archive your files means that you now have proprietary archive formats to deal with. So you can have tapes that rely on both having access to working drives and licenses for proprietary backup utilities.
(Of course, if you had decently fast SSDs and a parallel archival utility, you could sustain decent write speeds on tape. I actually wrote this myself as an experiment: https://github.com/kmeisthax/rapidtar and it can saturate the LTO-5 drive I tested this with.)
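For the simplest possible version of the staging idea: on Linux you can stream one big tar archive to the non-rewinding tape device with a large buffer, which keeps the drive streaming instead of shoe-shining. A minimal sketch (the staging path is made up):

    import tarfile

    BLOCK = 512 * 1024  # large buffer to keep the drive fed

    # /dev/nst0 is the usual non-rewinding tape device on Linux
    with tarfile.open("/dev/nst0", mode="w|", bufsize=BLOCK) as tar:
        tar.add("/staging/archive-2021-06", arcname="archive-2021-06")

The "w|" mode writes a pure sequential stream, which is exactly what tape wants.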
That's probably not going to happen. Tapes have high longevities and you can buy used LTO drives on eBay for a few hundred bucks, but the biggest issue in 100 years is going to be finding a device to read the tape, and finding an adapter to hook it to the USB-H 12 quantum optical port on your motherboard.
A better way would be to use something like that to store it for a decade or two, then copy the data onto whatever the newer version of archive medium is (LTO drives can typically read the last one or two versions as well). Rinse and repeat every decade, and it also lets you test if there has been any bitrot.
> The objective of this study was to investigate the behavior of the GlassMasterDisc of Syylex under extreme climatic conditions (90°C and 85% relative humidity) and to demonstrate the potential of this technology for digital archiving.
> The result of this study is that the GlassMasterDisc has a much longer lifetime in accelerated aging than other available DVD±R
I wouldn't draw any other conclusions on normal ageing of other tested media. They did an accelerated aging test at 90°C and 85% RH, where most discs didn't last a single test cycle (of 10 days), two discs lasted a single cycle, and only syylex lasted all 4 cycles.
Quote on a brand-name DVD
> This DVD model had the longest lifetime (i.e. 1500h) at 80°C and 85% RH. At 90°C, it is destroyed after the first cycle of 250 hours.
For an idea of what it does to the substrate:
> [for measurement] DVDs have to be taken [out] .. To prevent the formation of water droplets in the polycarbonate, it is necessary to "purify" the polycarbonate from the water that was absorbed at high temperature.
OTOH, I had CDs (Verbatim, upper middle class), of which about 1-2 of 50 had issues after 20 years storage (in dark, mostly room-temperature conditions).
Yes, they did an accelerated test, and M-DISC performed only as well as archival-grade DVDs. Syylex, which promises the same lifetime as M-DISC, performed significantly better. That clearly shows either that M-DISC didn't live up to its promise, or that Syylex and archival-grade DVDs surpassed expectations. Either is bad news for M-DISC, isn't it? What am I missing?
The "accelarated test" may not be in any way indicative of true lifetime in moderate conditions. Their own conclusion does not draw any such implications, the only other test they reference is done at 80°C (10°C lower), and the only writing on how or why this test could be indicative of archival lifetime was a generic two sentence: harsher conditions -> faster degradation (in part 4).
It was a purpose-built test to see how much of X Syylex would take. It took X better than the others, none of which took X well. Tests like these are very good if you want to go with Syylex, to make sure it's not worse in some way (X, or Y, or Z), which would then suggest a need for further examination. In real aging, factor X may be completely meaningless while Y and Z are crucial, so you cannot conclude which one will last longer.
Why test at 90°C and 85% RH, and not 80°C, 50°C or 110°C, or bending, UV light, scratching, a drop in acid... whatever? For a proper accelerated lifetime test, you would need to identify (all) relevant degradation modes and model their behaviour (and interaction) in target vs. accelerated test conditions, and then extrapolate behaviour in target conditions. They didn't even write what type of degradation they are testing.
I'm not convinced. I'm no expert on aging simulation however. But heating a DVD to 90°C seems like it would do different things to the disc than normal aging at recommended temperatures, wouldn't it?
Given the pace at which storage capacity increases, what is the rationale for not copying over your data every 5-10 years onto the next cheapest consumer mass storage of the moment? You get all your data in one place and you don't have to deal with standards disappearing. (I read that 2020s game consoles can't read CDs anymore; people should rip their CD collections right now.)
The bookkeeping it requires, for one: since you don't usually buy all your backup media at once but acquire them over time, that gets unnecessarily complicated. It's also riskier to copy the media periodically, as you might increase the chances of data corruption due to a fault in the copying process (faulty RAM, faulty software, not concentrating well enough, etc.). You periodically introduce the possibility of user/hardware/software errors into the longevity of your backups.
Also, when others inherit the media, they may not have the proper equipment or skill to do it themselves; the goal of preservation is to get the data 100 years ahead, not to keep it always in a usable state per se. For example, I'd like my children to keep my backups until my grandchildren can access them 60 years later.
For data corruption and mis-manipulation, I would be more concerned about the long-term decay of any medium than about some bit flipping in RAM. (Even tapes' endurance relies on certain storage conditions, but your medium is more likely to be a hard drive, writable DVD/Blu-ray, or something flash-based, and these do not particularly age well.)
For bookkeeping, I think my point is that storage media are becoming so big that you can always consolidate onto a single device every time you carry the data over (you may still want to duplicate for reliability). You can buy an 18TB hard drive today; a consumer isn't going to need more than one or perhaps two of those for anything to be preserved long term. And in 5-10 years, you will likely have 25-30TB hard drives.
The equipment problem is precisely what this addresses. You are always using the latest hardware, and the previous hardware is still supported if you stick to a 5-10 year cycle. For instance, you would have moved away from IDE drives while you could still find motherboards with both IDE and SATA ports. But if your data is stored on an IDE drive today, good luck connecting it to a computer in 2030 (assuming we haven't gone full Apple: "you can't customize your hardware and we deprecate everything very frequently").
Skills (and I would say mostly dedication) are still a problem. But we are talking about copy-pasting files between two media; it's not rocket science even if you don't script it.
Every 10 to 15 years, you send in your archived (and new/interim) personal data and get it back on the current top-tech storage medium. That way it's not stored in the cloud and you can keep moving the stored data forward without having to deal with it all yourself.
Not really. You can buy an 18TB hard drive now. Even if your data is humongous and needs several of those, it will likely fit on a single drive in 5-10 years. So it takes an increasingly smaller amount of your time to replicate (excluding the copying time, which keeps the machine busy but not you).
Use https://en.wikipedia.org/wiki/Parchive and add as much redundancy as you like. It's a lot cheaper to over-provision than to create the uber-archive medium.
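To illustrate the principle: real PAR2 uses Reed-Solomon codes, but plain XOR parity across equal-sized blocks already shows how a lost piece gets rebuilt (toy sketch):

    def xor_parity(blocks):
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)

    data = [b"4 KiB of photo..", b"4 KiB of video..", b"4 KiB of text..."]
    parity = xor_parity(data)

    # lose any one block: XOR of the survivors plus the parity restores it
    assert xor_parity([data[0], data[2], parity]) == data[1]

PAR2 generalizes this so you can lose any N blocks, where N is however much redundancy you chose to pay for.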
Hell, tell me it will last 20 years and I'm okay with that, if you can guarantee at least 15 years I can then buy replacements every decade and transfer files over...
That's right, but regardless of how "extreme" the testing is, archival-grade DVDs performed as well as M-DISC, and Syylex surpassed the rest by a huge margin. Syylex promised the same lifetime as M-DISC (unlike archival-grade DVDs). I think the results are good enough to show that either M-DISC doesn't live up to expectations or archival-grade DVDs exceed them. Either way, bad news for M-DISC. If Syylex hadn't gone bankrupt, it would have been the best option, of course.
Yes, what I mean is that, setting aside the "glass" disc from Syylex, we don't know whether both M-DISCs and archival-grade DVDs suck or excel, let alone how long they actually stay readable in the real world.
If my last guess in the comparison is correct, 100,000 hours at "normal" temperature/humidity is roughly 11 years, but it may well be that without "cooking" them at 90°C, the duration for both is 200,000 hours (or whatever)...
I remember watching the pilot episode of Star Trek (the original series) and chuckling when Spock reports that “the tapes are badly damaged” from the capsule they recovered near the barrier at the edge of the galaxy.
Turns out it might not be that outdated after all.
About a month ago I found a stack of MiniDV tapes from about 15 years ago.
It was my own home videos. I wanted to preserve them, so wanted to upload them to Google Photos.
It took some looking around on eBay to find a camcorder to play them back. When it arrived, I realized that I needed a FireWire 400 port to capture at full resolution, so more digging around for a FireWire PCIe card. I was finally able to transcode and upload some 15GB worth of video. The upload itself took 3 minutes on my gigabit internet; the whole process, acquiring the hardware and so on, took about a month.
When I think about these tapes some 50 years from now, it might as well be completely unrecoverable, not because the tapes went bad, but because we have nothing to read them. Makes you wonder about galactic time scales like in Star Trek.
> found a stack of MiniDV tapes from about 15 years ago ... I wanted to preserve them
Here's where I raise my hand and ask "Why?"
They've been sitting in a box for 15 years. You never watched them, you never even thought about them. Why preserve them?
It's why I stopped taking pictures and videos of things. I never watched them again. It's all just a lot of waste motion over some dream that someday we'll find value in sitting and looking at this old stuff again.
For me it’s personal history. My kids LOVE watching videos of me and my wife when we were younger. Luckily for them, my wife’s family converted all their videos to digital and so it’s easy to watch.
I’m itching to do the same to my parents’ collection so my kids can see more.
For me the biggest thing was that the tapes had been sitting for 15 years, then discovered and treated as though they were something valuable. If they were so important, why were they forgotten on a shelf for 15 years?
Wait until you have to clean out a parent's house with a rooms full of stuff saved because they thought the grandkids might find it interesting. Here's the reality: they don't. My mother saved boxes and boxes and boxes of photos. None of them ever sorted, or put in albums. Just thrown in boxes. Saved for decades. Was it hard to throw them away? Yes, a little. There were a lot of moments of my childhood there. But if I asked myself honestly, was I going to do anything with them other than put them on a shelf at my house? The answer was no. I'd advise sparing your kids that burden.
A friend of my mother's had saved every canceled check she ever wrote. Boxes of them, because she thought her kids or grandkids might be interested in them some day. Their reaction was likely: What's a check?
If you really cherish something, and it regularly brings happiness to your life, by all means save it. But do it for yourself, not for what you think your descendants will find interesting. And if it's been in a closet for 5 years, ask yourself why you are keeping it.
It has nothing to do with intent. It’s tone deaf to ask why someone would want to preserve memories, especially if it’s based on your one anecdote of yourself not wanting them.
It’s like asking why someone would want to spread ashes or preserve an old piece of rickety furniture a long lost relative built.
Yes, but when you read it in context, it's clear that the author is referring to themself:
>It's why I stopped taking pictures and videos of things. I never watched them again. It's all just a lot of waste motion over some dream that someday we'll find value in sitting and looking at this old stuff again.
The author uses "I" over and over again, explaining that they don't see the point, so asking about a different perspective seems genuine to me, or at least there's a plausible interpretation that it is genuine.
From HN guidelines:
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
Great, so when can I buy one and how much will the tape drive cost?
The largest issue with tape is that the drives themselves cost absurd amounts, and you better not cheap out because failure is both time consuming and scary.
Swapping tapes continues to be human intensive and restore times long. But the tapes themselves are so cheap that at this scale it becomes worthwhile again.
This is geared for enterprise settings, not home use. I believe someone mentioned these drives costing $25,000. I do agree reasonably priced tape drives with TB of space for home users would be great.
While tapes are theoretically cool, the drives are just too rare for them to be of any practical value to a home user. Even if the media has a better archival life than a hard disk (say 50 years), it won't do you a damn bit of good if there are no drives available in 50 years to read it.
IMHO, hard drives are better for backup (even offline backup, just get a hot-swap bay and imagine the drive is tape [1]). Archival is a harder problem, but I've settled on using high-grade optical media, burned slow for fewer errors, making a bunch of redundant copies. Even though the media might not last as long, I'm pretty much guaranteed to be able to find a drive for the next several decades [2].
> Those tapes only hold 200 GB, and you can get a 10 TB hard drive today for less than $300:
That was a link to an obsolete LTO-2 drive. If you’re using LTO-8, which is the current generation, you’d get a 10-pack of 12TB tapes for around $500. Noticeably cheaper per byte than hard drives.
I don’t recommend LTO-2 just because I don’t think that the drives are well-supported.
> While tapes are theoretically cool, the drives are just too rare for them to be of any practical value to a home user. Even if the media has a better archival life than a hard disk (say 50 years), it won't do you a damn bit of good if there are no drives available in 50 years to read it.
If you’re evaluating just on the basis of the price of drives / media, then there’s a cutoff where tape becomes cheaper than hard drives. The easy way to calculate this cutoff is to divide the overhead cost (tape drive price) by the difference in the cost per TB of tapes and hard drives, which in the example here, is around $3k divided by $25/TB, or 120 TB.
In other words, if you have more than about 120 TB of data, then it is cheaper to buy a tape drive. I think any comments about whether tape drives are suitable for home use are really comments about whether you are interested in the use case of people who need to store >120 TB at home.
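The breakeven arithmetic spelled out, with the rough prices assumed above:

    drive_cost = 3000.0              # LTO-8 drive, one-time overhead
    tape_per_tb = 500 / (10 * 12)    # 10-pack of 12TB tapes, ~$4.2/TB
    hdd_per_tb = 29.0                # rough hard drive street price per TB
    breakeven_tb = drive_cost / (hdd_per_tb - tape_per_tb)
    print(f"~{breakeven_tb:.0f} TB")  # -> ~121 TB

Above that point, every additional terabyte on tape is pure savings.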
If you are running a YouTube channel as a hobby, or you run a side business doing videography for weddings, the tape drive starts to sound a lot better. The 120 TB cutoff might be around 300 hours of video, which might be only 50 events.
There are a lot of other reasons why you might NOT want to use tape, but it’s easy to have enough data that tape is the cheapest storage option. At “enterprise” scale the cost calculus is completely different and involves things like support contracts with Oracle (not necessary for hard drives), power/cooling in your DC (tape is very low-power), etc.
And let’s not forget that if you have 120TB of hard drives, you’re in the regime where you start having to buy multiple machines.
As for longevity: I don't have the data handy. If you are storing data on tape, you need to migrate to newer generations of tape as old generations become obsolete and unavailable. If you are storing on hard disks, you also need to migrate, because hard disks eventually fail (even if they are not powered on).
> That was a link to an obsolete LTO-2 drive. If you’re using LTO-8, which is the current generation, you’d get a 10-pack of 12TB tapes for around $500. Noticeably cheaper per byte than hard drives.
> In other words, if you have more than about 120 TB of data, then it is cheaper to buy a tape drive. I think any comments about whether tape drives are suitable for home use are really comments about whether you are interested in the use case of people who need to store >120 TB at home.
> If you are running a YouTube channel as a hobby, or you run a side business doing videography for weddings, the tape drive starts to sound a lot better. The 120 TB cutoff might be around 300 hours of video, which might be only 50 events.
No dispute there, but I think most >120 TB home use cases run into the question of whether it's even worth it to keep the data (raw/uncompressed). For most people, the answer is probably no, and curation/compression makes far more sense. For instance, I don't think it's typical for wedding photographers/videographers to store their final product indefinitely, let alone the raw footage for a re-edit. You can keep a lot more events in a lot less space if you're only keeping the raw footage for a few recent events and the 15-30 minute final edit for a year after it's finalized.
> And let’s not forget that if you have 120TB of hard drives, you’re in the regime where you start having to buy multiple machines.
Not if you swap disks as offline storage.
> As for longevity: I don't have the data handy. If you are storing data on tape, you need to migrate to newer generations of tape as old generations become obsolete and unavailable. If you are storing on hard disks, you also need to migrate, because hard disks eventually fail (even if they are not powered on).
Honestly, the theoretical longevity is the only aspect of tape that appeals to me, but like you said enterprise tapes will end up being like these enterprise optical disks in a fairly short period of time, so you're regularly going to have migration legwork to do or your data's effectively toast.
You don't even need the absolute latest drives to get reasonable storage volumes. Some careful shopping can net you an LTO-5 (1.5TB native capacity) library unit (one that can be upgraded to newer generations) for under $500, with new tapes $10-15 each.
The nice thing about LTO as a format is its relative predictability and ease of acquiring the parts, even the very obsolete ones. It's all SCSI or SAS, most of the interesting stuff happens at the hardware level, with a bog-standard API. Your average backup app, whether it be Backup Exec or mtx/tar/etc. on Linux doesn't need to care about the media format. Unlike actual "enterprise" shops with datacenters and support contracts and such cruft, where the primary concern is "does it work", it is fine to buy older units second-hand. They are plentiful and cheap.
> You don't even need the absolute latest drives to get reasonable storage volumes. Some careful shopping can net you an LTO-5 (1.5TB native capacity) library unit (one that can be upgraded to newer generations) for under $500, with new tapes $10-15 each.
That's still... pretty terrible. If we compare the cost of a tape drive against a 14TB easystore @ $190 [1], the break-even point is around 72TB. I don't know about you, but that's a lot of data you have to store for it to be worth it. Even at 150TB you're only looking at around ~25% (~$500) in savings compared to hard drives, which I don't think is much when you factor in how much of a hassle tape drives are to work with.
It's not a fair comparison to put hard drives (integrated mechanics) up against tape (separated mechanics). They do not solve the same problem and have different longevity profiles. If I'm spending $500 on tape storage, it's because I want something that will last a long time, something portable hard drives tend to have issues with.
LTO5 onward supports LTFS, which exposes the tape to the OS as if it were any other removable storage, with the one proviso that deleting files doesn't reclaim space unless the entire tape is wiped.
True, on hard drives the controller board will eventually fail. The solder might decompose or the circuitry might fail. Then you're left with a platter full of randomized bits.
But on tape, the controller mechanism is separated from the storage medium. The controller mechanism is inside the drive readers itself, which will eventually fail.
So for both, it’s a trade off. They’re both going to eventually fail.
One day I found a really cheap LTO-4 drive that cost $150. Interesting, but I found it was not practical despite the price and decided against it. First, an 800 GB LTO-4 tape is no longer high density by today's standards; it couldn't even hold a 1 TB HDD image. Also, I still had to pay for the necessary SAS peripherals to get it working. Finally, the mechanical assembly inside a 10-year-old tape drive is not something that inspires confidence for data backup... Last time I looked, the cheapest decommissioned LTO-5 drive still cost $1000.
Well they don't really make these things for laptops unfortunately but they also don't cost $25k+ like they used to. It's still more than the average consumer is willing to spend but they are affordable enough now to be a viable option for professionals and small businesses.
Ah - I took the question as saying “I did nothing to indicate the sort and could in fact be purchasing for a business”, so I was trying to explain that side!
That’s how I meant it. Seems odd to say “we” out of context like that. And anyway I would be the person purchasing so including the whole company is.. odd.
Anyway, I guess this is the pedantry we (!) should expect from HN
> Swapping tapes continues to be human intensive and restore times long.
Restore times are definitely long, but you can mostly avoid human labor by putting the tapes in a tape library, which will load tapes in the drives using robots. You still need technicians around because tapes / drives / robots will break, but individual restore operations can be completely automated.
Doesn't really sound like it's aimed at you. Tape is for serious long-term, high-volume storage. If you've got a limited budget for the tape device it's probably not designed for you in the first place.
> I understood the exact opposite that you have understood, and I find your comment quite uncomfortably presumptuous, as a result?
Is this a question? I can't tell you if that's how you find my comment or not, sorry.
They want a cheap tape device because they just want to use it in the home. Tape devices aren't cheap... because they aren't aimed at use just in the home.
They aren't aiming at home users - I think that's a simple fact not presuming anything.
The parent clarifies above that they are not a home user.
Even if you believed they were a home user, they clearly demonstrated in their original comment that they did know it was for long-term, high-volume storage, and they did know the cost, and you appeared still to belittle them for not knowing.
It's a question in the sense of - why did you choose to say it in this way, which seemed impolite? What did you intend your comment to add?
We reached an inflection ("amazing!") point when we were able to put "100 songs in your pocket". That was a really shocking thing given the limitations of CDs/tape, etc.
But the real inflection point may come when "all relevant information is on your local disk". The tapes in question here could maybe store every book ever written!
We may be able to put the entirety of Wikipedia, every film and TV show ever made, every book, every lecture, on a little disk.
The only thing we'd need to access in realtime would be contemporary data like traffic flows, weather situation etc..
The ability to store 'that much data quickly and easily' locally, may quite fundamentally change the equilibrium we have right now with the cloud and lead to a more natural decentralization.
"All of YouTube from 2008 until the present, 99 cents at the local gas station, on usb-like contraption"
This will never, ever happen. English Wikipedia (without pictures), compressed, is a measly 20 GB. It is hard to quantify "all books ever written", but I have kept copies of some online libraries large enough that they surely contain pretty much every book you can remember, plus 10,000 you've never heard of for every single one you can remember. It's not that much; you can fit it all on 1 or 2 regular HDDs.
Now, I did it because I'm that type of guy. There's not that many people who actually do this bullshit, even though it's perfectly doable.
So why don't they? Because it doesn't make much sense unless you're afraid of an upcoming nuclear winter. Wikipedia is updated and improved every day. You only sometimes want to refer to something old, but you nearly always want to check out something new. Petabytes of video are uploaded to YouTube every year; terabytes per day probably wouldn't be an overestimate for audio on Spotify. All data is being updated constantly.
Also, the above assumes pretty aggressive data compression. Is aggressively compressed data what we want? No. A 2h video compressed to about 500 MB was totally fine 15 years ago; if I download a 2h movie today, it's normally around 20 GB, and it's by no means uncompressed.
Seriously, by now you should know for a fact that anyone who believes there's such a thing as "too much storage space" is stuck in the 80s.
And even if there were such a thing: realistically, a cluster of nodes in Google's datacenters can find the book or video you are looking for far faster than the most perfect HDD you could theoretically have locally. So, again, normal people wouldn't want to keep all this stuff locally even if they could.
So I was just talking smack, but I think it might be possible.
Remember when we could do 64Kb/s reliably over the web? Then you could do 'voice'. And after that threshold was crossed, you could basically do unlimited voice very quickly.
So Wikipedia - text only - is ballpark 50G - which is to say it would fit on a single mobile phone.
That is bigger than the first Google index!
I don't know how old you are, but in 1999 the notion that you could walk around with this massive database, literally the size of all of Google, right in your pocket would have blown people's minds. It was basically unthinkable.
The rate of growth of storage has slowed down a little in the last decade, but there are still jumps to be had, and it's not inconceivable that we get 100-500TB of storage in regular devices in the next 10-15 years, meaning 10-100x that in a slightly larger storage device.
While video data is expanding (4K is much bigger than original HD) it can't go on forever, meaning, just like voice and text, once we cross a certain threshold, then it becomes irrelevantly small relative to storage as well.
So I think there's some value in my point:
In 10-20 years, as video storage becomes 'trivial' just as text is today (aka all of Wikipedia's text on your phone), huge amounts of data become available, instantly.
Though some data sources change a lot, others do not.
It's not inconceivable that we put the 'entire Western canon' in everyone's homes.
The other thing not so evident in my comment is that there's only so much use for all of this data.
We are getting massively smaller marginal returns on all this 'big data' we store; frankly, I question in many cases whether it's worth it at all. I think a lot of companies have been duped into saving every mouse click or whatever concerning every customer. The world is just not that complicated.
What this means is that as computers miniaturize, and storage as well, we may see regular data centers shift away from the cloud back to 'on premise'.
If you can fit 'infinite computing power' in a little closet, and the parts are easily replaced ... then I can see companies doing that.
The promise of cloud computing today largely rests on the economies of scale of physicality: parts take up space, cables, power, heating/cooling; you want a lot of flex/headroom in configurations.
I don't see why, 20 years from now, you couldn't buy 'off the shelf' a 'box' that has the equivalent of 100 EC2 instances, 500 TB of storage, and multiple multi-Gb/s network cards.
You could run an entire corporate office of 1000 people from just a single box.
The 'physicality' of it all would be mundane and irrelevant. Obviously, it would be 'super complex' and still need 200 IT people to admin all of the software, but physically it could be small and cheap.
'Big maybe' of course, but there are some possibilities in there I think.
The entire libgen archive is roughly 50 terabytes (most of the books ever written). It will be a very long time before we reach that level of storage in a phone.
What's more, although it would be ugly and awkward, one can imagine 50 microSD cards mounted on a single USB-like device that sits in your pocket and attaches to a phone. The reason it doesn't exist is primarily that nobody needs it, rather than that "it can't be done".
I share your childlike optimism. But another thing (that neither of the sibling comments mentions) is write speed. Having a ton of capacity to store massive amounts of data at rest is not the same thing as being able to make near-instantaneous (or even slow) copies.

Getting all that data on there is going to be a problem, so it would only be economical for data sources in high demand, not unlike the way optical media get their pits and lands stamped in at the factory from an expensive master. "All of YouTube" might be (probably is) in high enough demand, but it would also require cooperation from the gatekeeper of that content, who at this time has adverse incentives, because it's making (many times) more money selling ad impressions for basically every view that it's able to.

Even aside from that, what other large datasets are in the same demand category, such that the work could or would be subsidized in that way?
So far, storage is still following something resembling Moore's law, but it will probably hit physical limits way before a year of YouTube (probably 30 of these new tapes or so) fits into your hand.
That's good. But when will it ever move out of the enterprise market?
What I really want is affordable tape storage for us common folks / home users to archive data long-term. It's a real pain to keep transferring your old backups to a new CD / DVD / portable HDD every 2-5 years. There is a market of home users who really don't want to put their data in the "cloud". And this generation is creating so much data, some of which, I am sure, they would like to store long-term.
So, having worked in the tape library space: tapes don't really make sense for home users.
They are said to have a 50-year lifespan, but everyone in the industry knows that's a blatant lie these days. Tape customers transfer everything over to the new generation every 2 to 3 years, so the claim is never tested, and none of the customers really care.
You also only get a couple hundred writes to a tape in its lifetime. That includes bulk appends, as the stress from seeking is a major factor in what kills them. They really need to be treated as WORO media: written in bulk, the whole 12 TB or whatever at once. Not a great fit for home use.
$100 each starts making way less sense when they have the same semantics as burned optical discs for home users.
These days the main benefit of tape is the physical safety of having the media inside a cartridge, and the streaming read speeds are stupid fast, so you can reconstruct a backup faster.
Home use, even with a data-hoarder-sized NAS, is better served by a Blu-ray burner and multiple copies.
> $100 each starts making way less sense when they have the same semantics as burned optical discs for home users.
These tape-drives have 300MB/s+ of read/write speed. Sequential yes, but almost all backup tasks are sequential.
Optical is 10 MB/s or so. You get more out of a 100 Mbps connection to (insert cloud storage provider here), let alone Gbps.
At a minimum, a modern, reasonable mechanism for backups needs to be faster than the cloud (100 Mbps is ~12.5 MB/s; gigabit is ~125 MB/s), otherwise it's basically worthless to the consumer. Hard drives and tape get there, but there was no real way to improve optical's read/write performance (outside of overengineered "jukebox" robots available to Facebook and a few other select groups), so it went fully obsolete.
That's 6x speed, or roughly 25 MB/s. You can store 25 GB per disc, it's going to take roughly 10x longer than a $150 5 TB hard drive, AND you're going to have to sit there removing and adding a new disc every few minutes.
And for what? The hard drive is cheaper after all of that.
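To make that arithmetic concrete, a rough back-of-envelope using the figures above (this thread's numbers, not benchmarks):

```python
# Copying 5 TB at 6x Blu-ray speed (~25 MB/s) vs. a modern HDD (~200 MB/s).
def hours(total_gb, rate_mb_s):
    return total_gb * 1000 / rate_mb_s / 3600

print(f"Blu-ray @ 25 MB/s:  {hours(5000, 25):.0f} hours, {5000 // 25} discs to swap")
print(f"HDD     @ 200 MB/s: {hours(5000, 200):.1f} hours, 1 drive")
```

That's roughly 56 hours and 200 disc swaps versus about 7 hours unattended.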
> Hard drives have a bad habit of not turning on after being off for a year or two is why.
So push the "ZFS Scrub" button every 6 months.
Don't store hard drives. Store a NAS. The entire freaking computer is stored as a unit. Every few months, turn it on, push "ZFS Scrub" to double-check the data, and you're set.
BluRays also degrade over time unless stored in proper UV-sealed / temperature-controlled conditions. Everything requires a degree of maintenance and checking. The question is how best to automate that checking process.
Hard drives read and write at 200+ MB/s, making these maintenance checks much faster. They're also bigger, which means no need to manually insert and remove discs. This entire process could be automated with a "Wake on LAN" packet and a few clicks from any terminal (phone, computer, whatever).
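The wake-up step really is automatable; here's a minimal sketch of a standard Wake-on-LAN magic packet in Python (the MAC address is a placeholder, and your NAS/BIOS must have WoL enabled):

```python
import socket

def wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9):
    """Send a WoL magic packet: 6 bytes of 0xFF, then the MAC repeated 16x."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

wake_on_lan("00:11:22:33:44:55")  # placeholder MAC of the backup NAS
```

After that, kicking off the scrub is one SSH command away.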
-------
I dunno. I'm looking up these BD-R XL drives you're talking about: it's like $30 per 100 GB disc or something. That's an obscene price for so little storage. I guess if all your data fits on one of those discs it's fine, but... I've got some archived movies and stuff on my NAS (50 GB per Blu-ray), and I needed to store some of my video editing files (so I need the original data that matches my video editing projects).
The data of real importance to me probably fits on one Blu-ray. But then I wouldn't have all the other stuff I've saved up "because I can" on my 2x 5 TB NAS (mirrored, so only 5 TB of storage).
Ehh? Not really. All of these things we're talking about are components of a backup strategy.
If I wanted an offsite backup, it's just rsync to some cloud provider (like rsync.net) or something. I don't do that because I don't think my data is worth the recurring cost, but it's an option.
-------
My point is that my solution runs at 200 MB/s and costs on the order of $500 to $1000 for the component ($500 if you build your own NAS, $1000 if you buy premade parts; assume 2x 5 TB hard drives is just $300 for 5 TB of _mirrored_ storage)...
While your proposed component costs $30 per 0.1 TB and reads/writes at a lousy 70 MB/s... to get an equivalent mirrored setup you need to buy 100 Blu-rays, or roughly $3000 in BD-XLs alone (2 copies of your data across 2 different discs, for the same redundancy factor as the 2x hard drive solution).
So I have to raise my eyebrows a little bit. How are you checking that the data doesn't degrade? Are you manually checking all of those Blu-rays you've created for reliability? That's a lot of sitting around inserting and removing discs.
It's literally cheaper to build a 2nd NAS, stick more hard drives into it, and keep that 2nd NAS offsite somewhere.
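In numbers, using the prices quoted above (this thread's assumptions, not current market prices):

```python
# Mirrored-storage cost: $30 per 100 GB BD-XL disc vs. ~$150 per 5 TB HDD.
def mirrored_cost(total_gb, unit_gb, unit_price, copies=2):
    units = -(-total_gb // unit_gb)  # ceiling division: media needed per copy
    return units * copies * unit_price

print("BD-XL mirrored 5 TB:", mirrored_cost(5000, 100, 30))    # $3000
print("HDD mirrored 5 TB:  ", mirrored_cost(5000, 5000, 150))  # $300
```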
-------
If you're going to hold my feet to the fire over this "2-media" thing, then my 2nd medium of choice would be flash storage before optical. Because SPEED is king. Speed means you can checksum your data and ensure that your backups are still good. I think flash is a bit expensive compared to HDD, but based on these BD-XL discs, I'm thinking flash is actually cheaper than Blu-ray and something like 10,000% faster. (Tape would be a more ideal 2nd medium... but I'm not "big enough" to make tape cost-effective.)
Yes, checksums and scrubbing. If you want to protect against bitrot (in ANY medium), you MUST double-check your backups over time.
TEST your backups. Any backup strategy that doesn't have a regular testing schedule is null and void, in my opinion.
I tend to think that HDD, then flash, and MAYBE tape (if you're going really, really big) are the media of choice for the modern computer user. I'm not really seeing where optical fits in today's world. Maybe a future disc format (with maybe TB-level discs) could make optical a thing again... but 100 GB Blu-rays aren't really compelling for the modern user.
70MB/s is slow. And 100GB is too small per storage unit.
If you're doing archival storage, does the write speed really matter? If you have enough incoming data that write speed matters, you've already eliminated most archival storage methods anyway.
True. But slow transfer speed is still a negative when comparing it to other archival methods (in this case, tape drives). And that is why there is consumer demand for tape drives.
I'm having a very difficult time understanding why it's a negative at all for most use cases. It's not like running a backup monopolizes the machine you're running it on.
Again, tapes are for archival storage. Not nearline, not live, archival. As in, you write the data and likely don't come back to it for months to years.
I built such a cloud service; you can archive the data uploaded to our cloud storage to tape on demand, and we'll send you the tape back when requested (you basically buy the tapes once, and they're yours, no strings attached). It failed to gain any traction, though.
First off, mad props for building out the service. That was an ambitious idea and I'm sorry it didn't take off. Second, if you ever write a blog post or article about it someday, please mention it on Hacker News.
Sounds like a good idea. But I am guessing it failed because it wouldn't be attractive to common folks: while the tapes might be affordable, tape drives are really expensive and uncommon.
In fact the service is tentatively targeted at content producers. Most post houses do have LTO drives (archiving to LTO is a requirement for working with Netflix, among others). The idea is that any content owner can bring their tapes to the post house when needed (where they would actually use them) instead of USB disk drives.
Having given it some thought: your potential business success depends on convincing clients not to buy a tape drive, and on addressing the concerns below. Smart pricing can address the former, but the latter needs more brainstorming.
Some of the issues you need to address:
- Trust. How can someone trust you with potentially proprietary / copyrighted data?
- Speed. Is it faster to copy the data to a tape locally than to transfer it to you?
- Reliability. How can the client be sure that you have made a successful backup?
- Advertising. Did you reach your target base to make them aware of your product, and address all their concerns so they would consider it?
Yes, that's the point. When you're a content creator making at most a couple of TB of data per month (typical figures), hardly enough to fill a handful of tapes a year, it's hard to justify the expense of a tape drive. OTOH cloud storage is either too complex (S3, Glacier) or too expensive (Dropbox, Box) for this volume of data.
Trust: yep, hard to say for this one. At least I can provide actual guarantees (for instance, none of our storage is out of the country).
Speed: it's possible to send us tapes directly. However, the main selling point of our solution is to let people upload relatively small volumes of data continuously, then archive it all to tape in large batches once there's enough to fill an LTO.
Reliability: the interface allows the user to see where their files are (on which tape), run checksum checks on tapes, and restore from tape. Every tape is sold with 3 free operations a year (archive, checksum verification, restoration).
Advertising: that's clearly not our strong point :)
Totally! I have a friend who does sound engineering for Hollywood, and he has two fire safes full of HDDs, SSDs, and tape holding hundreds and hundreds of terabytes of data. I couldn't even finish the sentence "Have you tried the cl..." before he cut me off: way too much data. Didn't have the heart to tell him that two of those three media types need to be powered on periodically...
HDDs can't need powering on all that often. I just found my hard drive from college, and plugging it into my computer just worked. It was great finding all my old documents and music that I'd forgotten about.
It doesn't. What matters is that you have a filesystem that can detect and repair bitrot. For that to work, it needs to check everything occasionally, which means they need to be powered up occasionally.
If you don't do that, eventually you'll reach a point where it can't repair anything, and then you gain nothing over a filesystem that doesn't do this. That's the point.
A 12 TB HDD requires 10 GB/day for three years to be filled (12,000 GB ÷ 10 GB/day = 1,200 days ≈ 3.3 years). This is not the home market; it's the professional market, or hoarding (by today's standards).
Objections about the failure rate of HDDs are absolutely valid, but then one should also consider the bigger picture (e.g. losing the whole site's storage), in which case having a remote copy is also important.
> A 12 TB HDD requires 10 GB/day for three years to be filled. [...] hoarding (by today's standards).
You have an interesting argument: by the same reasoning, by the time one has eventually filled up a 12 TB HDD, it would no longer be hoarding by tomorrow's standards. In other words, at that point one should be able to get the next generation of HDDs for cheap, and it's this fact that makes tape drives unnecessary.
Now I wonder whether it was a mistake to buy a spare 14 TB HDD to 1:1 mirror my new file server for cold backup. Perhaps a smaller one would have been fine for 5 years anyway...
> A 12 TB HDD requires 10 GB/day for three years to be filled. This is not the home market (...)
You're basing your argument on extreme and unrepresentative assumptions.
First of all, you do not need to fill a storage device to the brim to justify its use. You could use the exact same rationale to claim no one needs 500 GB HDDs, even though they are pretty much standard these days.
Additionally, you falsely assume that data storage needs start the very moment someone buys a drive, and that up till then they have no data lying around. That's not the case at all. People buy high-density storage devices because they already have the data lying around and don't want to lose it. You're ignoring that people already have piles of CD-Rs/DVDs/Blu-ray discs lying around.
Additionally, you somehow assume that people buy storage devices for their density, ignoring their primary use case: long-term data preservation.
> A 12 TB HDD requires 10 GB/day for three years to be filled
I cheerfully quantify just about everything in my life, yet somehow I missed that one. I am a little bit of a data hoarder, or maybe just a little paranoid. Seeing those numbers is actually quite helpful. Thank you.
PS I get downvoted a lot for comments like this, so in case it sounds sarcastic or facetious, it is not. I mean it sincerely.
Indeed. The home directory on my PC at home holds 80 GB of files. Most of it is just there out of laziness. About the only things I try to keep backups of are my tax returns.
What do people have on a home machine that needs enterprise-grade backups?
You might say photos and videos. I guess that's a personal thing; I realized a long time ago that I never spent any time actually looking at any of the pictures I took, so I stopped taking them.
> I realized a long time ago that I never spent any time actually looking at any of the pictures I took, so I stopped taking them.
That is a completely logical take. However, I sort of think of it as a posterity move. Although my kids might not particularly want to see pictures of themselves, their great-grandchildren, or beyond, might love it. I would pay a pretty penny to see home movies of my grandparents, whom I never knew.
100% agree. I have a NAS for photos and media, plus a big USB HDD for ZFS snapshots of my machines - but all of this is homelab tinkering, not an actual backup strategy.
I realized in the last year or so that the only digital media I have which I would be genuinely sad to lose were wedding photos - so I saved those to 3 different cloud providers, made a Blu-ray copy, and a copy on the NAS. If I lose all of that in some tragedy, chances are I've got bigger things to worry about.
Photos, videos, music, but also current laptop backups, server/VM backups, current and past phone backups, disk images of past computers' hard drives (to be done), and digitized family videos and photos (also to be done).
It adds up quite fast. And being able to put everything on a tape every year with a label on it would give me some peace of mind, even more if I kept 2 of them in different locations.
Well, nowadays digital documents such as videos and photos are much more important, widespread, and mundane than they were a couple of decades ago.
During the 80s there was no consumer digital camera market. Nowadays anyone can easily generate hundreds of megabytes of photos and videos per day. Each hour of 4K video can be close to 7 GB, and we're already seeing cellphones that can record 4K video at 60fps and 1080p video at 240fps.
Just barely. It was possible to buy a PC with 2 or 4 MB of RAM in the late 1980s, but you couldn't do much with it. On Windows/DOS, applications had to be written specifically with "extended memory" or "expanded memory" in mind; there were 2 incompatible "standards" for how to organize memory beyond 1 MB.
On the Mac side, in 1989 you could apparently buy a Mac IIci for $6200 with 1MB or 4MB, "expandable to 128MB".
That would be wonderful. Unfortunately, the storage density of tape also depends on the magnetic particles in the tape: smaller particles that can be magnetized enable smaller heads, which in turn enable higher storage density.
I know your comment was a throwaway idea, but I got a bit too involved looking up information for this answer, so I've committed further and dug a little deeper to avoid looking too stupid:
Currently, with BaFe (barium ferrite) as the magnetic particle, LTO-7+ tapes have particle sizes of less than 100 nm (https://indico.cern.ch/event/637013/contributions/2669089/ - see slides 14 and 15 in the PowerPoint).
VHS, on the other hand, is a little trickier to find information on; I suspect that, given the age of the technologies involved, the particles are going to be significantly bigger. The closest I've found to a technical document is a student paper at NYU which refers to this IEEE paper (https://ieeexplore.ieee.org/document/50474), giving an average particle size of ~300 nm for VHS and ~150 nm for S-VHS.
But that first presentation also mentions that smoothness increases over time with LTO, which suggests improvements in the coating process.
I suspect improvements in the coating, combined with the shrinking size of the magnetic particles, enabled an increase in the density of particles on the tape. This, in conjunction with thinner tape allowing large increases in tape length for the same cartridge size (nearly double the length from LTO-1 to LTO-8), has led to the enormous jumps in capacity we've seen in LTO.
Meaning, unfortunately, I don't think it'll be possible to do super interesting things with VHS.
I store on an external HDD, and every year I buy a new one, copy everything across, and verify checksums against my records. It costs basically nothing and is low-effort. What am I missing?
The OP is mixing media with different characteristics. A DVD has a fixed size (~4.7 GB single-layer), while hard disks are relatively open-ended. One can buy 10+ TB magnetic disks for a cheap price (less than $200).
If the OP really needs dozens of TBs of capacity every few years, they definitely don't fit in the home user market they are talking about.
Not trying to be rude or sarcastic here: what is that command? Certainly in my experience on both macOS and Windows, the default operating system GUI mass copy invariably poops out halfway through, with no explanation as to why. Maybe something like this?
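(Not that there is one blessed command; as a sketch of what "copy, then verify checksums" can look like, assuming SHA-256 and Python 3.8+ rather than any particular vendor tool:)

```python
# Hash every file under two roots and report any mismatch or missing file.
import hashlib, os, sys

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def manifest(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    out = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            out[os.path.relpath(full, root)] = sha256(full)
    return out

src, dst = manifest(sys.argv[1]), manifest(sys.argv[2])
for rel in sorted(set(src) | set(dst)):
    if src.get(rel) != dst.get(rel):
        print("MISMATCH or missing:", rel)
```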
You have to save your important data on multiple media to keep it safe, in case one medium fails. Obviously, cost being a factor, optical discs are one of the media to consider despite all their other downsides (slow transfer speed).
Actually, nearly every enterprise backup system my company has deployed in the last 5 years is tapeless. Most involve snapshot backup management replicated across multiple disk/SSD systems. (I'm the network guy, so not my focus area.)
Tape is moving from plain 'enterprise' to cloud-provider enterprise. The recurring cost of having a tape storage/management system installed is a hard pill to swallow. Most companies still use tape, but they just access it via AWS Glacier.
Yeah, what does a new HDD cost? Like £100 max for a big, high-quality one? Compared to buying a £5k tape drive and expensive tapes... yeah, that's basically zero.
$100 is definitely not "basically zero". Not only that, but a $5k tape drive and tapes are designed to last significantly longer, essentially bringing the cost of long-term storage closer to $0/year than your option of spending $100/year.
> a $5k tape drive and tapes are designed to last a significantly longer period of time
You're going to use the drive for 50 years? Half a century? And you think it'll still be working, and supporting a format with a useful capacity, half a century from now?
That's assuming a consumer tape drive would cost that much. They'll succeed only if they cost less than $500. Enterprise stuff is always costly, just like consumer-grade HDDs vs enterprise HDDs.
It's 50 years before $100/year catches up with a one-time cost of $5k. On top of that, the capacity of the medium goes up steadily over time, while you will need to drop another $5k every 10 years or so, or span your backups across multiple tapes.
The checksum isn't the relevant bit. That doesn't change whether you use a hard drive, a tape, or a DVD. Forget about it as part of the discussion if you want.
ZFS does that, except with error-correction codes applied against bitrot. All you gotta do is turn on a ZFS system and run the scrub command ("zpool scrub" on the pool). That automatically verifies every checksum and error-corrects any data that fails one.
If anything, ZFS makes the whole system you just described easier and more automatic. It's not really hard for me to push the "scrub" button on my NAS4Free box every few months.
The post you were replying to specifically didn't want to use "the cloud." But for single-digit TB, which covers consumers who neither have a huge number of video files nor are serious data hoarders, the answer is to use local disks. And, if they are OK with cloud storage, which they probably should be, something like Backblaze is probably a better choice.
The solution is generally not to do full duplicative backups, but to check hashes between already-backed-up files and new files, only backing up what you need; see the sketch below.
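As a sketch of that idea (illustrative paths and manifest handling, not any particular backup tool):

```python
# Hash-based incremental backup: copy only files whose content hash
# isn't already recorded as backed up.
import hashlib, os, shutil

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def backup_new(src_root, dst_root, known_hashes):
    for dirpath, _, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            digest = sha256(src)
            if digest in known_hashes:
                continue                      # identical content already stored
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)            # copy file with metadata
            known_hashes.add(digest)
```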
I don't recommend backing up to HDDs. They are prone to early failure from portable use because of vibration and drops.
What we need are more affordable SATA SSDs. Currently NVMe SSDs are very close to the same price.
If you could sell a 1 TB portable USB 3 / USB-C backup device that comes with good software, the vast majority of people would be set.
> I don't recommend backing up to HDDs. They are prone to early failure from portable use because of vibration and drops.
Your backup device should stay at home, on the desk or in the closet where you hide your networking equipment. There's very little benefit to trying to carry around your backup device with your laptop on a regular basis, and if you do, you still need another backup device that isn't going to get stolen at the same time as your laptop.
SSDs and flash storage have their own issues. For long-term unpowered storage there can be data retention problems, which is not so much the case with magnetic media. And with bad drivers there is a risk of ruining parts of the disk by writing too many times...
This. Even a 100 MB/s SSD would do. We need a type of NAND and SSD that offers high capacity at low speed and low cost. A current 2 TB SSD is ~$170, compared to a 2 TB portable HDD at ~$60.
And I am wondering: are silent file corruption, bitrot, etc. things of the past on SSDs? Does Btrfs/ZFS even make sense on an SSD?
Current LTO-9 tapes can store 18 TB, and the LTO roadmap doubles capacity about every 3 years. So this tape tech would be on the same scale as we might expect LTO-14 to offer around 2035 (18 TB doubled five times is 576 TB, and five more generations at ~3 years each lands in the mid-2030s).
So unless this is an incredibly radical breakthrough, that’s the timeframe I’d expect for the headline to become a real product.
Note that the article shows a table from IBM claiming they achieved 35TB-per-cartridge capacity in 2010, and that still isn’t something you can buy.
IMHO, partly marketing bluster, but partly because hardware compression in the tape drive is a useful feature if you don't want to handle 1000 MB/s of compression workload on the host during backups.
> To put 580 terabytes in perspective, it's roughly the equivalent of 120,000 DVDs or 786,977 CDs — IBM notes that stacking that many CDs would result in a tower 3,097 feet (944 m) tall, or taller than the Burj Khalifa, the world's tallest building.
Last I looked into this, if you had a lot of data to back up (say 1 GB per second, continuous, with a retention time of 1 year), it was still far cheaper to simply use hard drives. One employee can keep up with all drive replacements, hardware setup, etc. with time to spare. Drives aren't super power-hungry, so any old office building is suitable. Encrypt the drive contents at another site so you don't need 24/7 security. Total system cost was sub-$1M, with a running cost of $500k/year and storage of 50 PB. Bargain.
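The sizing is easy to sanity-check (1 GB/s and a 1-year retention window are the assumptions from this comment):

```python
# 1 GB/s of incoming data, kept for a 1-year retention window.
seconds_per_year = 365 * 24 * 3600           # ~31.5 million seconds
raw_pb = 1e9 * seconds_per_year / 1e15       # bytes per year -> petabytes
print(f"raw data per year: {raw_pb:.1f} PB") # ~31.5 PB
# ~1.5x for redundancy/parity and headroom lands near the 50 PB figure.
print(f"with overhead:     {raw_pb * 1.5:.0f} PB")
```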
Now that GDPR applies, most companies need to rewrite backups every 30 days anyway to remove data covered by GDPR deletion requests. That tips the scale further in the direction of always-spinning hard drives. Just hook up 64 drives to each machine, make sure you only do streaming writes of 1 GB+ files, use some ZFS RAID-like scheme, and away you go.
You don't have to do it this way; you can just use encryption at rest with a different key for each user and throw away the key when a user asks for deletion of their data. No need to haul all the backups back and scrub them one by one.
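For illustration, a minimal crypto-shredding sketch (the key and backup stores here are stand-in dicts, and Fernet is just one symmetric scheme; a real system would use a key-management service):

```python
# Per-user encryption at rest: "deleting" a user = destroying their key.
# Requires the third-party `cryptography` package.
from cryptography.fernet import Fernet

keys = {}     # user -> key (in practice: an HSM or key-management service)
backups = {}  # user -> ciphertext (in practice: tape/object storage)

def store(user, data: bytes):
    keys.setdefault(user, Fernet.generate_key())
    backups[user] = Fernet(keys[user]).encrypt(data)

def gdpr_delete(user):
    del keys[user]  # ciphertext remains in every backup, but is unreadable

store("alice", b"order history ...")
gdpr_delete("alice")
# backups["alice"] still exists, yet without the key it cannot be decrypted.
```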
Our lawyers didn't think that was sufficient to cover ourselves. Upon finding out that it wasn't going to cost many millions to simply scrub the actual data rather than the keys, they came back saying it was money well spent to delete the actual data.
It was mostly because a user might share data with another user, for example two users at the same postal address. Our fraud team needs to be able to look things like that up, so there need to be database keys on such data, both in the backups and in production. If one of the users at a given address has a GDPR deletion, we need to delete that user's data, but if another user has the same address, we still need to keep the address itself. Yet if GDPR deletions apply to both users, then as well as deleting the address, we need to delete the fact that both deleted users had the same address (even if we don't know the address, the pattern of which deleted users shared info with which other deleted users could identify them).
See... it's complex! The simplest solution is to properly delete the data and rewrite the backups!
IANAL, but you do not need to rewrite backups to brute-force compliance. You do need to inform your customers what your retention policy is, though. I've seen large enterprises communicate backup lifetimes of up to 6 months after a deletion request.
How long would the tape from an audio (compact) cassette need to be to hold 580 TB?
How thick would that tape need to be to allow the motorized spool to pull the tape from the other (passive) spool?
Imagine you were to make an assembly that could hold two spools of this tape; how large would it be? It might be hard to fit it through the door of your server room. And it might not even fit in the van that carries away your off-site backups :)
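For fun, a Fermi estimate, assuming a home-computer-style cassette interface (~1 kbit/s effective at the standard 4.76 cm/s tape speed; real-world rates ranged from ~300 baud to a few kbit/s, so this is an assumption, not a spec):

```python
# How much cassette tape would 580 TB take at ~1 kbit/s over 4.76 cm/s?
bits = 580e12 * 8                 # 580 TB in bits
bits_per_cm = 1000 / 4.76         # ~210 bits per cm of tape
length_km = bits / bits_per_cm / 1e5
print(f"{length_km:.1e} km")      # ~2.2e8 km, beyond the Earth-Sun distance
```

So no, it's not fitting through the server room door.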
Yeah! Somehow my brain started to channel Shakespeare and say "Why, so can I, and so can any man, be the tape but long enough!"
(Then I had to look it up. Henry IV Part I, Act III, Glendower: I can call spirits from the vasty deep; Hotspur: Why so can I, and so can any man, but will they come when you do call them?)
That's a great effort in technology, and it is remarkable. I have some doubts about the real benefits beyond the technology itself, though.
First of all, the security of the system: yes, it is very avant-garde, but being a physical medium, I will always have the worry that it can be damaged.
If I imagine using it to store sensitive information, this worry only grows. It could be accessed, stolen, or damaged.
And what about the cost of a solution like this?
Apart from these aspects, what Fujifilm has been able to achieve in terms of tech is incredible.
580 terabytes is really impressive. Although, I don't know why, I am still waiting for 5D crystal tech (maybe for its durability) and hoping it will be available on the market some time soon. For now, Microsoft has taken over the project and its status is quite uncertain...
A question for those familiar with the latest tape backup performance: does it make sense for me to take tape backups of my data (e.g. using a second-hand tape drive)? Or is it a bad idea for home use?
I wonder about using these for science data archival, although these days I suppose that most projects can just provision archival from a commercial vendor.