Aw, c'mon chaps! Can we just admire the tech for a second?
> Data on the tape is stored at a record-breaking density of 317 gigabytes per square inch...
> When tape is being read it is streamed over the head at a speed of about 15 km/h and with our new servo technologies we are still able to position the tape head with an accuracy that is about 1.5 times the width of a DNA molecule.
In a world where we hold people accountable for stupid things they did or said when they were teenagers, I can see a lot of blackmail value in retaining data for a much longer period of time.
HDDs are fine for short term storage, but they are too unreliable when you want to keep the data for many years, possibly for a lifetime.
Unfortunately, there is currently no commercially available method of archival storage other than magnetic tape. Optical storage has too low a density to compete with magnetic tapes.
That presumes you're putting the data in cold storage somewhere. For data that's being kept accessible, the reliability of a hard drive doesn't matter. It's transferred from RAID to RAID over time. And spy data is probably in warm storage.
Except for videos, that doesn't take up a lot of space. The oppressive part is tracking everywhere you go and everything you say, which fits easily into warm storage.
For example, storing your position every 20 seconds might take 10KB a day. You'll collect 15 million data points in a decade, but each one is only a few bytes.
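A quick back-of-envelope in Python to sanity-check that (the ~2.5 bytes per sample is my own assumption for a compact, delta-encoded position record):

    # one position sample every 20 seconds, ~2.5 bytes per delta-encoded sample
    samples_per_day = 24 * 60 * 60 // 20          # 4,320 samples/day
    bytes_per_sample = 2.5
    daily_kb = samples_per_day * bytes_per_sample / 1024
    decade_samples = samples_per_day * 365 * 10   # ~15.8 million points
    decade_mb = decade_samples * bytes_per_sample / 1024**2
    print(f"{daily_kb:.1f} KB/day, {decade_samples:,} points, {decade_mb:.0f} MB/decade")
    # -> 10.5 KB/day, 15,768,000 points, 38 MB/decade

A whole decade of tracking fits in well under 100 MB, which is why it sits comfortably in warm storage.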
> The Utah Data Center (UDC), also known as the Intelligence Community Comprehensive National Cybersecurity Initiative Data Center, is a data storage facility for the United States Intelligence Community that is designed to store data estimated to be on the order of exabytes or larger. Its purpose is to support the Comprehensive National Cybersecurity Initiative (CNCI), though its precise mission is classified. The National Security Agency (NSA) leads operations at the facility as the executive agent for the Director of National Intelligence.
It has never really entered the public consciousness, since it's not a consumer-facing technology, but tape development has continued apace. LTO-9 should be hitting the market very soon and supports up to 45TB per tape (compressed capacity, 18TB raw).
Not quite sure where IBM's numbers here come from; their previous numbers don't match up to the progression of the LTO tape series' capacity. Maybe they are citing "research numbers" that they can do in a lab but aren't production-ready yet. I would certainly assume they are citing "compressed" data figures there.
But certainly tape has continued to progress much faster than most people would have imagined. Big tape libraries are still a thing in certain environments and they work very well; there is no better solution for bulk cold storage.
> LTO-9 should be hitting the market very soon and supports up to 45TB per tape (compressed capacity, 18TB raw).
LTO is cool and all but is the "compressed capacity" number really something to repeat with a straight face? The tape holds 18TB, we don't need to pretend it's anything else.
> But certainly tape has continued to progress much faster than most people would have imagined.
Mostly it has. But I'm somewhat worried about the future after the sudden late-game announcement that LTO-9 would have a 50% capacity improvement instead of the usual doubling.
> LTO is cool and all but is the "compressed capacity" number really something to repeat with a straight face?
No, but it's been the standard in the tape industry for decades. It probably dates back to the first tape controllers that had built-in compression (so compression didn't tax the main CPU).
I am trying to think of something that would make compression done by the tape controller favorable. Maybe it somehow makes recovery more fault-tolerant in the long run, because the controller knows about the intricacies of the medium? Just guessing; I know nothing about tape storage.
My first LTO drive installation was on a multi user SGI CAD application server. This system did all the compute and data management for roughly 30 users. I/O streaming was easy and efficient.
IRIX allowed for live file system backups, and the drive doing compression meant all of that happened with negligible user performance impact.
Was literally set it and forget it, aside from tape rotation into off site storage.
Doing the compression in software would have had an impact.
We don't do multi user app serving much today, so maybe a smart drive has less benefit. But it mattered then. 2000's era.
The announcement [1] is of lab results, so yes "research numbers". A commercial product might be years out (if ever developed). This is not meant to belittle the achievement (which is awesome), but to clarify what has been done and what to expect.
That transport speed has to just be for rewinding and fast-forwarding. If the terabyte you want is the 580th terabyte, you need a quick way to skip past terabytes 1 through 579.
The hardware is not going to read 300 Gb/in density at bicycle speeds. :)
The density of flash memory is competitive with magnetic tape, but the retention time is too low, making flash memory unusable for archival storage even if it were as cheap as magnetic tape.
In theory, write-once memory cards, using some kind of antifuses, could be designed to have a lifetime good enough for archival storage, but nobody has attempted to develop such a technology, because it is not clear if there would be a market for them.
Most people do not think far ahead into the future, so they do not care much about archival data storage until it is too late and the information has already been lost.
> The density of flash memory is competitive with magnetic tapes, but the retention time is too low, making flash memory completely unusable for archival storage, even if it would have been as cheap as magnetic tape.
I disagree that it's unusable. You'd end up with a puck the size of a data tape that can archive a petabyte of data and needs to be plugged in to a 5 watt power supply for long term storage. That's not super onerous. Then consider that tapes need to be stored at exactly room temperature with 20-50 percent humidity, while this puck would barely care about environment at all. And you could plug it directly into a computer without a $5k drive. Honestly it sounds pretty good to me. We just need to drop the price of flash by a factor of 20 to make the scenario happen.
It probably refers to record-breaking density within the data-tape world, which is still significant: there could be other ways to achieve higher total storage, but areal density seems to be one of the major components here.
Tapes are often structured in bands, and those bands are divided into wraps, and those wraps are divided into tracks. There are many tracks on a tape, and they snake back and forth from end to end on the wrap (so you don't need to "rewind" when you get to the end of the tape—you just start reading back in the other direction). In newer tape drives, you physically can't read all of the data at once because the tape head is only a fraction of the width of a single band: it physically moves (laterally) to position itself over the right data.
Since I had to look up more info to understand this explanation, I'll try to give my own, using the numbers for LTO-8.
The drive has 32 heads, and reads/writes 32 tracks at a time. It goes from the start of the tape to the end, then aligns with the next 32 tracks and reverses direction.
Each group of 32 tracks is called a wrap. There are 208 wraps, so 6656 total data tracks. Even wraps go one direction, and odd wraps go the other direction.
That's the important part.
But also the tape is divided into 4 "bands", each one holding a quarter of the wraps/tracks. Between the bands, and at the edges of the tape, are special servo tracks that are used for alignment.
So when a source talks about "wraps per band", it's a pointless abstraction. Unless you're really in the weeds, the only thing you want to know is the total number of wraps.
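If you want to sanity-check the numbers above, a tiny sketch using the LTO-8 figures from this explanation:

    # LTO-8 geometry per the comment: 32 heads, 208 wraps, 4 bands
    heads_per_wrap = 32                    # tracks read/written at once
    wraps = 208                            # end-to-end passes over the tape
    bands = 4
    total_tracks = heads_per_wrap * wraps  # 6,656 data tracks
    wraps_per_band = wraps // bands        # 52 (the "pointless abstraction")
    print(total_tracks, wraps_per_band)    # -> 6656 52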
> When tape is being read it is streamed over the head at a speed of about 15 km/h...
It did say that. I wonder if there's some multi-track/double-layer/double-sided stuff going on that needs multiple passes to read; that does sound awfully fast!
Most recent tape standards have multiple "bands" and "wraps" placed in parallel. The head reads only one wrap at a time, so it takes many passes to read the whole tape. For example, an LTO-8 tape has 4 bands of 52 wraps each, requiring 208 passes to read completely.
It would be like downloading a file using GetRight back in the day, where you'd have one thread downloading the 0-25% chunk, one downloading the 25-50% chunk, and so on.
You could hypothetically do that, for sure, but the software basically is derived from the tape era where you'd just have one logical stream coming out of the tape.
Positioning the tape should become harder, I suppose.
Also, I suppose they limit the read bandwidth to something like what a single InfiniBand connection would support; few disk arrays support much higher speeds.
The numbers they cite for "previous generations" don't match up to the progression of the LTO tape series' capacity. Maybe they are citing "research numbers" that they can do in a lab but aren't production-ready yet. I would certainly assume they are citing "compressed" data figures there.
Also bear in mind that tapes typically store data striped across the tape in multiple tracks and multiple bands. There are four bands per tape and 12-52 wraps per band, so reading the whole tape requires up to 208 passes across the tape.
But yes, to agree with another parallel comment, tape data rates are quite high sequentially (abysmal at random access, of course, but that's not how tapes are used). LTO-8 does 750 megabytes per second compressed / 360 megabytes per second raw.
> Data on the tape is stored at a record-breaking density of 317 gigabytes per square inch...
The table from IBM[1] in the middle of the page says this number is in gigabits (not gigabytes) per square inch. This is obviously impressive nonetheless, but I wonder what else this article got wrong.
> record-breaking density of 317 gigabytes per square inch...
Yawn on the raw density figure, though.
The chip inside a 128 gigabyte micro SD card is a small fraction of a square inch and is cunning enough to provide random access.
You just can't easily and cheaply have a long tape of them.
Once upon a time, mass storage media that write to surfaces with a head, like magnetic tapes and discs, had better density than memory chips.
That is roughly 500 bits per square micrometer or 2000 square nanometers per bit or an average (assuming square lattice) distance of 50 nanometers per bit.
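Checking that arithmetic, assuming the figure is 317 gigabits (not gigabytes) per square inch as the IBM table suggests:

    bits_per_sq_inch = 317e9
    um2_per_sq_inch = 25_400.0 ** 2                    # 1 inch = 25,400 micrometers
    bits_per_um2 = bits_per_sq_inch / um2_per_sq_inch  # ~491 bits/um^2
    nm2_per_bit = 1e6 / bits_per_um2                   # ~2,000 nm^2/bit
    pitch_nm = nm2_per_bit ** 0.5                      # ~45 nm on a square lattice
    print(f"{bits_per_um2:.0f} bits/um^2, {pitch_nm:.0f} nm pitch")

So roughly 490 bits per square micrometer and a ~45 nm average spacing, in line with the figures above.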
It isn't mentioned, and I don't expect much beyond the inertia of the device itself. There is a servo mechanism ensuring the head follows the tracks on the tape, and improvements to it are mentioned. I don't think it matters whether it's the tape itself or the environment that vibrates, or whether that distinction is even meaningful.
If you're flying back and forth on that route that often, you're going to get upgraded quickly, I would imagine. I've done 6 flights a month domestic and have been upgraded to first class for free fairly often.
>Otherwise that sounds like a nightmare that would require an insanely high paycheck.
If the courier in question is 6'2" that might be the case, but I suspect if the person was much smaller and is a heavy sleeper, that it wouldn't be too bad.
As a person of 6’0” I found the flight from LAX to SYD, and return, miserable in coach. I slept a lot and watched some movies but any way you cut it that is a long time to be sitting in a very uncomfortable seat.
1.75m here and coach sucks for me too. For that reason and social distancing I just bought a car that can drive me to places I’d usually fly, in much more comfort.
It's an ideal job if all you need is hours of concentration, reading, and writing or drawing. A writer, a PhD student with a laptop, a comic book author, etc.
Or, maybe, even a Zen monk who spends time meditating.
Still sounds like an awful job to me flying that much.
Not sure why people said it was a good thing to do. You'd be bored of the process within a couple of months and hating planes and airports... unless you're a certain type of person.
A lot of traveling sales guys have talked about this.
I wonder if tapes are more prone to some form of drive-by attack, whereby instead of requiring physical or remote access to a location, a strong enough magnetic field within a certain distance of a datacentre could penetrate bricks and mortar and render them useless.
I'm envisioning a huge device in the back of a van which pulses a powerful beam, similar to typical heist movies (Ocean's Eleven?) cutting the power to a bank/casino prior to a raid.
Unattended for 14+ hours you mean. A baggage handler on the departing side could be bribed to steal it and you would have half a day of unfettered access before the courier on the plane even knew it was missing.
Of course you don’t check it if it’s that critical.
Apologies for all the questions, I'm just curious about this.
>Checksums will identify something went wrong, and then you need to redownload the file to a quarantine network and scan it. Takes time.
Surely a sensible file transfer algorithm would compute checksums on small and easy-to-retransmit chunks? Does rsync not do this? Isn't it already happening in TCP?
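Something like this sketch is what I have in mind (the 4 MiB chunk size is an arbitrary choice):

    import hashlib

    CHUNK = 4 * 1024 * 1024  # 4 MiB chunks

    def chunk_digests(path):
        """Yield (offset, sha256 hex digest) for each fixed-size chunk of a file."""
        with open(path, "rb") as f:
            offset = 0
            while block := f.read(CHUNK):
                yield offset, hashlib.sha256(block).hexdigest()
                offset += len(block)

Comparing the two sides' digest lists pins down exactly which chunks to retransmit, instead of redownloading the whole file; rsync's rolling-checksum protocol is a more sophisticated version of the same idea.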
>Not to mention most major studios will contractually prevent you from exposing anything to the web.
I understand that workstations with media on them are not going to have internet access, but do they really prohibit site to site VPNs?
>Amazon just released a device (forget the name) for exactly the same use case. We developed it in house.
Snowball and Snowmobile? IIRC these are primarily meant for one-time migration from on-prem storage to S3. Do people really use them on an ongoing basis?
I'm more interested in the lifetime of the medium. Durable backup media for consumers are still a holy grail, as I understand it. M-DISC didn't hold up to its 1000-year lifetime promise. Archival-grade DVDs are also not good enough, as I understand it. Syylex went bankrupt. I want a consumer-grade backup medium that can provide at least 100 years of lifetime.
That said, I was able to recover 90% of the voice recordings my father made between 1959 and 1963 on reel-to-reel tapes, 60 years later. Tape can be very durable, but what I recovered was analog voice, which is very tolerant of errors. I'm not so sure about gigabytes snuck into a square inch.
Brings back memories of a great remix contest in 2006 where the digital copies of analog tape tracks from Peter Gabriel’s “Shock the Monkey” were made available to remixers.
To my surprise, the pitch of the samples was a little lower (and varied ever so slightly over the duration of the song) than what you'd have expected with A440 tuning. It baffled me, since I expected some of the early digital synths used in the original sessions to have been rock-solid A440.
And that’s how I learned about “tape stretch” where analog audio tape stretches just enough to make the pitch of everything a few cents lower over long period of time.
p.s. I ended up applying digital pitch correction, so I could “jam along” with my own synths :-)
A different problem happened to me when digitizing my dad's tapes. Dad bought the player in the USA, brought it to Turkey, and made the recordings there. When I digitized them in the USA, everything sounded higher pitched. It turned out that the difference in AC mains frequency (60 Hz vs. 50 Hz) made the motor spin proportionally faster, so I slowed the recordings down to 5/6 speed, and they were perfect afterwards.
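The 50/60 Hz arithmetic, for the curious: a mains-synchronous motor runs 6/5 as fast on 60 Hz as on 50 Hz, which is a sizable pitch shift:

    import math
    speed_ratio = 60 / 50                    # playback 20% too fast
    semitones = 12 * math.log2(speed_ratio)  # ~3.16 semitones sharp
    print(f"{speed_ratio:.2f}x speed, +{semitones:.2f} semitones")

Slowing playback to 5/6 speed exactly undoes it.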
Ask HN: How can I create the most reliable and durable NAS today? I have a lot of very sentimental, very-important files, such as family photos and videos. And I simply like to hoard data.
I currently have 8TB of data stored on a Synology DS218+ with RAID1, and monthly data scrubs (verifying checksums). It is backed up remotely to Google Drive (in encrypted form), and I also maintain an infrequently-updated, once-per-quarter disk clone with an external HDD.
My biggest concern with my current setup is that the memory is non-ECC. Even though the files are checksummed, I am concerned that memory corruption / bit-flips could propagate into the checksums, and hence result in data corruption.
I am considering:
* Building my own FreeNAS box using AMD Ryzen (which semi-officially supports ECC memory). My concern here is the semi-official nature of the support: how do I know the ECC works, before a rare cosmic bit-flip?
* Purchasing a Synology DS1621+. This is AUD$1400 which is a tough pill to swallow, for the equivalent of a first-gen Ryzen quad core and 4GB of memory.
You'll know ECC works when it matters: when you encounter a flipped bit. With an 8TB RAID, that is likely to happen within the next 24 months.
Go with option 1, and RAID-Z2 or better. With RAID-Z2, you'll be able to not only detect but also correct a flipped bit, even if that flip happens when writing out the data.
Pay attention to the counters. Your ZFS scrubs will report how many resilvers they have. You’re likely to encounter 1 in a scrub. You’re unlikely to encounter more than 2. If you see that, that’s when you check for memory errors. A single bad sector is likely the hard drive. Even a single flipped bit is likely a transient error; It could be your memory, or your disk, or anything in between. It happens at scale, and 8TB, read repeatedly, is a lot of bits.
Look into rasdaemon and memtest86 - they're the tools you use to debug what's happening when you do see errors.
The other advice I can give you: Don’t be paranoid. Your photos are likely to acquire bit rot. You will have dozens, even hundreds of bit flips that will happen in your lifetime. Of the many thousands of photos you will take, the chances that you will ever notice the discoloration or bad line that a bit flip will have in a photo are pretty small. Bit rot happens. Your photos are important to you, and you should treasure them, but treasure them for what they are: Things that you protect and are under your care, not things that must be twenty nines of correct. You can realistically achieve 10, even 12 nines of correct reads on your data. You don’t need more.
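If you don't run ZFS, you can still approximate a scrub with a checksum manifest. A minimal sketch (the archive path and manifest filename are invented for the example):

    import hashlib, json, pathlib

    ROOT = pathlib.Path("photos")             # hypothetical archive root
    MANIFEST = pathlib.Path("checksums.json")

    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest()

    current = {str(p): digest(p) for p in ROOT.rglob("*") if p.is_file()}
    if MANIFEST.exists():
        for name, h in json.loads(MANIFEST.read_text()).items():
            if current.get(name) != h:
                print("changed or rotted:", name)
    MANIFEST.write_text(json.dumps(current, indent=2))

Unlike ZFS, this only detects rot; you still need a second copy to repair from.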
"Next, you read a copy of the same block – this copy might be a redundant copy, or it might be reconstructed from parity, depending on your topology. The redundant copy is easy to visualize – you literally stored another copy of the block on another disk. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally – no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted. A later scrub will attempt to read all copies of that block and validate them just as though the error had never happened, and if this time either copy passes, the error will be cleared and the block will be marked valid again (with any copies that don’t pass validation being overwritten from the one that did)."
(don't just read the quote, read the link)
I have been using ZFS since 2007/2008 and I have never had any issues (except with those damn Seagate 3TB "DeathStar" HDDs, where I was barely able to replace them fast enough - 3 died in 3 months - I will never buy Seagate again).
I have an ASUS microATX board with 16GB of non-ECC RAM, an additional SAS HBA, 3x3TB Toshiba drives in raidz, plus additional 10TB HGST He3 disks and an Ultrium 3000 (LTO-5) drive for backups. LTO-5 drives are quite cheap today and are designed for 24/7 operation, which is far more than I will ever subject them to; tapes can be restored on later LTO generations if needed, and you can get cartridges on sale for peanuts. There is no way in hell I'd trust my important data (like images) to disk only, and tape is nice: you take the cartridge and store it at your parents'/gf's/in a workspace drawer/...
Anyway, if I remember correctly, Google lost 150k user accounts in 201x and restored them from tape. So even for cloud-minded people, it still makes sense to shovel important data onto tape if you don't use it in everyday processing (and just an FYI: even shelved disks die).
Write the really important data for long-term storage to M-DISC BD-XL media (100GB each).
If it's pictures you care about, I'm sure it's far less than 8TB that needs the VIP treatment.
I would love your part list! Did you go with ECC memory? Did you have any way of verifying the ECC is working and actually detecting/correcting bitflips?
The other thing I am interested in is minimizing idle power consumption. Just to be more environmentally friendly.
I don't know if you want to build the same kind of system but at least you can get a list of parts that work together.
I use my Truenas box for storage using ZFS, VMs and NFS server for different PCs.
I bought ECC memory as I understand this is more or less a requirement for ZFS.
I found out that FreeBSD which Truenas is based on can give you info about what type of RAM is present.
The command is:
# dmidecode -t memory
According to this I have ECC RAM. :)
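dmidecode only tells you the modules are ECC-capable, though. On Linux you could also watch the kernel's EDAC corrected-error counters to see corrections actually happening. A sketch (this assumes the EDAC driver is loaded; TrueNAS Core, being FreeBSD-based, would need different tooling):

    import glob

    # corrected-error counters exposed by the Linux EDAC subsystem
    for counter in glob.glob("/sys/devices/system/edac/mc/mc*/ce_count"):
        with open(counter) as f:
            print(counter, "->", f.read().strip(), "corrected errors")

A nonzero count is proof the ECC path works end to end.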
I did a build with the following parts:
Case: SST-CS380 V2 (space for 8 3.5" drives, 2 x ).
Mainboard: ASRock X470D4U2-2T
Power supply: Seasonic Focus Plus 550W Gold 80 Plus Full Modular Power Supply
RAM: Kingston Server Premier, DDR4, 16 GB, KSM26ED8/5 M
CPU: AMD Ryzen 5 3600X Wraith Spire CPU
Cooler: Arctic Liquid Freezer II
NVMe to PCI bridge: ASUS Hyper M.2 x16 Gen 4 (PCIe 4.0/3.0) supports 4X NVMe M.2 devices (2242/2260/2280/22110) up to 256 Gbps for AMD TRX40 / X570 PCIe 4.0 NVMe RAID and Intel® RAID platform - CPU features
I bought all this from Amazon.de. The main board is a server board with 10 GbE Ethernet and a management console for flashing the BIOS and remoting into the machine - no graphics card needed. This was expensive and you can most likely save a lot using a consumer-grade board.
Be careful not to buy an AMD APU - these don't support ECC RAM for some insane reason. An APU has graphics built into the CPU.
I use both SATA hard drives (long-term storage) and SSDs (for speed).
I created two ZFS pools (I don't remember the proper terms): one for rotating discs (which can sleep most of the time) and one using SSDs for fast storage, which doesn't use much power.
I have 6 kW of solar cells with a battery, so I don't really care if the box uses a lot of power. During daylight it's more or less free when the sun is shining. I get next to nothing when selling the electricity that I generate and would like to use as much as possible locally.
There really isn't much market for it. You can pay Google or Apple or one of the large cloud providers a very reasonable and decreasing rate for a literal guarantee that your data is accessible. The only risk is the company goes under, which is extraordinarily unlikely for someone like Google / Apple and the shutdown would have advance notice.
I realize for the hacker news audience there are multitudes of reasons the solution above doesn't fit your needs, but realize the consumer market is near nonexistent.
If nobody is "inheriting" your data (or rather—nobody cares enough to keep your data around), it seems kind of moot to ensure it hangs around. That is, if I put stuff in an S3 bucket and pre-pay for 100 years, if nobody is around to download it in 100 years then why bother?
If you wanted to make a sort of digital time capsule and didn't care who discovered it, your next best bet would probably be the Internet Archive or some other archival community.
If your data isn't appropriate for archival (i.e., can't be publicly consumed) and isn't interesting enough for your friends/family/etc. to keep around on your behalf, keeping the data is purposeless.
I absolutely take inheritance into account when making backups meant to last a hundred years, but regardless of how uninteresting my data looks, we don't know whether today's boring data would be invaluable to science in the future. We show slippers from 5000 years ago in museums today and they're invaluable. Consider the person who owned them, walking around on a national treasure, unaware. Maybe they didn't even like the slippers, found them boring. :)
I was thinking that DNA is a pretty robust storage medium. Perhaps we could use it in coming years to store data for long term survival.
Though considering these comments and the advent of mRNA/CRISPR, perhaps we could store data for future generations in our own DNA. That'd be fascinating if you could read journals or even audio/video of your ancestors from your biological inheritance from them. What if we could engineer an extra chromosome to do just that, then let them remix and recombine segments of memories so everyone's would be unique.
Just store your diaries in a line of yeast that produces tasty beer or wine. That could work. I wonder what the oldest yeast lines in use today are, and how stable their genomes are.
Or if you really want your data to survive, engineer it into a virus for your local species of cockroaches! Getting the data back could be gross, but it'll survive nuclear holocaust. ;)
The importance of those slippers is tied strongly to their rarity. So little survived from 5000 years ago that almost anything from that time is valuable.
By comparison, we'll create more data in the next ten minutes than entire centuries from our relatively recent history. Lots of stuff is getting preserved in lots of places with substantive redundancy for virtually nothing. Your slippers today are likely to be more valuable than the near-infinite troves of documents and photos and whatever else.
I agree the consumer market doesn't exist because everyone is seduced by the cloud nowadays. Having said that, the supposed guarantees provided by these companies should not translate into blind and complete trust.
A single bad bug or security issue can make data inaccessible or corrupt. Tapes, on the other hand, do not have that issue. IMHO, trusting all your important data to a single vendor or technology is a recipe for disaster.
The lifetime of the medium is half the equation. The biggest problem imho is the lifetime of the device. Say I give you an old floppy drive for 8" floppies from the seventies. Where would you connect it?
You're right, but not all media are the same. The Arctic Vault used an optical film with QR codes; you can theoretically take a photo of it and decode it by hand if you want. They even added a Rosetta Stone to the entrance so that even if all the knowledge is lost, one can hypothetically decode the data stored there. For magnetic media, you need more specialized equipment for sure.
Floppies are an interesting case because the protocols and physical specifications are all documented publicly, which means that one could literally build a drive from scratch today --- the trickiest part being the heads, but considering that they are many orders of magnitude lower density than HDDs, it would not be a big obstacle in the future.
(I believe 8" floppy drives have the same interface as 5.25" ones --- and there's no shortage of adapters from the retrocomputing community for those, some of them even open-source.)
Tape is far more closed, AFAIK most of the common formats are proprietary and the specs are behind NDAs and other walls.
I have hobby books from 30 years ago that teach you how to build magnetic heads for cassette tapes. No pictures or anything, but with enough patience you could definitely build one today, at home (even then). Mind you, the size of the gap is not that big of a problem if you put your mind to polishing.
Tapes are uniquely terrible at this. I'd argue it's the Achilles' heel of the medium, even more so than the actual limitations of linear tape.
First off, the drives themselves are going to be expensive brand-new. You're going to be paying thousands of dollars for a drive, and it's probably going to have some weird interconnect, meaning you'll have to spend even more money and waste a PCIe slot on an adapter to use it. Most common are SAS and Fibre Channel, although there's at least one company selling drives in Thunderbolt enclosures for the Mac market.
(Aside from all that, SAS is actually pretty cool for things like hard drive enclosures, since it has SATA compatibility. I have a jury-rigged disk shelf built out of an old HP SAS expander, a slightly-modified CD duplicator case, and some 5 1/4" hard drive bays.)
Second, tape formats are constantly moving. LTO and IBM 3592 come out with a new format every 2-3 years and backwards compatibility is limited. Generally speaking you can only write one generation behind on LTO and read two generations back. So, if you want a format that's got drives still being made for it, you'll need to migrate every 5-7 years. Sure, the actual tape is shelf-stable for longer, but you're going to be buying spare drives or jumping on eBay if you want to keep old tapes around that long.
(eBay is actually not a bad place to buy used tape drives, but the pricing varies wildly. It's perfect for hobbyists and small-fry IT outfits looking for cheap backup media. Absolutely terrible if you're a large outfit with reliability guarantees and support agreements to maintain.)
Third, actually using a tape drive is a nightmare. First off, Windows hasn't shipped tape software since 2003 (I think?), so you'll be in the market for proprietary backup solutions. Second, if you're just writing data directly to the tape, you will shoe-shine for days. Common IT practice is to actually have a second disk array sitting in front of the tapes as a write cache and custom software to copy data to the tapes at full speed once all the slow nonsequential IO is done. Reading from tape doesn't have to worry about this as much, but the fact that you had to use custom software just to archive your files means that you now have proprietary archive formats to deal with. So you can have tapes that rely on both having access to working drives and licenses for proprietary backup utilities.
(Of course, if you had decently fast SSDs and a parallel archival utility, you could sustain decent write speeds on tape. I actually wrote this myself as an experiment: https://github.com/kmeisthax/rapidtar and it can saturate the LTO-5 drive I tested this with.)
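For the simplest possible version of the staging idea: on Linux you can stream one big tar archive to the non-rewinding tape device with a large buffer, which keeps the drive streaming instead of shoe-shining. A minimal sketch (the staging path is made up):

    import tarfile

    BLOCK = 512 * 1024  # large buffer to keep the drive fed

    # /dev/nst0 is the usual non-rewinding tape device on Linux
    with tarfile.open("/dev/nst0", mode="w|", bufsize=BLOCK) as tar:
        tar.add("/staging/archive-2021-06", arcname="archive-2021-06")

The "w|" mode writes a pure sequential stream, which is exactly what tape wants.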
That's probably not going to happen. Tapes have high longevities and you can buy used LTO drives on eBay for a few hundred bucks, but the biggest issue in 100 years is going to be finding a device to read the tape, and finding an adapter to hook it to the USB-H 12 quantum optical port on your motherboard.
A better way would be to use something like that to store it for a decade or two, then copy the data onto whatever the newer version of archive medium is (LTO drives can typically read the last one or two versions as well). Rinse and repeat every decade, and it also lets you test if there has been any bitrot.
> The objective of this study was to investigate the behavior of the GlassMasterDisc of Syylex under extreme climatic conditions (90°C and 85% relative humidity) and to demonstrate the potential of this technology for digital archiving.
> The result of this study is that the GlassMasterDisc has a much longer lifetime in accelerated aging than other available DVD±R
I wouldn't draw any other conclusions on normal ageing of other tested media. They did an accelerated aging test at 90°C and 85% RH, where most discs didn't last a single test cycle (of 10 days), two discs lasted a single cycle, and only syylex lasted all 4 cycles.
Quote on a brand-name DVD
> This DVD model had the longest lifetime (i.e. 1500h) at 80°C and 85% RH. At 90°C, it is destroyed after the first cycle of 250 hours.
For an idea of what it does to the substrate:
> [for measurement] DVDs have to be taken [out] .. To prevent the formation of water droplets in the polycarbonate, it is necessary to "purify" the polycarbonate from the water that was absorbed at high temperature.
OTOH, I had CDs (Verbatim, upper middle class), of which about 1-2 of 50 had issues after 20 years storage (in dark, mostly room-temperature conditions).
Yes, they did an accelerated test, and M-DISC performed only as well as archival-grade DVDs. Syylex, which promises the same lifetime as M-DISC, performed significantly better. That clearly shows either that M-DISC didn't live up to its promise, or that Syylex and archival-grade DVDs surpassed expectations. Either is bad news for M-DISC, isn't it? What am I missing?
The "accelarated test" may not be in any way indicative of true lifetime in moderate conditions. Their own conclusion does not draw any such implications, the only other test they reference is done at 80°C (10°C lower), and the only writing on how or why this test could be indicative of archival lifetime was a generic two sentence: harsher conditions -> faster degradation (in part 4).
It was a purpose-built test to see how much of X Syylex would take. It took X better than the others, none of which took X well. Tests like these are very good if you want to go with Syylex, to make sure it's not worse in some way (X, or Y, or Z), which would then suggest a need for further examination. In real aging, factor X may be completely meaningless while Y and Z are crucial, so you cannot conclude which one will last longer.
Why test at 90°C and 85% RH, and not 80°C, 50°C or 110°C, or bending, UV light, scratching, a drop in acid... whatever? For a proper accelerated lifetime test, you would need to identify (all) relevant degradation modes and model their behaviour (and interaction) in target vs. accelerated test conditions, and then extrapolate behaviour in target conditions. They didn't even write what type of degradation they are testing.
I'm not convinced. I'm no expert on aging simulation however. But heating a DVD to 90°C seems like it would do different things to the disc than normal aging at recommended temperatures, wouldn't it?
Given the pace at which storage capacity increases, what is the rationale for not copying over your data every 5-10 years onto the next cheapest consumer mass storage of the moment? You get all your data in one place and you don't have to deal with standards disappearing. (I read that 2020s game consoles can't read CDs anymore; people should rip their CD collections right now.)
The bookkeeping it requires, for one: since you don't usually buy all your backup media at once but acquire them over time, that gets unnecessarily complicated. It's also riskier to copy the media periodically, as you might increase the chances of data corruption due to a fault in the copying process (faulty RAM, faulty software, not concentrating well enough, etc.). You periodically introduce the possibility of user/hardware/software errors into the longevity of your backups.
Also, when others inherit the media, they may not have the proper equipment or skill to do it themselves; the goal of preservation is to get the data 100 years ahead, not to keep it always in a usable state per se. For example, I'd like my children to keep my backups until my grandchildren can access them 60 years later.
For data corruption and mis-manipulation, I would be more concerned about the long-term decay of any medium than about some bit flipping in RAM. (Even tapes' endurance relies on certain storage conditions, but your medium is more likely to be a hard drive, writable DVD/Blu-ray, or something flash-based, and these do not particularly age well.)
For bookkeeping, I think my point is that storage media are becoming so big that you can always consolidate onto a single device every time you carry the data over (you may still want to duplicate for reliability). You can buy an 18TB hard drive today; a consumer isn't going to need more than one or perhaps two of those for anything to be preserved long term. And in 5-10 years, you will likely have 25-30TB hard drives.
The equipment problem is precisely what this addresses. You are always using the latest hardware, and the previous hardware is still supported if you stick to a 5-10 year cycle. For instance, you would have moved away from IDE drives while you could still find motherboards with both IDE and SATA ports. But if your data is stored on an IDE drive today, good luck connecting it to a computer in 2030 (assuming we haven't gone full Apple: "you can't customize your hardware and we deprecate everything very frequently").
Skills (and I would say mostly dedication) are still a problem. But we are talking about copy-pasting files between two media; it's not rocket science even if you don't script it.
Every 10 to 15 years, you send in your archived (and new/interim) personal data and get it back on the current top-tech storage medium. That way it's not stored in the cloud and you can keep moving the stored data forward without having to deal with it all yourself.
Not really. You can buy an 18TB hard drive now. Even if your data is humongous and needs several of those, it will likely fit on a single drive in 5-10 years. So it takes an increasingly smaller amount of your time to replicate (excluding the copying time, which keeps the machine busy but not you).
Use https://en.wikipedia.org/wiki/Parchive and add as much redundancy as you like. It's a lot cheaper to over-provision than to create the uber-archive medium.
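To illustrate the principle: real PAR2 uses Reed-Solomon codes, but plain XOR parity across equal-sized blocks already shows how a lost piece gets rebuilt (toy sketch):

    def xor_parity(blocks):
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)

    data = [b"4 KiB of photo..", b"4 KiB of video..", b"4 KiB of text..."]
    parity = xor_parity(data)

    # lose any one block: XOR of the survivors plus the parity restores it
    assert xor_parity([data[0], data[2], parity]) == data[1]

PAR2 generalizes this so you can lose any N blocks, where N is however much redundancy you chose to pay for.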
Hell, tell me it will last 20 years and I'm okay with that, if you can guarantee at least 15 years I can then buy replacements every decade and transfer files over...
That's right, but regardless of how "extreme" the testing is, archival-grade DVDs performed as well as M-DISC, and Syylex surpassed the rest by a huge margin. Syylex promised the same lifetime as M-DISC (unlike archival-grade DVDs). I think the results are good enough to show that either M-DISC doesn't live up to expectations or archival-grade DVDs exceed them. Either way, bad news for M-DISC. If Syylex hadn't gone bankrupt, it would have been the best option, of course.
Yes, what I mean is that, setting aside the "glass" disc from Syylex, we don't know whether both M-DISCs and archival-grade DVDs suck or excel, let alone how long they actually stay readable in the real world.
If my last guess in the comparison is correct, 100,000 hours at "normal" temperature/humidity is roughly 11 years, but it may well be that without "cooking" them at 90°C, the duration for both is 200,000 hours (or whatever)...
I remember watching the pilot episode of Star Trek (the original series) and chuckling when Spock reports that “the tapes are badly damaged” from the capsule they recovered near the barrier at the edge of the galaxy.
Turns out it might not be that outdated after all.
About a month ago I found a stack of MiniDV tapes from about 15 years ago.
It was my own home videos. I wanted to preserve them, so wanted to upload them to Google Photos.
It took some looking around on eBay to find a camcorder to play them back. When it arrived, I realized that I needed a FireWire 400 port to capture at full resolution, so more digging around for a FireWire PCIe card. I was finally able to transcode and upload some 15GB worth of video. The upload itself took 3 minutes on my gigabit internet; the whole process, acquiring the hardware and so on, took about a month.
When I think about these tapes some 50 years from now, it might as well be completely unrecoverable, not because the tapes went bad, but because we have nothing to read them. Makes you wonder about galactic time scales like in Star Trek.
> found a stack of MiniDV tapes from about 15 years ago ... I wanted to preserve them
Here's where I raise my hand and ask "Why?"
They've been sitting in a box for 15 years. You never watched them, you never even thought about them. Why preserve them?
It's why I stopped taking pictures and videos of things. I never watched them again. It's all just a lot of waste motion over some dream that someday we'll find value in sitting and looking at this old stuff again.
For me it’s personal history. My kids LOVE watching videos of me and my wife when we were younger. Luckily for them, my wife’s family converted all their videos to digital and so it’s easy to watch.
I’m itching to do the same to my parents’ collection so my kids can see more.
For me the biggest thing was that the tapes had been sitting for 15 years, then discovered and treated as though they were something valuable. If they were so important, why were they forgotten on a shelf for 15 years?
Wait until you have to clean out a parent's house with a rooms full of stuff saved because they thought the grandkids might find it interesting. Here's the reality: they don't. My mother saved boxes and boxes and boxes of photos. None of them ever sorted, or put in albums. Just thrown in boxes. Saved for decades. Was it hard to throw them away? Yes, a little. There were a lot of moments of my childhood there. But if I asked myself honestly, was I going to do anything with them other than put them on a shelf at my house? The answer was no. I'd advise sparing your kids that burden.
A friend of my mother's had saved every canceled check she ever wrote. Boxes of them, because she thought her kids or grandkids might be interested in them some day. Their reaction was likely: What's a check?
If you really cherish something, and it regularly brings happiness to your life, by all means save it. But do it for yourself, not for what you think your descendants will find interesting. And if it's been in a closet for 5 years, ask yourself why you are keeping it.
It has nothing to do with intent. It’s tone deaf to ask why someone would want to preserve memories, especially if it’s based on your one anecdote of yourself not wanting them.
It’s like asking why someone would want to spread ashes or preserve an old piece of rickety furniture a long lost relative built.
Yes, but when you read it in context, it's clear that the author is referring to themself:
>It's why I stopped taking pictures and videos of things. I never watched them again. It's all just a lot of waste motion over some dream that someday we'll find value in sitting and looking at this old stuff again.
The author uses "I" over and over again, explaining that they don't see the point, so asking about a different perspective seems genuine to me, or at least there's a plausible interpretation that it is genuine.
From HN guidelines:
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
Great, so when can I buy one and how much will the tape drive cost?
The largest issue with tape is that the drives themselves cost absurd amounts, and you better not cheap out because failure is both time consuming and scary.
Swapping tapes continues to be human intensive and restore times long. But the tapes themselves are so cheap that at this scale it becomes worthwhile again.
This is geared for enterprise settings, not home use. I believe someone mentioned these drives costing $25,000. I do agree reasonably priced tape drives with TB of space for home users would be great.
While tapes are theoretically cool, the drives are just too rare for them to be of any practical value to a home user. Even if the media has a better archival life than a hard disk (say 50 years), it won't do you a damn bit of good if there are no drives available in 50 years to read it.
IMHO, hard drives are better for backup (even offline backup, just get a hot-swap bay and imagine the drive is tape [1]). Archival is a harder problem, but I've settled on using high-grade optical media, burned slow for fewer errors, making a bunch of redundant copies. Even though the media might not last as long, I'm pretty much guaranteed to be able to find a drive for the next several decades [2].
> Those tapes only hold 200 GB, and you can get a 10 TB hard drive today for less than $300:
That was a link to an obsolete LTO-2 drive. If you’re using LTO-8, which is the current generation, you’d get a 10-pack of 12TB tapes for around $500. Noticeably cheaper per byte than hard drives.
I don’t recommend LTO-2 just because I don’t think that the drives are well-supported.
> While tapes are theoretically cool, the drives are just too rare for them to be of any practical value to a home user. Even if the media has a better archival life than a hard disk (say 50 years), it won't do you a damn bit of good if there are no drives available in 50 years to read it.
If you’re evaluating just on the basis of the price of drives / media, then there’s a cutoff where tape becomes cheaper than hard drives. The easy way to calculate this cutoff is to divide the overhead cost (tape drive price) by the difference in the cost per TB of tapes and hard drives, which in the example here, is around $3k divided by $25/TB, or 120 TB.
In other words, if you have more than about 120 TB of data, then it is cheaper to buy a tape drive. I think any comments about whether tape drives are suitable for home use are really comments about whether you are interested in the use case of people who need to store >120 TB at home.
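The breakeven arithmetic spelled out, with the rough prices assumed above:

    drive_cost = 3000.0              # LTO-8 drive, one-time overhead
    tape_per_tb = 500 / (10 * 12)    # 10-pack of 12TB tapes, ~$4.2/TB
    hdd_per_tb = 29.0                # rough hard drive street price per TB
    breakeven_tb = drive_cost / (hdd_per_tb - tape_per_tb)
    print(f"~{breakeven_tb:.0f} TB")  # -> ~121 TB

Above that point, every additional terabyte on tape is pure savings.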
If you are running a YouTube channel as a hobby, or you run a side business doing videography for weddings, the tape drive starts to sound a lot better. The 120 TB cutoff might be around 300 hours of video, which might be only 50 events.
There are a lot of other reasons why you might NOT want to use tape, but it’s easy to have enough data that tape is the cheapest storage option. At “enterprise” scale the cost calculus is completely different and involves things like support contracts with Oracle (not necessary for hard drives), power/cooling in your DC (tape is very low-power), etc.
And let’s not forget that if you have 120TB of hard drives, you’re in the regime where you start having to buy multiple machines.
As for longevity: I don't have the data handy. If you are storing data on tape, you need to migrate to newer generations of tape as old generations become obsolete and unavailable. If you are storing on hard disks, you also need to migrate, because hard disks eventually fail (even if they are not powered on).
> That was a link to an obsolete LTO-2 drive. If you’re using LTO-8, which is the current generation, you’d get a 10-pack of 12TB tapes for around $500. Noticeably cheaper per byte than hard drives.
> In other words, if you have more than about 120 TB of data, then it is cheaper to buy a tape drive. I think any comments about whether tape drives are suitable for home use are really comments about whether you are interested in the use case of people who need to store >120 TB at home.
> If you are running a YouTube channel as a hobby, or you run a side business doing videography for weddings, the tape drive starts to sound a lot better. The 120 TB cutoff might be around 300 hours of video, which might be only 50 events.
No dispute there, but I think most >120 TB home use cases run into the question of whether it's even worth it to keep the data (raw/uncompressed). For most people, the answer is probably no, and curation/compression makes far more sense. For instance, I don't think it's typical for wedding photographers/videographers to store their final product indefinitely, let alone the raw footage for a re-edit. You can keep a lot more events in a lot less space if you're only keeping the raw footage for a few recent events and the 15-30 minute final edit for a year after it's finalized.
> And let’s not forget that if you have 120TB of hard drives, you’re in the regime where you start having to buy multiple machines.
Not if you swap disks as offline storage.
> As for longevity: I don't have the data handy. If you are storing data on tape, you need to migrate to newer generations of tape as old generations become obsolete and unavailable. If you are storing on hard disks, you also need to migrate, because hard disks eventually fail (even if they are not powered on).
Honestly, the theoretical longevity is the only aspect of tape that appeals to me, but like you said enterprise tapes will end up being like these enterprise optical disks in a fairly short period of time, so you're regularly going to have migration legwork to do or your data's effectively toast.
You don't even need the absolute latest drives to get reasonable storage volumes. Some careful shopping can net you an LTO-5 (1.5TB native capacity) library unit (one that can be upgraded to newer generations) for under $500, with new tapes $10-15 each.
The nice thing about LTO as a format is its relative predictability and ease of acquiring the parts, even the very obsolete ones. It's all SCSI or SAS, most of the interesting stuff happens at the hardware level, with a bog-standard API. Your average backup app, whether it be Backup Exec or mtx/tar/etc. on Linux doesn't need to care about the media format. Unlike actual "enterprise" shops with datacenters and support contracts and such cruft, where the primary concern is "does it work", it is fine to buy older units second-hand. They are plentiful and cheap.
> You don't even need the absolute latest drives to get reasonable storage volumes. Some careful shopping can net you an LTO-5 (1.5TB native capacity) library unit (one that can be upgraded to newer generations) for under $500, with new tapes $10-15 each.
That's still... pretty terrible. If we compare the cost of a tape drive against a 14TB easystore @ $190 [1], the break-even point is around 72TB. I don't know about you, but that's a lot of data you have to store for it to be worth it. Even at 150TB you're only looking at around ~25% (~$500) in savings compared to hard drives, which I don't think is much when you factor in how much of a hassle tape drives are to work with.
It's not a fair comparison to put hard drives (integrated mechanics) up against tape (separated mechanics). They do not solve the same problem and have different longevity profiles. If I'm spending $500 on tape storage, it's because I want something that will last a long time, something portable hard drives tend to have issues with.
LTO5 onward supports LTFS, which exposes the tape to the OS as if it were any other removable storage, with the one proviso that deleting files doesn't reclaim space unless the entire tape is wiped.
True, on hard drives the controller board will eventually fail. The solder might decompose or the circuitry might fail. Then you're left with a platter full of randomized bits.
But on tape, the controller mechanism is separated from the storage medium. The controller mechanism is inside the drive readers itself, which will eventually fail.
So for both, it’s a trade off. They’re both going to eventually fail.
One day I found a really cheap LTO-4 drive that cost $150. Interesting, but I found it was not practical despite the price and decided against it. First, an 800 GB LTO-4 tape is no longer high density by today's standards; it couldn't even hold a 1 TB HDD image. Also, I still had to pay for the necessary SAS peripherals to get it working. Finally, the mechanical assembly inside a 10-year-old tape drive is not something that inspires confidence for data backup... Last time I looked, the cheapest decommissioned LTO-5 drive still cost $1000.
Well they don't really make these things for laptops unfortunately but they also don't cost $25k+ like they used to. It's still more than the average consumer is willing to spend but they are affordable enough now to be a viable option for professionals and small businesses.
Ah - I took the question as saying “I did nothing to indicate the sort and could in fact be purchasing for a business”, so I was trying to explain that side!
That’s how I meant it. Seems odd to say “we” out of context like that. And anyway I would be the person purchasing so including the whole company is.. odd.
Anyway, I guess this is the pedantry we (!) should expect from HN
> Swapping tapes continues to be human intensive and restore times long.
Restore times are definitely long, but you can mostly avoid human labor by putting the tapes in a tape library, which will load tapes in the drives using robots. You still need technicians around because tapes / drives / robots will break, but individual restore operations can be completely automated.
Doesn't really sound like it's aimed at you. Tape is for serious long-term, high-volume storage. If you've got a limited budget for the tape device it's probably not designed for you in the first place.
> I understood the exact opposite that you have understood, and I find your comment quite uncomfortably presumptuous, as a result?
Is this a question? I can't tell you if that's how you find my comment or not, sorry.
They want a cheap tape device because they just want to use it in the home. Tape devices aren't cheap... because they aren't aimed at use just in the home.
They aren't aiming at home users - I think that's a simple fact not presuming anything.
The parent clarifies above that they are not a home user.
Even if you believed they were a home user, they clearly demonstrated in their original comment that they did know it was for long-term, high-volume storage, and they did know the cost, and you appeared still to belittle them for not knowing.
It's a question in the sense of - why did you choose to say it in this way, which seemed impolite? What did you intend your comment to add?
We reached an inflection ("amazing!") point when we were able to put "100 songs in your pocket". That was a really shocking thing given the limitations of CDs/tape, etc.
But the real inflection point may come when "all relevant information is on your local disk". The tapes in question here could maybe store every book ever written!
We may be able to put the entirety of Wikipedia, every film and TV show ever made, every book, every lecture, on a little disk.
The only thing we'd need to access in realtime would be contemporary data like traffic flows, weather situation etc..
The ability to store 'that much data quickly and easily' locally, may quite fundamentally change the equilibrium we have right now with the cloud and lead to a more natural decentralization.
"All of YouTube from 2008 until the present, 99 cents at the local gas station, on usb-like contraption"
This will never, ever happen. English Wikipedia (without pictures), compressed, is a measly 20 GB. It is hard to quantify "all books ever written", but I have kept copies of some online libraries large enough that they surely contain pretty much every book you can remember, plus 10,000 you've never heard of for every single one you can remember. It's not that much; you can fit it all on 1 or 2 regular HDDs.
Now, I did it because I'm that type of guy. There's not that many people who actually do this bullshit, even though it's perfectly doable.
So why don't they? Because it doesn't make much sense unless you're afraid of an upcoming nuclear winter. Wikipedia is updated and improved every day. You only sometimes want to refer to something old, but you nearly always want to check out something new. Petabytes of video are uploaded to YouTube every year; terabytes per day probably wouldn't be an overestimate for audio on Spotify. All data is being updated constantly.
Also, the above assumes pretty aggressive data compression. Is aggressively compressed data what we want? No. A 2h video compressed to about 500 MB was totally fine 15 years ago; if I download a 2h movie today, it's normally around 20 GB, and it's by no means uncompressed.
Seriously, by now you should know for a fact that anyone who believes there's such a thing as "too much storage space" is stuck in the 80s.
And even if there were such a thing: realistically, a cluster of nodes in Google's datacenters can find the book or video you are looking for far faster than the most perfect HDD you could theoretically have locally. So, again, normal people wouldn't want to keep all this stuff locally even if they could.
So I was just talking smack, but I think it might be possible.
Remember when we could do 64Kb/s reliably over the web? Then you could do 'voice'. And after that threshold was crossed, you could basically do unlimited voice very quickly.
So Wikipedia - text only - is ballpark 50G - which is to say it would fit on a single mobile phone.
That is bigger than the first Google index!
I don't know how old you are, but in 1999 the notion that you could walk around with this massive database, literally the size of all of Google, right in your pocket would have blown people's minds. It was basically unthinkable.
The rate of growth of storage has slowed down a little in the last decade, but there are still jumps to be had, and it's not inconceivable that we get 100-500TB of storage in regular devices in the next 10-15 years, meaning 10-100x that in a slightly larger storage device.
While video data is expanding (4K is much bigger than original HD) it can't go on forever, meaning, just like voice and text, once we cross a certain threshold, then it becomes irrelevantly small relative to storage as well.
So I think there's some value in my point:
In 10-20 years, as video storage becomes 'trivial' just as text is today (aka all of Wikipedia's text on your phone), huge amounts of data become available, instantly.
Though some data sources change a lot, others do not.
It's not inconceivable that we put the 'entire Western canon' in everyone's homes.
The other thing not so evident in my comment is that there's only so much use for all of this data.
We are getting massively smaller marginal returns on all this 'big data' we store; frankly, I question in many cases whether it's worth it at all. I think a lot of companies have been duped into saving every mouse click or whatever concerning every customer. The world is just not that complicated.
What this means is that as computers miniaturize, and storage as well, we may see regular data centers shift away from the cloud back to 'on premise'.
If you can fit 'infinite computing power' in a little closet, and the parts are easily replaced ... then I can see companies doing that.
The promise of cloud computing today largely rests on the economies of scale of physicality: parts take up space, cables, power, heating/cooling; you want a lot of flex/headroom in configurations.
I don't see why, 20 years from now, you couldn't buy 'off the shelf' a 'box' that has the equivalent of 100 EC2 instances, 500 TB of storage, and multiple multi-Gb/s network cards.
You could run an entire corporate office of 1000 people from just a single box.
The 'physicality' of it all would be mundane and irrelevant. Obviously, it would be 'super complex' and still need 200 IT people to admin all of the software, but physically it could be small and cheap.
'Big maybe' of course, but there are some possibilities in there I think.
The entire libgen archive is roughly 50 terabytes (most of the books ever written). It will be a very long time before we reach that level of storage in a phone.
What's more, although it would be ugly and awkward, one can imagine 50 microSD cards mounted on a single USB-like device that sits in your pocket and attaches to a phone. The reason it doesn't exist is primarily that nobody needs it, rather than that "it can't be done".
I share your childlike optimism. But another thing (that neither of the sibling comments mentions) is write speed. Having a ton of capacity to store massive amounts of data at rest is not the same thing as being able to make near-instantaneous (or even slow) copies.

Getting all that data on there is going to be a problem, so it would only be economical for data sources in high demand, not unlike the way optical media get their pits and lands stamped in at the factory from an expensive master. "All of YouTube" might be (probably is) in high enough demand, but it would also require cooperation from the gatekeeper of that content, who at this time has adverse incentives, because it's making (many times) more money selling ad impressions for basically every view that it's able to.

Even aside from that, what other large datasets are in the same demand category, such that the work could or would be subsidized in that way?
So far, storage is still following something resembling Moore's law, but it will probably hit physical limits way before a year of YouTube (probably 30 of these new tapes or so) fits into your hand.
That's good. But when will it ever move out of the enterprise market?
What I really want is affordable tape storage for us common folks / home users to archive data long-term. It's a real pain to keep transferring your old backups to a new CD / DVD / portable HDD every 2-5 years. There is a market of home users who really don't want to put their data in the "cloud". And this generation is creating so much data, some of which, I am sure, they would like to store long-term.
So, having worked in the tape library space: tapes don't really make sense for home users.
They are said to have a 50-year lifespan, but everyone in the industry knows that's a blatant lie these days. Tape customers transfer everything over to the new generation every 2 to 3 years, so the claim is never tested, and none of the customers really care.
You also only get a couple hundred writes to a tape in its lifetime. That includes bulk appends, as the stress from seeking is a major factor in what kills them. They really need to be treated as WORO media: written in bulk, the whole 12 TB or whatever at once. Not a great fit for home use.
$100 each starts making way less sense when they have the same semantics as burned optical discs for home users.
These days the main benefit of tape is the physical safety of having the media inside a cartridge, and the streaming read speeds are stupid fast, so you can reconstruct a backup faster.
Home use, even with a data-hoarder-sized NAS, is better served by a Blu-ray burner and multiple copies.
> $100 each starts making way less sense when they have the same semantics as burned optical discs for home users.
These tape-drives have 300MB/s+ of read/write speed. Sequential yes, but almost all backup tasks are sequential.
Optical is 10 MB/s or so. You get more out of a 100 Mbps connection to (insert cloud storage provider here), let alone Gbps.
At a minimum, a modern, reasonable mechanism for backups needs to be faster than the cloud (100 Mbps is ~12.5 MB/s; gigabit is ~125 MB/s), otherwise it's basically worthless to the consumer. Hard drives and tape get there, but there was no real way to improve optical's read/write performance (outside of overengineered "jukebox" robots available to Facebook and a few other select groups), so it went fully obsolete.
That's 6x speed, or roughly 25 MB/s. You can store 25 GB per disc, it's going to take roughly 10x longer than a $150 5 TB hard drive, AND you're going to have to sit there removing and adding a new disc every few minutes.
And for what? The hard drive is cheaper after all of that.
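To make that arithmetic concrete, a rough back-of-envelope using the figures above (this thread's numbers, not benchmarks):

```python
# Copying 5 TB at 6x Blu-ray speed (~25 MB/s) vs. a modern HDD (~200 MB/s).
def hours(total_gb, rate_mb_s):
    return total_gb * 1000 / rate_mb_s / 3600

print(f"Blu-ray @ 25 MB/s:  {hours(5000, 25):.0f} hours, {5000 // 25} discs to swap")
print(f"HDD     @ 200 MB/s: {hours(5000, 200):.1f} hours, 1 drive")
```

That's roughly 56 hours and 200 disc swaps versus about 7 hours unattended.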
> Hard drives have a bad habit of not turning on after being off for a year or two is why.
So push the "ZFS Scrub" button every 6 months.
Don't store hard drives. Store a NAS. The entire freaking computer is stored as a unit. Every few months, turn it on, push "ZFS Scrub" to double-check the data, and you're set.
BluRays also degrade over time unless stored in proper UV-sealed / temperature-controlled conditions. Everything requires a degree of maintenance and checking. The question is how best to automate that checking process.
Hard drives read and write at 200+ MB/s, making these maintenance checks much faster. They're also bigger, which means no need to manually insert and remove discs. This entire process could be automated with a "Wake on LAN" packet and a few clicks from any terminal (phone, computer, whatever).
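The wake-up step really is automatable; here's a minimal sketch of a standard Wake-on-LAN magic packet in Python (the MAC address is a placeholder, and your NAS/BIOS must have WoL enabled):

```python
import socket

def wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9):
    """Send a WoL magic packet: 6 bytes of 0xFF, then the MAC repeated 16x."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

wake_on_lan("00:11:22:33:44:55")  # placeholder MAC of the backup NAS
```

After that, kicking off the scrub is one SSH command away.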
-------
I dunno. I'm looking up these BD-R XL drives you're talking about: it's like $30 per 100 GB disc or something. That's an obscene price for so little storage. I guess if all your data fits on one of those discs it's fine, but... I've got some archived movies and stuff on my NAS (50 GB per Blu-ray), and I needed to store some of my video editing files (so I need the original data that matches my video editing projects).
The data of real importance to me probably fits on one Blu-ray. But then I wouldn't have all the other stuff I've saved up "because I can" on my 2x 5 TB NAS (mirrored, so only 5 TB of storage).
Ehh? Not really. All of these things we're talking about are components of a backup strategy.
If I wanted an offsite backup, it's just rsync to some cloud provider (like rsync.net) or something. I don't do that because I don't think my data is worth the recurring cost, but it's an option.
-------
My point is that my solution runs at 200 MB/s and costs on the order of $500 to $1000 for the component ($500 if you build your own NAS, $1000 if you buy premade parts; assume 2x 5 TB hard drives is just $300 for 5 TB of _mirrored_ storage)...
While your proposed component costs $30 per 0.1 TB and reads/writes at a lousy 70 MB/s... to get an equivalent mirrored setup you need to buy 100 Blu-rays, or roughly $3000 in BD-XLs alone (2 copies of your data across 2 different discs, for the same redundancy factor as the 2x hard drive solution).
So I have to raise my eyebrows a little bit. How are you checking that the data doesn't degrade? Are you manually checking all of those Blu-rays you've created for reliability? That's a lot of sitting around inserting and removing discs.
It's literally cheaper to build a 2nd NAS, stick more hard drives into it, and keep that 2nd NAS offsite somewhere.
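In numbers, using the prices quoted above (this thread's assumptions, not current market prices):

```python
# Mirrored-storage cost: $30 per 100 GB BD-XL disc vs. ~$150 per 5 TB HDD.
def mirrored_cost(total_gb, unit_gb, unit_price, copies=2):
    units = -(-total_gb // unit_gb)  # ceiling division: media needed per copy
    return units * copies * unit_price

print("BD-XL mirrored 5 TB:", mirrored_cost(5000, 100, 30))    # $3000
print("HDD mirrored 5 TB:  ", mirrored_cost(5000, 5000, 150))  # $300
```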
-------
If you're going to hold my feet to the fire over this "2-media" thing, then my 2nd medium of choice would be flash storage before optical. Because SPEED is king. Speed means you can checksum your data and ensure that your backups are still good. I think flash is a bit expensive compared to HDD, but based on these BD-XL discs, I'm thinking flash is actually cheaper than Blu-ray and something like 10,000% faster. (Tape would be a more ideal 2nd medium... but I'm not "big enough" to make tape cost-effective.)
Yes, checksums and scrubbing. If you want to protect against bitrot (in ANY medium), you MUST double-check your backups over time.
TEST your backups. Any backup strategy that doesn't have a regular testing schedule is null and void, in my opinion.
I tend to think that HDD, then flash, and MAYBE tape (if you're going really, really big) are the media of choice for the modern computer user. I'm not really seeing where optical fits in today's world. Maybe a future disc format (with maybe TB-level discs) could make optical a thing again... but 100 GB Blu-rays aren't really compelling for the modern user.
70MB/s is slow. And 100GB is too small per storage unit.
If you're doing archival storage, does the write speed really matter? If you have enough incoming data that write speed matters, you've already eliminated most archival storage methods anyway.
True. But slow transfer speed is still a negative when comparing it to other archival methods (in this case, tape drives). And that is why there is consumer demand for tape drives.
I'm having a very difficult time understanding why it's a negative at all for most use cases. It's not like running a backup monopolizes the machine you're running it on.
Again, tapes are for archival storage. Not nearline, not live, archival. As in, you write the data and likely don't come back to it for months to years.
I built such a cloud service; you can archive the data uploaded to our cloud storage to tape on demand, and we'll send you the tape back when requested (you basically buy the tapes once, and they're yours, no strings attached). It failed to gain any traction, though.
First off, mad props for building out the service. That was an ambitious idea and I'm sorry it didn't take off. Second, if you ever write a blog post or article about it someday, please mention it on Hacker News.
Sounds like a good idea. But I am guessing it failed because it wouldn't be attractive to common folks: while the tapes might be affordable, tape drives are really expensive and uncommon.
In fact the service is tentatively targeted at content producers. Most post houses do have LTO drives (archiving to LTO is a requirement for working with Netflix, among others). The idea is that any content owner can bring their tapes to the post house when needed (where they would actually use them) instead of USB disk drives.
Having given it some thought: your potential business success depends on convincing clients not to buy a tape drive, and on addressing the concerns below. Smart pricing can address the former, but the latter needs more brainstorming.
Some of the issues you need to address:
- Trust. How can someone trust you with potentially proprietary / copyrighted data?
- Speed. Is it faster to copy the data to a tape locally than to transfer it to you?
- Reliability. How can the client be sure that you have made a successful backup?
- Advertising. Did you reach your target base to make them aware of your product, and address all their concerns so they would consider it?
Yes, that's the point. When you're a content creator making at most a couple of TB of data per month (typical figures), hardly enough to fill a handful of tapes a year, it's hard to justify the expense of a tape drive. OTOH cloud storage is either too complex (S3, Glacier) or too expensive (Dropbox, Box) for this volume of data.
Trust: yep, hard to say for this one. At least I can provide actual guarantees (for instance, none of our storage is out of the country).
Speed: it's possible to send us tapes directly. However, the main selling point of our solution is to let people upload relatively small volumes of data continuously, then archive it all to tape in large batches once there's enough to fill an LTO.
Reliability: the interface allows the user to see where their files are (on which tape), run checksum checks on tapes, and restore from tape. Every tape is sold with 3 free operations a year (archive, checksum verification, restoration).
Advertising: that's clearly not our strong point :)
Totally! I have a friend who does sound engineering for Hollywood, and he has two fire safes full of HDDs, SSDs, and tape holding hundreds and hundreds of terabytes of data. I couldn't even finish the sentence "Have you tried the cl..." before he cut me off: way too much data. Didn't have the heart to tell him that two of those three media types need to be powered on periodically...
HDDs can't need powering on all that often. I just found my hard drive from college, and plugging it into my computer just worked. It was great finding all my old documents and music that I'd forgotten about.
It doesn't. What matters is that you have a filesystem that can detect and repair bitrot. For that to work, it needs to check everything occasionally, which means they need to be powered up occasionally.
If you don't do that, eventually you'll reach a point where it can't repair anything, and then you gain nothing over a filesystem that doesn't do this. That's the point.
A 12 TB HDD requires 10 GB/day for three years to be filled (12,000 GB ÷ 10 GB/day = 1,200 days ≈ 3.3 years). This is not the home market; it's the professional market, or hoarding (by today's standards).
Objections about the failure rate of HDDs are absolutely valid, but then one should also consider the bigger picture (e.g. losing the whole site's storage), in which case having a remote copy is also important.
> A 12 TB HDD requires 10 GB/day for three years to be filled. [...] hoarding (by today's standards).
You have an interesting argument: by the same reasoning, by the time one has eventually filled up a 12 TB HDD, it would no longer be hoarding by tomorrow's standards. In other words, at that point one should be able to get the next generation of HDDs for cheap, and it's this fact that makes tape drives unnecessary.
Now I wonder whether it was a mistake to buy a spare 14 TB HDD to 1:1 mirror my new file server for cold backup. Perhaps a smaller one would have been fine for 5 years anyway...
> A 12 TB HDD requires 10 GB/day for three years to be filled. This is not the home market (...)
You're basing your argument on extreme and unrepresentative assumptions.
First of all, you do not need to fill a storage device to the brim to justify its use. You could use the exact same rationale to claim no one needs 500 GB HDDs, even though they are pretty much standard these days.
Additionally, you falsely assume that data storage needs start the very moment someone buys a drive, and that up till then they have no data lying around. That's not the case at all. People buy high-density storage devices because they already have the data lying around and don't want to lose it. You're ignoring that people already have piles of CD-Rs/DVDs/Blu-ray discs lying around.
Additionally, you somehow assume that people buy storage devices for their density, ignoring their primary use case: long-term data preservation.
> A 12 TB HDD requires 10 GB/day for three years to be filled
I cheerfully quantify just about everything in my life, yet somehow I missed that one. I am a little bit of a data hoarder, or maybe just a little paranoid. Seeing those numbers is actually quite helpful. Thank you.
PS I get downvoted a lot for comments like this, so in case it sounds sarcastic or facetious, it is not. I mean it sincerely.
Indeed. The home directory on my PC at home holds 80 GB of files. Most of it is just there out of laziness. About the only things I try to keep backups of are my tax returns.
What do people have on a home machine that needs enterprise-grade backups?
You might say photos and videos. I guess that's a personal thing; I realized a long time ago that I never spent any time actually looking at any of the pictures I took, so I stopped taking them.
> I realized a long time ago that I never spent any time actually looking at any of the pictures I took, so I stopped taking them.
That is a completely logical take. However, I sort of think of it as a posterity move. Although my kids might not particularly want to see pictures of themselves, their great-grandchildren, or beyond, might love it. I would pay a pretty penny to see home movies of my grandparents, whom I never knew.
100% agree. I have a NAS for photos and media, plus a big USB HDD for ZFS snapshots of my machines - but all of this is homelab tinkering, not an actual backup strategy.
I realized in the last year or so that the only digital media I have which I would be genuinely sad to lose were wedding photos - so I saved those to 3 different cloud providers, made a Blu-ray copy, and a copy on the NAS. If I lose all of that in some tragedy, chances are I've got bigger things to worry about.
Photos, videos, music, but also current laptop backups, server/VM backups, current and past phone backups, disk images of past computers' hard drives (to be done), and digitized family videos and photos (also to be done).
It adds up quite fast. And being able to put everything on a tape every year with a label on it would give me some peace of mind, even more if I kept 2 of them in different locations.
Well, nowadays digital documents such as videos and photos are much more important, widespread, and mundane than they were a couple of decades ago.
During the 80s there was no consumer digital camera market. Nowadays anyone can easily generate hundreds of megabytes of photos and videos per day. Each hour of 4K video can be close to 7 GB, and we're already seeing cellphones that can record 4K video at 60fps and 1080p video at 240fps.
Just barely. It was possible to buy a PC with 2 or 4 MB of RAM in the late 1980s, but you couldn't do much with it. On Windows/DOS, applications had to be written specifically with "extended memory" or "expanded memory" in mind; there were 2 incompatible "standards" for how to organize memory beyond 1 MB.
On the Mac side, in 1989 you could apparently buy a Mac IIci for $6200 with 1MB or 4MB, "expandable to 128MB".
That would be wonderful. Unfortunately, the storage density of tape also depends on the magnetic particles in the tape: smaller particles that can be magnetized enable smaller heads, which in turn enable higher storage density.
I know your comment was a throwaway idea, but I got a bit too involved looking up information for this answer, so I've committed further and dug a little deeper to avoid looking too stupid:
Currently, with BaFe (barium ferrite) as the magnetic particle, LTO-7+ tapes have particle sizes of less than 100 nm (https://indico.cern.ch/event/637013/contributions/2669089/ - see slides 14 and 15 in the PowerPoint).
VHS, on the other hand, is a little trickier to find information on; I suspect that, given the age of the technologies involved, the particles are going to be significantly bigger. The closest I've found to a technical document is a student paper at NYU which refers to this IEEE paper (https://ieeexplore.ieee.org/document/50474), giving an average particle size of ~300 nm for VHS and ~150 nm for S-VHS.
But that first presentation also mentions that smoothness increases over time with LTO, which suggests improvements in the coating process.
I suspect improvements in the coating, combined with the shrinking size of the magnetic particles, enabled an increase in the density of particles on the tape. This, in conjunction with thinner tape allowing large increases in tape length for the same cartridge size (nearly double the length from LTO-1 to LTO-8), has led to the enormous jumps in capacity we've seen in LTO.
Meaning, unfortunately, I don't think it'll be possible to do super interesting things with VHS.
I store on an external HDD, and every year I buy a new one, copy everything across, and verify checksums against my records. It costs basically nothing and is low-effort. What am I missing?
The OP is mixing media with different characteristics. A DVD has a fixed size (~4.7 GB single-layer), while hard disks are relatively open-ended. One can buy 10+ TB magnetic disks for a cheap price (less than $200).
If the OP really needs dozens of TBs of capacity every few years, they definitely don't fit in the home user market they are talking about.
Not trying to be rude or sarcastic here: what is that command? Certainly in my experience on both macOS and Windows, the default operating system GUI mass copy invariably poops out halfway through, with no explanation as to why. Maybe something like this?
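(Not that there is one blessed command; as a sketch of what "copy, then verify checksums" can look like, assuming SHA-256 and Python 3.8+ rather than any particular vendor tool:)

```python
# Hash every file under two roots and report any mismatch or missing file.
import hashlib, os, sys

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def manifest(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    out = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            out[os.path.relpath(full, root)] = sha256(full)
    return out

src, dst = manifest(sys.argv[1]), manifest(sys.argv[2])
for rel in sorted(set(src) | set(dst)):
    if src.get(rel) != dst.get(rel):
        print("MISMATCH or missing:", rel)
```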
You have to save your important data on multiple media to keep it safe, in case one medium fails. Obviously, cost being a factor, optical discs are one of the media to consider despite all their other downsides (slow transfer speed).
Actually, nearly every enterprise backup system my company has deployed in the last 5 years is tapeless. Most involve snapshot backup management replicated across multiple disk/SSD systems. (I'm the network guy, so not my focus area.)
Tape is moving from plain 'enterprise' to cloud-provider enterprise. The recurring cost of having a tape storage/management system installed is a hard pill to swallow. Most companies still use tape, but they just access it via AWS Glacier.
Yeah, what does a new HDD cost? Like £100 max for a big, high-quality one? Compared to buying a £5k tape drive and expensive tapes... yeah, that's basically zero.
$100 is definitely not "basically zero". Not only that, but a $5k tape drive and tapes are designed to last significantly longer, essentially bringing the cost of long-term storage closer to $0/year than your option of spending $100/year.
> a $5k tape drive and tapes are designed to last a significantly longer period of time
You're going to use the drive for 50 years? Half a century? And you think it'll still be working, and supporting a format with a useful capacity, half a century from now?
That's assuming a consumer tape drive would cost that much. They'll succeed only if they cost less than $500. Enterprise stuff is always costly, just like consumer-grade HDDs vs enterprise HDDs.
It's 50 years before $100/year catches up with a one-time cost of $5k. On top of that, the capacity of the medium goes up steadily over time, while you will need to drop another $5k every 10 years or so, or span your backups across multiple tapes.
The checksum isn't the relevant bit. That doesn't change whether you use a hard drive, a tape, or a DVD. Forget about it as part of the discussion if you want.
ZFS does that, except with error-correction codes applied against bitrot. All you gotta do is turn on a ZFS system and run the scrub command ("zpool scrub" on the pool). That automatically verifies every checksum and error-corrects any data that fails one.
If anything, ZFS makes the whole system you just described easier and more automatic. It's not really hard for me to push the "scrub" button on my NAS4Free box every few months.
The post you were replying to specifically didn't want to use "the cloud." But for single-digit TB, which covers consumers who neither have a huge number of video files nor are serious data hoarders, the answer is to use local disks. And, if they are OK with cloud storage, which they probably should be, something like Backblaze is probably a better choice.
The solution is generally not to do full duplicative backups, but to check hashes between already-backed-up files and new files, only backing up what you need; see the sketch below.
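As a sketch of that idea (illustrative paths and manifest handling, not any particular backup tool):

```python
# Hash-based incremental backup: copy only files whose content hash
# isn't already recorded as backed up.
import hashlib, os, shutil

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def backup_new(src_root, dst_root, known_hashes):
    for dirpath, _, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            digest = sha256(src)
            if digest in known_hashes:
                continue                      # identical content already stored
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)            # copy file with metadata
            known_hashes.add(digest)
```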
I don't recommend backing up to HDDs. They are prone to early failure from portable use because of vibration and drops.
What we need are more affordable SATA SSDs. Currently NVMe SSDs are very close to the same price.
If you could sell a 1 TB portable USB 3 / USB-C backup device that comes with good software, the vast majority of people would be set.
> I don't recommend backing up to HDDs. They are prone to early failure from portable use because of vibration and drops.
Your backup device should stay at home, on the desk or in the closet where you hide your networking equipment. There's very little benefit to trying to carry around your backup device with your laptop on a regular basis, and if you do, you still need another backup device that isn't going to get stolen at the same time as your laptop.
SSDs and flash storage have their own issues. For long-term unpowered storage there can be data retention problems, which is not so much the case with magnetic media. And with bad drivers there is a risk of ruining parts of the disk by writing too many times...
This. Even a 100 MB/s SSD would do. We need a type of NAND and SSD that offers high capacity at low speed and low cost. A current 2 TB SSD is ~$170, compared to a 2 TB portable HDD at ~$60.
And I am wondering: are silent file corruption, bitrot, etc. things of the past on SSDs? Does Btrfs/ZFS even make sense on an SSD?
Current LTO-9 tapes can store 18 TB, and the LTO roadmap doubles capacity about every 3 years. So this tape tech would be on the same scale as we might expect LTO-14 to offer around 2035 (18 TB doubled five times is 576 TB, and five more generations at ~3 years each lands in the mid-2030s).
So unless this is an incredibly radical breakthrough, that’s the timeframe I’d expect for the headline to become a real product.
Note that the article shows a table from IBM claiming they achieved 35TB-per-cartridge capacity in 2010, and that still isn’t something you can buy.
IMHO, partly marketing bluster, but partly because hardware compression in the tape drive is a useful feature if you don't want to handle 1000 MB/s of compression workload on the host during backups.
> To put 580 terabytes in perspective, it's roughly the equivalent of 120,000 DVDs or 786,977 CDs — IBM notes that stacking that many CDs would result in a tower 3,097 feet (944 m) tall, or taller than the Burj Khalifa, the world's tallest building.
Last I looked into this, if you had a lot of data to back up (say 1 GB per second, continuous, with a retention time of 1 year), it was still far cheaper to simply use hard drives. One employee can keep up with all drive replacements, hardware setup, etc. with time to spare. Drives aren't super power-hungry, so any old office building is suitable. Encrypt the drive contents at another site so you don't need 24/7 security. Total system cost was sub-$1M, with a running cost of $500k/year and storage of 50 PB. Bargain.
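The sizing is easy to sanity-check (1 GB/s and a 1-year retention window are the assumptions from this comment):

```python
# 1 GB/s of incoming data, kept for a 1-year retention window.
seconds_per_year = 365 * 24 * 3600           # ~31.5 million seconds
raw_pb = 1e9 * seconds_per_year / 1e15       # bytes per year -> petabytes
print(f"raw data per year: {raw_pb:.1f} PB") # ~31.5 PB
# ~1.5x for redundancy/parity and headroom lands near the 50 PB figure.
print(f"with overhead:     {raw_pb * 1.5:.0f} PB")
```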
Now that GDPR applies, most companies need to rewrite backups every 30 days anyway to remove data covered by GDPR deletion requests. That tips the scale further in the direction of always-spinning hard drives. Just hook up 64 drives to each machine, make sure you only do streaming writes of 1 GB+ files, use some ZFS RAID-like scheme, and away you go.
You don't have to do it this way; you can just use encryption at rest with a different key for each user and throw away the key when a user asks for deletion of their data. No need to haul all the backups back and scrub them one by one.
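For illustration, a minimal crypto-shredding sketch (the key and backup stores here are stand-in dicts, and Fernet is just one symmetric scheme; a real system would use a key-management service):

```python
# Per-user encryption at rest: "deleting" a user = destroying their key.
# Requires the third-party `cryptography` package.
from cryptography.fernet import Fernet

keys = {}     # user -> key (in practice: an HSM or key-management service)
backups = {}  # user -> ciphertext (in practice: tape/object storage)

def store(user, data: bytes):
    keys.setdefault(user, Fernet.generate_key())
    backups[user] = Fernet(keys[user]).encrypt(data)

def gdpr_delete(user):
    del keys[user]  # ciphertext remains in every backup, but is unreadable

store("alice", b"order history ...")
gdpr_delete("alice")
# backups["alice"] still exists, yet without the key it cannot be decrypted.
```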
Our lawyers didn't think that was sufficient to cover ourselves. Upon finding out that it wasn't going to cost many millions to simply scrub the actual data rather than the keys, they came back saying it was money well spent to delete the actual data.
It was mostly because a user might share data with another user, for example two users at the same postal address. Our fraud team needs to be able to look things like that up, so there need to be database keys on such data, both in the backups and in production. If one of the users at a given address has a GDPR deletion, we need to delete that user's data, but if another user has the same address, we still need to keep the address itself. Yet if GDPR deletions apply to both users, then as well as deleting the address, we need to delete the fact that both deleted users had the same address (even if we don't know the address, the pattern of which deleted users shared info with which other deleted users could identify them).
See... it's complex! The simplest solution is to properly delete the data and rewrite the backups!
IANAL, but you do not need to rewrite backups to brute-force compliance. You do need to inform your customers what your retention policy is, though. I've seen large enterprises communicate backup lifetimes of up to 6 months after a deletion request.
How long would the tape from an audio (compact) cassette need to be to hold 580 TB?
How thick would that tape need to be to allow the motorized spool to pull the tape from the other (passive) spool?
Imagine you were to make an assembly that could hold two spools of this tape; how large would it be? It might be hard to fit it through the door of your server room. And it might not even fit in the van that carries away your off-site backups :)
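For fun, a Fermi estimate, assuming a home-computer-style cassette interface (~1 kbit/s effective at the standard 4.76 cm/s tape speed; real-world rates ranged from ~300 baud to a few kbit/s, so this is an assumption, not a spec):

```python
# How much cassette tape would 580 TB take at ~1 kbit/s over 4.76 cm/s?
bits = 580e12 * 8                 # 580 TB in bits
bits_per_cm = 1000 / 4.76         # ~210 bits per cm of tape
length_km = bits / bits_per_cm / 1e5
print(f"{length_km:.1e} km")      # ~2.2e8 km, beyond the Earth-Sun distance
```

So no, it's not fitting through the server room door.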
Yeah! Somehow my brain started to channel Shakespeare and say "Why, so can I, and so can any man, be the tape but long enough!"
(Then I had to look it up. Henry IV Part I, Act III, Glendower: I can call spirits from the vasty deep; Hotspur: Why so can I, and so can any man, but will they come when you do call them?)
That's a great effort in technology, and it is remarkable. I have some doubts about the real benefits beyond the technology itself, though.
First of all, the security of the system: yes, it is very avant-garde, but being a physical medium, I will always have the worry that it can be damaged.
If I imagine using it to store sensitive information, this worry only grows. It could be accessed, stolen, or damaged.
And what about the cost of a solution like this?
Apart from these aspects, what Fujifilm has been able to achieve in terms of tech is incredible.
580 terabytes is really impressive. Although, I don't know why, I am still waiting for 5D crystal tech (maybe for its durability) and hoping it will be available on the market some time soon. For now, Microsoft has taken over the project and its status is quite uncertain...
A question for those familiar with the latest tape backup performance: does it make sense for me to take tape backups of my data (e.g. using a second-hand tape drive)? Or is it a bad idea for home use?
I wonder about using these for science data archival, although these days I suppose that most projects can just provision archival from a commercial vendor.