A Storage Crisis (blogs.harvard.edu)
127 points by vikrum on July 14, 2021 | 195 comments


I tried to back up a 60% full Ubuntu Linux RAID to a 4TB USB3 hard drive. I set the backup going and left it a few hours.

I was shocked to come back and find the RAID 100% full and only a few GB copied. It took me a few minutes to figure out what happened.

Maddeningly, the USB drive goes to sleep after about 15 minutes, even while Ubuntu is actively writing files to it! Maybe the drivers on Windows do something to keep it awake, but on Linux it just goes to sleep in the middle of being used.

Now here's the thing: the drive DOES realize it has more to do and wakes back up. But this sleep-wake cycle causes a USB disconnect and reconnect, which causes Ubuntu to unmount and remount the drive.

Now here's the problem: perhaps because the backup program is still keeping the original mount point "busy", Ubuntu doesn't re-mount the media at the same path. Instead it gets mounted at "/media/path-1".

Since Linux uses regular folders as mount points, the old mount point at "/media/path" becomes a valid folder on the local disk. So the backup program keeps going, but now it's filling up the local disk.

I haven't found a solution for this problem that will allow me to complete a backup (or even complete a long-duration manual copy).


My personal opinion is that if the USB enclosure is that flaky then it's not worth trusting it to operate when it's time to do a restore.

You could shuck the drive and connect it directly or find a reliable USB enclosure.

Putting the disk in /etc/fstab by UUID should at least keep it mounted at the same directory, but I'd be surprised if the backup software handles the resulting errors well. The following link has an example of remounting the drive with udev when it goes away. If the backup software isn't running as root, you can mount with uid,gid set to the backup user (or chown an ext filesystem), but make the mount point directory 0700 and owned by root to prevent the backup from writing to it while unmounted.

https://blog.backslasher.net/automatically-mounting-usb-driv...
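A minimal sketch of the fstab-by-UUID approach described above, assuming an ext4 filesystem and a hypothetical backup user called "backup" (the UUID and paths are placeholders):

  # /etc/fstab -- UUID and mount point are placeholders
  UUID=1234abcd-0000-0000-0000-000000000000  /media/backup  ext4  defaults,nofail  0  0

  # while nothing is mounted, keep the mount point unwritable by the backup user
  chown root:root /media/backup
  chmod 0700 /media/backup

  # once mounted, hand the filesystem root to the backup user
  mount /media/backup
  chown backup:backup /media/backup

With that in place a remount lands on the same path, and if the drive drops out the backup fails with a permission error instead of silently filling the local disk.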


+1 for shucking the drive. I got a few WD externals (14 TB). They exhibited the exact behaviour described by OP. Essentially the drive slept every 15 minutes or so.

Smartctl on a timer didn’t work. Touch on a timer didn’t work. I can’t recall if hdparm or some other tuning of sleep and head parking settings worked.

Taking the drive out of the enclosure worked. It got so much quieter too.


This is so weird. I have a few of the 10TB and 12TB WD EasyStores. I've been using them regularly for years and have never experienced anything like this.

Just to be clear, the drive does go to sleep when it's actually idle. Not while it's being used, though. IOW, they work as expected.

What distro/kernel version are you on? I mostly use Debian stable, which means a pretty old kernel. I wonder if it could be a regression in more recent ones?

I also use the drives with FreeBSD and have never had an issue there, either.


I'm also curious whether it's aggressive USB power saving via autosuspend. From my limited exploration, autosuspend is usually off and requires e.g. powertop to enable (on Ubuntu LTS) or manual fiddling, but it may be worth checking.
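A rough way to inspect the current autosuspend state via sysfs (the device path in the last line is only an example):

  # global default autosuspend delay in seconds (-1 means disabled)
  cat /sys/module/usbcore/parameters/autosuspend

  # per-device setting: "auto" allows suspending, "on" keeps the device awake
  grep . /sys/bus/usb/devices/*/power/control

  # keep a specific device awake (requires root)
  echo on > /sys/bus/usb/devices/2-1/power/control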


This was Arch without any special kernel, so a fairly recent one. It was a few months ago. Don’t recall the specific version.


> but I'd be surprised if the backup software handles the resulting errors well.

YMMV but I've found rsync to handle this kind of fuckery well.


Or rclone if you need even faster parallel synchronization


I've used both of these tools with great success in the past. Infinitely scriptable in your favorite scripting language, being just simple command-line tools, too. Great for automation of backups.

Re: the issue of external drives spinning down and causing a remount at a different location -- I've never had that exact experience, but I have had drives spin down more often than I'd like, so my answer was to disable the spindown entirely. (There are articles about various ways to do that, easily findable on your favorite search engine, if you're curious.) Once it's disabled, if I find that I still want some form of spin-down, I install one of the handy available daemons that provides a much more easily configurable version of the feature, and then set a longer delay than the drive's default. (This can also be done by configuring the drive itself to a longer delay, but it's more of a pain than just editing a simple text configuration file for a daemon.)


I'm certain that this is a problem with the drive firmware or USB controller and not Linux per se.

I use several USB storage devices and enclosures on Linux every day, including separate USB SSD caches.

Can you check with your storage device's management software on Windows whether it has a sleep option or spin-down timer set? I remember seeing one in the Seagate software on macOS, although macOS also has an OS-level option to put HDDs to sleep when not in use.

If that doesn't work, you can try using hdparm or sdparm to modify the power management and spin-down timer. If everything else fails, disable USB autosuspend in the kernel boot options.
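For example, something along these lines -- assuming the drive shows up as /dev/sdX and the USB bridge passes the commands through:

  # disable the drive's standby (spin-down) timer
  hdparm -S 0 /dev/sdX

  # disable Advanced Power Management entirely (255 = off), if supported
  hdparm -B 255 /dev/sdX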


Sleep is one problem.

The other problem is how the backup software reacts to the possible failure modes.

A backup tool should identify the backup destination in a reliable way, and mount points are clearly not a reliable way to identify a backup destination.


True, but the parent didn't say how the backup is happening. It could even have been rsync.


a shell script running rsync is "backup automation software"


Alternatively, I have a Debian file server with 24TB in internal drives and 3 x 8TB external USB drives (WD EasyStores). The 3 USB drives are connected all the time and set up with MergerFS to appear as one massive 24TB mount (ext4 underlying). I use rsync to do backups. The initial backup of 16TB took many hours. About once a month I initiate a manual backup that often runs over an hour. I've never had this problem.


I have a Pi3 at my parents' home and I rsync over ssh to a 5 TB 2.5" spinning disk attached via USB; I added the disk to fstab. The first sync took over 24 hrs for ~550 GB [0]. We both have 100/100 fiber, but USB is the limiting factor in this case, which is nice because it kept both our connections usable during the transfer. Anyway, I had 0 issues. Well, apart from the fact that I initially used vfat as the FS of the target disk so my partner or parents could just pull out the disk and use it (on Windows/Mac) if something happened to me. But that leads to rsync issues and workarounds, so I switched it to ext4 and told my loved ones to find a Linux nerd in the event of my demise.

[0] rsync -ahv --progress -e 'ssh -p2222' /data/0/Pictures bu.mydomain.tld:/mnt/data/backups/ --info=progress2 #(assumes cert based auth set up)


the controller in the enclosure is maybe overheating and either switching into a failsafe mode or a failure mode until it cools down again.

sustained data transfers crush those little controllers, so spend more on a better enclosure, actively cool the one you have, or just connect the bare hard drive directly to your PC if that is an option.

I've had 2 USB-3 to mSATA enclosures fail permanently after 5 minutes of sustained data transfers.


can you give the brand and model so we all avoid that?


I had related issues.

My solution since then was to work through a soft link to a directory inside the mounted drive.

Whatever the failure mode, when the storage disappears the symlink just dangles, so writes fail instead of quietly landing on the local disk -- even if the tools do a "mkdir -p" (which seems to be the underlying issue you are describing).


Most Linux tools have an option that keeps you from crossing filesystem boundaries.


In this case he needs the opposite -- an option that refuses to write unless the target really is a separate, mounted filesystem.


I knew about that for reads; how do you do that for writes?


  chattr +i /mnt/foo
This makes the mount point immutable when nothing is mounted there.
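For anyone following along, a quick illustration of why this works (the device and paths are examples):

  chattr +i /mnt/foo          # nothing mounted: the directory is now immutable
  touch /mnt/foo/test         # fails with "Operation not permitted"
  mount /dev/sdX1 /mnt/foo    # mount the real drive on top
  touch /mnt/foo/test         # succeeds -- the immutable bit applies to the
                              # underlying directory, not the mounted filesystem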


OOooo, that's a nice trick! I'm totally gonna use that in a few places in my current setup. A little shocked at myself for never havin' thought of usin' the immutable bit in that specific way. Seems so very sensible and logical now that I see it. :)


Thanks, that will save me some serious headache!


I have a USB 2 external HDD enclosure that will drop off the bus if it is connected to a USB 3 port and under any respectable load. Only workaround I know of is to connect it to a USB port while giving the logo a stern, backwards-compatible glare.


This seems like it must be an issue with the drive you're using. I use a mix of Western Digital and Seagate USB drives and have never had a problem like this.


Huh, it's almost like treating separate volumes as part of the same file hierarchy is a bad abstraction.


Perhaps it is an SMR drive. Ubuntu (and Linux generally) is poor at handling USB drives that sleep. Ubuntu always mounts at /media/username/drivename -- I am sure something is wrong with your setup.


Why do you refuse to blame Linux for this problem?


Because not everything that happens on Linux is Linux's fault, any more than everything that goes wrong on Windows is Microsoft's fault. Sometimes it's user error, sometimes it's faulty hardware, and sometimes it is the operating system. You don't just lay blame on the operating system for everything that's not working, though (although I do see a lot of people do exactly that, despite it not being logical).

As an example: many folks commenting above appear never to have experienced hard drives spinning down during a backup, or a remount at a different mount point. That suggests it might not be an operating system issue, but rather something strange about that specific external hard drive.


Maybe the data retention policy of OP is the real problem. Sounds like a data hoarder / virtual mess to me.

Whenever I take pictures on a trip or vacation, I go through them at the end of each day, delete most of them, and keep maybe 2 or 3 max; I beautify those and the rest goes into the bin. No matter how long the trip, I try to keep at most the 15 best pictures. All filler no killer. That way I am comfortable showing others pictures of a trip without boring them, and I also like to look at them from time to time since I know those are the best moments.

At the end of each year we create a calendar with photo collages of 4 to 6 pictures per month. The calendar goes to relatives and we create a photo book from the printouts. That is what gets archived.


> Maybe the data retention policy of OP is the real problem.

This... this is a truly universal first-world problem. The average person did not need data retention policies until the past decade or so, and now it feels like I need a retention policy for literally every physical or digital object that enters the premises just to maintain my sanity.


You never threw out bad photos when you put them in a photo album? It seems like I've always faced the same data retention problem with photos; it's just a lot easier to ignore it now and keep them all.


With a roll of film people often tried to just take 1 or 2 photos of something. Now it's easy to take like 30 photos of some interesting thing happening. Then later how do you decide to delete 29 of them? What if a few of them captured some interesting thing you didn't expect, or if different things are accidentally in focus. It's maddening.


I am not a data hoarder, but I do know that the entropic cost comes with deleting not creating data. So if I take 15 shutter clicks to get the picture where the dog is actually looking at the camera, the most I’ll do is favourite the good one, but it’s too much work to do anything about the others.


Photographers have their photo collection as their life's work and source of professional value. It's viewed very differently from a personal photo collection and serves a very different purpose. They may need to access these photos in the future for many different reasons, for new clients or projects, etc


God: So how interesting do you think your life was?

Adam: ~15TiB.


There's a ton of easy low-hanging fruit along these lines that could be improved. Good example: I have a python script that routinely scans my ~/downloads directory and quarantines anything more than 24 hours old. It then scans the quarantine directory and deletes anything more than a week old. If the data is important, I'll have 2 chances to save it someplace meaningful.
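A rough shell equivalent of that idea, in case anyone wants to try it without the Python (the paths and quarantine location are made up):

  # move anything older than 24 hours into a quarantine directory
  mkdir -p ~/downloads/.quarantine
  find ~/downloads -maxdepth 1 -mindepth 1 ! -name '.quarantine' -mmin +1440 \
    -exec mv -t ~/downloads/.quarantine {} +

  # purge quarantined items older than a week
  find ~/downloads/.quarantine -maxdepth 1 -mindepth 1 -mtime +7 -exec rm -rf {} +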


This may not be directly related to your workflow, but in macOS there's a preference to delete items that have been in the Trash for more than 30 days. If you're more proactive about putting stuff in the Trash (and less proactive about emptying it), that (plus backups) gives similar behavior. My wife doesn't ever empty her Trash, and previously I had written a script that did it for her.


I use the downloads folder to store the interesting pieces. It is fascinating to jump back in time and see what you came across.


Same. For example my downloads folder has near every major release version of Starsector, Dwarf Fortress, and Factorio.


This is the way!

I recently got into film photography, and the 36 pictures I get back from the lab after a weekend or two make me so much happier than the 6 frames per second my Fuji shoots.


There's a huge market here for something that professional photographers could use. I know several, and they all tell each other to keep buying more external 1TB hard drives and then back everything up to random cloud image hosts. Most of them end up with a bunch of zip-lock bags lying around full of disks and thumb drives.

I made the mistake of building a NAS for one at one point but it turns out that you need to be really diligent about documenting the setup as rebuilding a raid5 array when it inevitably fails can be very difficult if you don't remember what the actual configuration was.

The crux of the issue is that they need massive storage capacity but whatever the solution is needs to be able to manage merging disjoint Adobe Lightroom catalogs while de-duplicating and without the possibility of data loss, all while needing basically no maintenance because these people tend not to be incredibly computer literate.


I'm a pro photographer and work in IT as well. I have run into this storage problem on more occasions than I can poke a stick at. I've seen first-hand accounts like the one linked, where people acquire hard drive after hard drive with no end in sight. But the problem isn't limited to photographers; it's anyone who acquires data of any type.

That being said, I use a combination of NAS units with about 40TB of pure photos (yes, a lot of images). I've been dabbling with LTO to back up images, but you need to know what you are doing with tape. It's not a simple drag-and-drop operation like a hard drive. You can get a lot of shoeshining if you don't do things properly.

So for long-term easy access, I recommend others look into 4-bay NAS units (at a minimum) for future flexibility -- QNAP or Synology to simplify the configuration -- and use a blob storage service to store the data (if upload speed permits). I used to use Backblaze, but even with basic B2 storage I found it more expensive than Google Cloud Storage (Coldline).

But as with anything, your use case, willingness to learn, and financial situation will dictate what you end up doing.


Plus, my experience is that my storage consumption increases more or less linearly whereas drive capacity grows more like exponentially. I don't see much of a reason to go beyond 4K for video (that's already way more than one can perceive on most screen formats, except perhaps multiplex cinemas), and I am not sure what the point of a 100MP camera is. You can find 18TB drives now. Very quickly the rationale for replacing drives won't be that you've run out of space, but that they've become old and too dangerous to keep around, or that you want the performance of SSDs, once they reach that level of price/GB.


I've got a 60MP camera (Sony a7R4), and I could see the point of 100MP: extreme crops while still having acceptable image quality in the result. Doing bird photography, after a certain point the size and weight of the lens becomes an effectively insurmountable issue (somewhere around the 600-800mm range, depending on maximum aperture), and the only way to get more "reach" is to crop the resulting photo. So I'd never use the full 100MP, but I might discard 75% of it to get a nice print-quality bird image of reasonable size.

Of course you also need a really good, sharp lens to take advantage of such a high-res sensor. Any flaws in the lens or your technique or settings will show up more dramatically in a deep crop. So it's not going to save you money on a shorter focal length lens, it's just going to save weight.


Synology would be the obvious answer. It checks all those boxes and supports s3 or another Synology for remote backups with hyperbackup. You'll pay a premium over rolling your own but overall they're pretty darn stable.


Not to mention they keep updating older hardware for a long time. My parents have my old DS212j (2011) and it still gets updates.


>I made the mistake of building a NAS for one at one point but it turns out that you need to be really diligent about documenting the setup as rebuilding a raid5 array when it inevitably fails can be very difficult if you don't remember what the actual configuration was.

This is actually the reason why I bought a NAS for my primary storage at home: if it crashes, I will have support, or I can just buy another one with the same configuration and make it work.

If it crashes, the last thing I want to do is have to fiddle with settings around my data.


As a former professional photographer, and still a friend of many, I'm not convinced that it's a huge market. Actual profitable full-time professional photographers who can afford to buy equipment is a small and shrinking population. You could probably sell it to advanced amateurs too (who often spend MORE on equipment than real pros), but I still don't know if it's a large enough market to reach economies of scale and make the price reasonable.


> I made the mistake of building a NAS for one at one point but it turns out that you need to be really diligent about documenting the setup as rebuilding a raid5 array when it inevitably fails can be very difficult if you don't remember what the actual configuration was.

ZFS mirror pools is where it's at.

I can pop out any one drive of the pool and take it to any random unrelated machine and restore everything.
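Roughly (device names are placeholders; this is just a sketch of a mirror setup, not the exact commands):

  # create a two-way mirror pool
  zpool create tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2

  # on another machine, with only one of the disks attached, the pool
  # imports in a degraded but fully readable state
  zpool import tank    # add -f if the pool was never cleanly exported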


There’s a very funny video about that done with USB sticks. They remove all sticks then randomly put them back while the ZFS volume slowly comes back to life.


Would you mind sharing why these photographers can't use Dropbox/Photoshelter/etc?

Photoshelter looks to have a $50/month unlimited cloud storage plan[0], which seems pretty cheap if they're going to really take advantage of the "unlimited" aspect. For a professional photographer $50/month to not have to worry about carrying around this stuff seems like a deal (and less if they don't have that much to store).

[0]: https://www.photoshelter.com/signup/subscriber


I haven't done serious photography in years, but my guess is that the market isn't quite as big as the parent suspects (the people who want it aren't willing to pay). There's a giant graveyard of push-button photo backup sites targeting photographers. Sadly, I would be suspicious of using any such service (since it would mean uploading hundreds of gigs of images for the kinds of people who would use it), because they shut down so often.

I've never heard of Photoshelter, but it sounds like you upload high-resolution final images that are ready to view and print -- it doesn't seem all that different from Smugmug or Flickr. Most photographers want to archive the source files and edits (project files). I see you can technically upload RAW files, but unless that integrates very well with your photo management software, I'm not sure it's of much use. Lightroom databases can get relatively large since they include caches and thumbnails, so you'd need a separate system for backing those up that ties them back to the RAW files.

Dropbox might work, but I'm not sure how approachable "smart sync" is and their pricing tiers don't seem like a great fit.

My guess is a well-loved open source project that wraps around B2 or AWS might be the best fit. But, honestly, a pile of hard drives will probably win out. You make your investment upfront and it's a simple enough mental model. Unfortunately, they tend to die, and with large data sizes photographers don't always go for redundancy.


What would this look like? Consumer grade tape devices? I've actually had a lot of external HDDs die compared to internal ones so I'm wary of using those.

My wife has a bunch of photos/videos for a small business and I have to admit we haven't been diligent in backing up. It's a lot of data to save the larger sources. And we also went down the NAS route, mainly for access convenience though.


This has been a problem for what, nearly a decade now? I thought NAS vendors would figure it out one day. Nope. No one seems to give a damn.

Not to mention some tools to help with all the duplication we have across different HDDs or USB sticks.


You are confusing the market of people who say they want something with the market of people who will pay for it.


If you want more reliable backups, Backblaze personal backup should never be on your list. The client takes anywhere from 1 to 8 hours to upload the index of files to the server after uploading the actual data for the backup, but it will say the backup is complete before this happens, giving false assurance. So if your machine dies in that span (data uploaded but index not uploaded), it's as good as the backup never having happened, because the server has data blobs but no metadata and cannot provide a way to restore them. This is intentionally designed behavior (for other reasons) that adds risks I believe nobody should take.

See this blog post and comment from nearly four years ago for a better description. [1]

And here’s Backblaze’s support page on the same, last updated a month ago (June 2021) indicating that this behavior remains. [2]

Any backup solution that doesn’t, at a minimum, follow the 3-2-1 rule will cause more instances of regret in the future. If the data is of value, it needs better and constant care.

[1]: https://mjtsai.com/blog/2014/05/22/what-backblaze-doesnt-bac...

[2]: https://help.backblaze.com/hc/en-us/articles/217665498-Why-h...


Assuming you have family living elsewhere but reachable through a fast internet connection, you can do what I do by making a deal with them: they hang your backup box off their net and you do the same for them. The backup box is some piece of computing equipment with storage media attached, e.g. a single board computer hooked up to a JBOD tower.

Depending on the level of trust between you and your family you can use the thing as an rsnapshot target - giving you fine-grained direct access to time-based snapshots (I use 4-hour intervals for my rsnapshot targets which are located on-premises in different buildings spread over the farm) or as a repository of encrypted tarballs, or something in between. Allow the drives to spin down to save power; they'll be active only a fraction of the day. The average power consumption of the whole contraption does not need to exceed 10-15W, making electricity costs negligible.

You can have as much storage capacity as you want/can afford at the moment, and keep it for as long as you want or until it breaks, without having to pay any fees (other than hosting their contraption on your network - possibly including building it for them if they're not that computer-savvy).
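For reference, the rsnapshot side of that can be a handful of config lines plus cron entries. The hosts, paths, and interval names below are placeholders, and rsnapshot wants tabs between fields:

  # /etc/rsnapshot.conf excerpt (on the backup box, pulling over ssh)
  snapshot_root   /mnt/backupdisk/snapshots/
  retain          fourhourly   6
  retain          daily        7
  retain          weekly       4
  backup          user@example.org:/home/    example.org/

Each "retain" level is then driven by a cron job running e.g. "rsnapshot fourhourly".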


> Depending on the level of trust between you and your family you can use the thing as an rsnapshot target - giving you fine-grained direct access to time-based snapshots (I use 4-hour intervals for my rsnapshot targets which are located on-premises in different buildings spread over the farm) or as a repository of encrypted tarballs, or something in between.

These days, openzfs native encryption is the best of all worlds, I think.


In that case, use that. ZFS seems to be a somewhat touchy subject, with some people using it as widely as possible while others - myself included - prefer more modular storage systems where the tasks of volume management, striping/slicing/RAID management, encryption layers, and file systems are performed by discrete software layers. Both approaches work, both have their pros and cons; in the end it comes down to what you value most.


I actually agree with you that ZFS is a massive layering violation (from the user's perspective, at least; apparently under the hood it is a bunch of separate components layered together), but the features are good enough that I personally think it's worth it.


What are the cons of the ZFS approach?


ZFS straddles layers - from block device to file system - which normally are handled by discrete components. It does this in its own way which does not always fit my way. It is resource hungry which does not play well with resource-starved hardware. Growing ZFS vdevs is possible but it is not nearly as flexible as an mdraid/lvm/filesystem_of_choice solution.

In short, ZFS works fine on capable hardware with plenty of memory. It offers more amenities than a system based on mdraid/lvm/filesystem_of_choice. The latter combination can work on lower-spec hardware with less memory where ZFS would bog down or not work at all.


Unless you're using deduplication with ZFS (which you shouldn't), you can usually limit the ARC size to a certain amount of RAM, and then the resource hungriness isn't an issue. You lose the benefit of more dynamic caching, and that sucks, but for lots of workloads this is fine.
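On Linux (OpenZFS), capping the ARC is just a module parameter -- the 1 GiB value below is only an example:

  # temporary, until reboot (value in bytes)
  echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

  # persistent across reboots
  echo "options zfs zfs_arc_max=1073741824" >> /etc/modprobe.d/zfs.conf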


Not on hardware with 2 GB or less of RAM unless you reserve most of it for ZFS and even then the feasibility depends on the combined size of the attached storage. Remember that we're talking backup systems, not full servers. These tend to be storage-heavy, focused on sequential rather than random access throughput, preferably low power - i.e. mostly single board computers like Raspberry Pi or older re-purposed hardware like that laptop without a screen, hooked up to a bundle of drives in some enclosure.

As a tangentially related aside, I wonder why bringing up (potential) downsides of ZFS tends to lead to heated discussions, when nothing like this happens when the same is said about e.g. hardware RAID/$file_system or a modular stack like mdraid/lvm/$file_system.


Yeah, 2GB is pretty low... my own experience with backup systems is having full servers to power them, and usually having to over spec them so they can live 5 year lifespans without ever becoming the bottleneck in the backup chain.

re: tangent; I wouldn't really call my response heated, I've just run into workloads with ZFS where limiting the ARC cache solved problems. I've also run into frustrations with ZFS that there really aren't easy solutions to (the slab allocator on FreeBSD not playing well with ZFS on this particular server's workload, so having to change the memory allocation system to one that doubles CPU usage; ZFS's extended ACLs not being exposed on Linux, meaning migrating one of our FreeBSD systems to Linux will require serious effort).


With 2G of RAM you're talking about a $15 part. Bump it up to 4G and you've got plenty for 36T of zpool - I know from my own experience. Consider the price of the hard drives and it's a single digit % of the overall cost.


With 2GB of RAM I'm talking about the maximum amount possible in a number of machines in use here. Sure, it is possible to buy new machines, but why would I when there is no need? They work just fine as is with those 2GB; the only restriction is that they can not be used for ZFS since that needs more RAM. Since ZFS is a means to an end and there are different means to achieve the same end, I just chose one of those - problem solved.


ZFS has worked well for me on 4G of RAM, with 10x 4T drives (36T zpool) in raidz2 configuration.

An old motherboard with 4G of RAM is not hard to come by.


Sure, but the OP is not talking about backup, they’re talking about primary storage.


Anyone who needs storage space for pictures cares about backup, they just may not be aware that they care about it or they may just conflate the two needs.


I settle for storing an external drive (not powered on) at family or friends. You could add offices, too, back when we had those.


I think I'm missing something obvious, but why is this better than keeping your tower at home?

Is this to guard against house fires?


House fires, electrical mayhem (lightning strikes have released more magical smoke than I care to mention here), burglary, flooding, law enforcement coming by to take your things because of $reason, earthquake damage or any other localised threat which can not touch remote backups.


I recently did a comparison of cloud storage options:

- If you want "personal storage" (no programmatic access) the options are generally around $5 per TB/month - but that's a lot of drag-and-dropping and praying the connection doesn't die during transfer.

- If you want object storage (think S3) the cheapest is $5 per TB/month, the average is $10 per TB/month, and the "high end" is $20 per TB/month, with extra costs for bandwidth ranging from $10 to $120 per TB/month.

Honestly, just store less crap. Marie Kondo your digital life. Does it not spark joy? Have you not looked at it in the last 6 months? Does it not serve a useful purpose, such as tax records? Ditch it.

Even if you want to keep some pictures/video, either print a copy in original-quality, or compress/downscale it. I took a 3 minute video on my phone and it's 425MB. And it was still grainy! I used to download two-hour movies that were 725MB and looked like a DVD! errrrrrr... I mean, I heard about a guy that did that.


>And it was still grainy!

Because the grain is not from video compression; it's because the camera sensor wasn't receiving enough light, so it boosted the ISO to compensate, which causes grain. Also, phones have to encode video live, which is far from optimal, while movies are shot uncompressed and a powerful CPU can then spend as long as it needs on a near-perfect compression.


The CPU in a real machine makes a huge difference: the last video I shot on my phone went from 3.9M → 1.6M, ~59% smaller. (With no noticeable quality loss, but the original isn't winning any Oscars…)
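For anyone curious, that kind of offline re-encode can be done with ffmpeg; the settings below are just an example and trade encode time for file size:

  ffmpeg -i phone-video.mp4 -c:v libx265 -preset slow -crf 28 -c:a copy smaller.mp4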


And your phone is already pretty well off. Cheap dashcams spit out awful quality video with large file sizes since they don't have the benefit of expensive encoder chips.


OTOH with the unambitious encoding parameters they are very easy to cut with ffmpeg, or at least the MP4s produced by my helmet camera are. I mean, the cheaper the hardware, the fewer frames it'll be able to buffer, so I-frames should come pretty regularly.

I wrote a very simple video editor: video playback, some scrubbing controls on the arrow keys to jump around, and space bar to mark / unmark segments to keep (capturing timestamps). Run ffmpeg to cut the bits out then move source video to a different directory, ready to be deleted.

This cut down on my video volume 100x, while I still retained a few old commute videos, as a keepsake for that time in my life.
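The cutting step itself can be a one-liner per kept segment (timestamps and filenames are examples); stream copying avoids re-encoding, which is also why regular I-frames matter:

  # keep 20 seconds starting at 1:20, without re-encoding
  ffmpeg -ss 00:01:20 -i commute.mp4 -t 20 -c copy keep-001.mp4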


Also, noisier video will probably compress worse.


What many people forget with these $x/month services is that you will have to pay for the rest of your life. That's why I have chosen an option where you pay a higher fee once (about 3 years' cost) and then it does not cost anything going forward.

Another issue with storing all these photos for your whole life is of course that once you die, what happens? Who will take charge of all these photos? Who will continue to pay? Who will clean up and take ownership of it? Or will they just be lost once you are dead and all this money for what?

The third issue is of course that the more random unsorted pictures we have, the less we want to look at them, because 90% are bad and only 10% are good. It's important to clean up just after taking the photos.


Head too far down this path of thought and you hit an existential crisis pretty quick though: "why are you storing photos?" "who will look at them?" "will anyone be looking at them in 200 years? 100? 50? 10?"

On a long enough timescale we all die and are probably forgotten. I've actually spent a decent amount of time resisting the urge to just delete all my old photos (I have things going back to early high school - taken on a Sony Mavica that used floppy disks).

There's a very liberating feeling to the idea of just having no history at all - probably similar to the appeal of the idea of having all your works crumble to dust when you die, which is also something I've seen people refer to at different times.

But since I haven't done that yet, and since managing an on-site server or disks is its own stress (i.e. eventually my house will burn down or flood and I'll lose everything... so what was the point?), paying that $x/month is basically the price I pay to defer having to really think about it.


Regarding the cost: I still think it's a good option to use a cloud storage service or similar. I just meant that it's way better to select something like pCloud's forever option, a one-time cost upfront, instead of paying every month or year for 50 years or more. After about 3.5 years the storage is effectively "free".


Unless I am dead I feel pretty confident that I want to see my photos and videos of my kids growing up in 50 years time (it would make me 90)!


That's why I recommend physical prints. Hopefully you would put them somewhere where you actually look at them from time to time. Or put them in a safe deposit box (assuming you trust those).

The problem of "what to keep" isn't even digital-specific. In my youth I probably bought two dozen disposable cameras, had the pictures printed, even paid for the CD-ROMs when they became available. Where are those pictures now? Who knows! That's why I like Marie's message: it's not about throwing away junk, but keeping what you love close to you.


This becomes exhausting. Like, it becomes its own workload. There are some decent workflow tools for iPad that make it tolerable to work on while vegging out in the evening or on a plane, but it's definitely a chore to Marie Kondo your digital life. And we haven't even gotten to the fragmentation of life as you move, change jobs, etc.


Agreed. $10 for some TBs is much cheaper than the time I spend deciding if I will need something or not.

Very similar to just buying a few more items at the grocery store than I normally would. My time is valuable enough that $20 in wasted items is worth it to avoid potentially having to drive back there if I've missed something.

If I don't have as much time as possible to spend on working/innovating/personal health and family, my income source will dry up and then I'm in dire straits ;)


Yeah, a few times I've tried to prune my collection of photos/ripped-mp3s/movies/whatever. I've always given up after I'm 1 hour in and still have like 98% of the collection to go.

It is kinda like how the old Yahoo-organized-like-a-library died out in favor of the Google-just-search-everything approach. Unless you're triaging as you go (which still takes tons of time; do people really go through 1,000 photos after their 3 week holiday to Iceland and keep just the 15-20 best?) it is easy to build up an insurmountable backlog where the only viable option is just "delete everything".

I actually did that with my old collection of ripped-from-CD mp3s and just resigned myself to streaming anything I really cared about. But you can't exactly do that with "family photos".


The problem with digital storage is the immense difference in filesize.

Removing those tax forms isn't going to help you free up space at all; they are probably compressed XML or just PDFs. Yet, for example, that single slideshow for grandma's 90th birthday gobbles up 95% of the disk space that all your presentations combined use.

It is far more effective to hunt those large files and go through those only. I have largest.sh and baobab for that, but there are numerous other (GUI) tools for this.

Less fun, though. But after seeing my wife spend an entire afternoon sifting through her files (look! I already deleted 22 GB), and me going: did you see the 'Photoshoot 2012-04-01 waalkant (kopie)' folder full of RAWs? That is 210GB. Can it go? Done! -- I realised we need better tools, and to educate people a little here.
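For the command-line inclined, a rough equivalent of something like largest.sh is a one-liner (the path is just an example):

  # 25 biggest files and directories under the home directory
  du -ah ~ 2>/dev/null | sort -rh | head -n 25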


This is cool. I triggered on 'waalkant', then I saw your topbanner on Twitter, now I think I've been looking at your house last year from the back of a garbage truck a couple of times. But on topic: indeed, chase these big duplicate files first!


Wow, you probably have the best garbage-truck-run around here, then. Well, it probably is a lot of hanging and little garbage-hauling!


It was a perfect summer indeed. I called it a 'paid vacation'. Wave to my ex-colleagues in the white truck!


I just bought a pretty high quality 4TB internal HDD for like $120. Compress an archive and that can hold quite a lot for quite a while.

Next, I might need to build a little Blu-ray changer to burn massive quantities of optical media.


Scaleway.com C14 cold archive is $2/TB/month with multi-hour retrieval time, like AWS Glacier. Retrieval uploads to a normal S3 object store that costs more, but you don't have to leave it there long. Outbound bandwidth from the S3 store is $0.01/GB, but alternatively I think you can retrieve through a Scaleway hourly VPS and then the bandwidth is essentially free.


A few people have raised this question in the thread already, but why are we compelled to store such unreasonable numbers of images?

I'm definitely not immune, and I find that the satisfaction I get from my photo collection is inversely proportional to its size. At this point, I'm just lugging around this huge mass of data "just in case". There's no way I will ever have time to sort my images. There are probably 2000 wedding images alone, let alone the tens of thousands of random snapshots that may or may not be something I ever care to see again.

At this point, I would almost prefer a smart opt-in solution similar to what Google Photos provides for smart albums: "We found these images that would be good long-term. Save them indefinitely?"


Sounds like you are not organized. Are all your photos tagged with location, long/lat, date, and subject matter? Can you in 5 seconds or less pull up every waterfall? Photo of a loved one? Every rainbow? Every trailhead?

I reference mine quite often, usually for a purpose I didn't originally think of. The name of that cool restaurant in a city you visit rarely? How many double rainbows did you get last year in June? Pinning down dates for a previous pet. Or the funny pet photo with a squirrel in it? Where were you on March 2nd when your credit card got charged $1202.20? When did you meet your new significant other, 3 years ago or so? At what trailhead did you see a bear? What is the serial number, VIN, or similar for just about anything valuable you own... or owned? How old were you when you won that race?

The cost of storing a photo for life isn't much; it seems silly to spend hours and hours trying to delete them, doubly so because mistakes will be made. Just tag them so you can find what you want. I was surprised how much tagging helped. Grandmothers pored over every damn photo I tagged with the grandkids. Starting collections of wild animals we've seen by state, etc. When bored, I find it rather fun to relive a vacation or hike. Even my kid seems to quite enjoy "visiting" places I've been.

Sure, 95-99% of my photos don't get viewed, ever. So? Why waste man-hours on useless photos? Just make sure you are organized enough to find the 1-5% that you do care about.


I have not met a single person in real life this organized with their photos. People I know upload albums to Facebook (!) and forget about them; they might post some really good/momentous ones on Instagram. It’s truly amazing to me (in a good way) that one could devote so much time to organizing photos.


Not sure what OP is using for their photos, but Google Photos handles all of those mentioned features automatically for me. I’m sure there are other similar apps that do the same.


If you've got some system and software combo worked out that brings keeping your photos organised like this in line with the time cost of deleting, then I think many, many people would be interested in a summary.

Think of all the grandmas you would indirectly be providing more grand kid photos to!


I personally use https://www.digikam.org/ to keep my own photo/image library organized and tagged. It's also easy to embed a bunch of that data directly into the image file metadata, too. Like you, I enjoy the ability to easily call up entire categories of images at will/need with a simple keyword/location search. Really handy when I'm doin' graphics stuff (Blender 3D, Inkscape, GIMP, whatever) and want a particular type of image for some purpose.


> Can you in 5 seconds or less pull up every waterfall? Photo of a loved one? Every rainbow? Every trailhead?

For me that was the selling point of Google Photos. I switched in 2016 maybe - prior to that using Photos.app - and it was mind blowing. Thanks to machine learning, if you didn't think you wanted to tag waterfalls at the time of capture, it would still let you find them (I have 12 waterfall photos).

Unless you are a professional photographer, why waste man hours organising them when some algorithms and machine learning can do a good enough job?


What software are you using for organization and tagging?


What software are you using for tagging?


For photographers these images are their life's work and professional value. They need to be able to go back and reference a client's photoshoot or pull up a higher resolution file or retouch a photo or come back to an old photo and use new software to re-edit it to perfection, or find related photos from a photoshoot of interest in the future, etc....

It's very different from when a hobbyist takes photos just for their personal fun


No way I am going to trust today's guessificial algorithm. The capacity/cost does scale as the file size increases, and the algorithms get smarter. Photos can increasingly be analyzed for useful correlation. Today we can find by time, place, people, object, eventually scenes will be reconstructed, lifestyle patterns contributing to long term health outcomes, leading to real diagnosis, memory augmentation and simulation (much of it requiring better privacy). In the meantime, I enjoy being able to time travel to amazing or obscure moments (even if it's just that amazing abricotine in 2016).


I back up about 15TB of photos with Arq to Wasabi cloud storage.

It has been running with zero maintenance (other than occasional partial restores) since late 2018.

I transitioned off of AWS cloud storage when they raised the prices.

I’m not sure if Wasabi is still the cheapest and fastest, but they have been great to deal with. And Arq is an excellent set-it-and-forget-it encrypted cloud backup app.

I also run a server with a 16TB RAID 1 array and a set of local backup drives. Sadly it is almost full, and the volume of data makes it a hassle to upgrade (not to mention the cost).

I’ve found standard 1Gig Ethernet to be just barely fast enough for editing photos over the local network. However, for my own sanity, I usually do the initial editing on a local drive before sending the files to the server (and from there to the cloud backup).


Note that he's looking for primary storage not backup.


How do you deal with merging Lightroom catalog files if they are all indexed remote? Sounds like a tricky workflow


I think $6 a month for "unlimited data" is going to end badly.

The AWS or Azure price is high, but it's a scalable price, a real price.


> I think $6 a month for "unlimited data" is going to end badly.

Backblaze has been offering their 'unlimited data' for over a decade [1], and it hasn't ended badly for them yet.

It's fully sustainable because they only lose money on a few customers (like that one customer storing 430TB for $6/month [2]), while most customers use much less storage than that, so the service remains profitable overall.

The soft limit of its 'unlimited' comes from it being a personal-backup mirroring service and not a fully-external cloud storage, so the service only fits certain use cases.

[1] https://www.backblaze.com/blog/all-in-on-unlimited-backup/

[2] https://www.reddit.com/r/IAmA/comments/b6lbew/were_the_backb...


Backblaze is cool, but the limitations that make $6/month affordable on their end essentially rule out large backups. The data must stay locally connected to your PC, or it will be purged if it hasn't been seen within 30 days (so you can connect a bunch of USB drives locally, but that's a hack).

Also, their upload speed is capped, so you can't just upload at 1GB/s or max out your connection to ingest data into their system like you can with S3.

Glacier Deep Archive is still the cheapest thing really at $1/TB, but the retrieval times and also egress data charges are a big catch.


It's not nearly as cheap as Glacier, but Backblaze's B2 is $5/TB (and $10/TB to download), putting it between S3 and Glacier. I don't know if the upload speed to B2 is capped; my internet isn't fast enough to tell.

That might be prohibitive if you have a huge NAS to back up, but for moderately large amounts of data (say 5 TBs or less), it seems pretty reasonable.


Yeah, they also don't have a Linux client for the backup product. While you certainly can build a Windows-based storage server, and there's even some interesting storage tech in Windows, most data-hoarders store their data on Linux.

Glacier/Deep is good, but with the 180-day minimum object lifetime, you want to be sure that the data is ready to go into the archive before pushing it there. (You can use tiered storage, but then you're storing all data in standard S3 for 30 days before it gets into Glacier, and that one month of storage in standard S3 will cost you the same as 10-months of Glacier storage.)


> Glacier/Deep is good, but with the 180-day minimum object lifetime, you want to be sure that the data is ready to go into the archive before pushing it there.

One of my (to be implemented) backup strategies was to use Glacier Deep Archive as last-resort recovery, and just stick yearly tarballs (e.g. 2018, 2019, 2020) in there. That should save me a bit on retrieval requests as well.
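A sketch of that, assuming the aws CLI and an existing bucket (the bucket name and paths are placeholders):

  tar -cf photos-2020.tar /data/photos/2020
  aws s3 cp photos-2020.tar s3://my-archive-bucket/photos-2020.tar --storage-class DEEP_ARCHIVE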


I've built a production line for what I call "three-sided cards", which usually have a printed front and back and a supporting web site linked by QR code and/or NFC.

The web site is hosted through AWS S3 and the CloudFront CDN. The "web side" is carefully optimized for storage and transfer costs. (Glad WebP is good to go; I am hopeful JPEG XL rolls out fast.) With Glacier/Deep I can afford to archive the camera RAW files, super-resolution PNGs, and the other assets that go into the print sides, just in case I lose my workstation and my storage server.


That would be super attractive to me if my ISP didn't have data caps. My personal archive is ~40TB of drives in my primary desktop. Even disregarding empty space and duplicates, it would take four years or more to upload it onto a cloud service without running over my data cap.


I have symmetric gigabit ethernet and have a few TB backed up to Deep Archive, but if I wanted to back up and restore everything, the data transfer pricing is insane: it would be $1800 for 20TB. This really is the only thing AWS is keeping artificially high to facilitate lock-in.


Can you pay to remove the cap? If you can you could pay the extra to remove the cap for however long it'd take to upload everything.


Another option is to drive the data to some place with a fast internet connection and do the upload over the course of a day or two. (One time I asked a friend if I could bring my laptop and a hard drive to his university classroom and plug in for an afternoon -- I was lucky to know a few of the IT folks there -- and I got 940 Mbps upload. It was like heaven!)


Only at ten dollars per fifty gigabytes, which is obviously impractical at this size.


So considering they say that customer costs them more than 2k dollars, the average must be around 1-1.5TB for them to be profitable. That's a lot and not much at the same time.


Backblaze has limitations that mean it isn't really a backup service, and since it's not a sync service either, it's something else. Something that works for some people -- people who are fine with all its conditions and gotchas.

For me Backblaze Personal is anything but a backup service.


Yup, I'm currently experimenting with using an AWS bucket alongside CloudMounter (https://cloudmounter.net) to create an Archive disk for things I don't really need day-to-day, but would like to hang onto.

I'm currently not backing that data up, though. It's not critical, and the chances of AWS losing it are low enough that I'm not too worried. The biggest risk would be myself accidentally deleting it.

One thing I have decided for sure, though, is that archiving lots of small files is awful. Much better to wrap them up in a tarfile, uncompressed.


Amazon Drive is fine. No need to have a mirror of all that data on my computer. With that I could have selective sync or no sync at all, plus unlimited photo storage. Amazon Photos and Amazon Drive are effectively a single product.


I've used Amazon Drive back when they offered their unlimited product, and they've lost/corrupted some of my files before.

I personally wouldn't recommend Amazon Drive as a backup destination to anyone.


It's not "unlimited" in a practical sense because as the post said, they're just mirroring the data so you yourself have to have the storage capacity you're asking them for.


There might be a way to trick the client synching the data into believing that it's still on disk by looking at how it checks if the data is still there and writing a filter driver to intercept and modify the result of the call.


Good luck getting a checksum for a partial file that you don't have


True, but perhaps the checksum doesn't need to be computed if the timestamp didn't change. Or perhaps the checksum is always computed in the same way so you can just store the checksum.


Store the checksums? You still need local storage but way less.


It's $7 a month now. They just announced a price rise.

https://www.backblaze.com/blog/subscription-changes-for-comp...


The bandwidth cost makes me cry.


10mbps up will upload about 30GB in 8 hours overnight while you sleep. Most people are not taking 30GB of new photos every day, nor are they using their Internet connection all day while they're awake. So cloud storage is totally viable for most people with 10mbps upload speeds.

Now, a person might have a big backlog of data. This guy has 7TB of photos. If you start uploading, you’ll get caught up eventually! If you want to get caught up faster, you just need to find a faster pipe and park your computer/HDs on it for a little while. For me, it was a friend with FiOS. For many professionals, it might be their office.

The point is, once you’re through the backlog, 10mbps is probably enough to maintain. Again: for most people.


You might be forgetting about data caps. Comcast for instance has a 1.2TB/mo cap, graciously upped from 1.0TB last year.


Indeed. And upload speed caps, as well. My 100 megabit downstream only allows 5 megabits upload speed for example.


Indeed! Mine is advertised as 16Mbps which isn't great to begin with, but I can only burst to that for a few seconds after which it quickly drops to something like 3-4.



Lots of people talking about photo retention policies here, but for me the issue is videos. We have little kids, and that means lots of cute videos. It's harder to scan through them and see what's useful versus what isn't. I try to remember to delete videos that didn't turn out when I take them, but that doesn't help with the videos that I shoot that are 2 mins long but only have 20 seconds of something worth keeping.

So far I've been able to get by with retail HDDs; I have a 5TB drive at home and another that we keep off-site and refresh a couple times a year. This seems to be a sustainable strategy for me, as affordable (~$100) portable HDDs are growing at a rate that is faster than my storage needs. I don't know if it would be cheaper to pay for iCloud, but for whatever reason I feel safer having two HDDs with my media than trusting Apple's (or anyone else's) cloud.


> I don't know if it would be cheaper to pay for iCloud, but for whatever reason I feel safer having two HDDs with my media than trusting Apple's (or anyone else's) cloud.

I'd suggest paying for iCloud just for the ability to automagically backup your iPhone(s). You can exclude Photos from being backed up if you prefer.

I use iCloud Photo Library as well on a larger plan, but the backup feature alone is definitely well worth $0.99.


Would be nice if AI could suggest what to cut, make it really quick to clean up a video


I can't imagine AI would be able to tell what constitutes the cute part of a video of a little kid, but it would sure be nice if I could just tell Siri to "delete all but the last 20 seconds of this video". It's not that manually cropping is that many steps, it's that on an iPhone, it's hard to get the crop tool to work reliably (or my fingers are just really fat).


I ended up buying a 4 disk synology NAS unit/appliance for about $400 a couple of years ago. It's been remarkably reliable, and has a built-in feature to backup to Glacier, although there's third party tools for most any online data backup system you can think of.

Prior to that I had this monstrous 4U server that I had to maintain, update, debug, not to mention build and move from house to house, for 8 years. At the time it seemed like a good move, but over time, as my time to work on personal projects got shorter, I started to look for other ways to solve the storage problem.

I much prefer the Synology NAS to the 4U custom built system. It's about as close to a toaster-style appliance as I can think of. If it died tomorrow, I'd buy another one (then restore from my Glacier backups).


This storage talk about hybrid drives sounds a lot like 2010. This isn't new, and SSDs are really not much more expensive than your regular HDD while offering many more benefits. It's just that Apple charges you the price of at least 4 equivalent drives for 1 drive, compared to buying them on your own.


This is for long term storage.

SSDs have 0 benefits over spinning rust for this, and are still more expensive.

I can buy a decent 10TB spinning rust disk anywhere; if I want that in SSDs, I'd need 5 drives, or start hunting for enterprise-grade equipment.

So for raw storage (if you need more than a couple of TBs), spinning rust is still the winner for me.


If you don't want to pay for the cloud, then there is only one sane option: You keep upgrading your local storage so that you never have more than a few devices. Every six months, buy a new giant disk and decommission as many smaller disks as you can.


This sounds extremely expensive


Cloud storage is really expensive, though. Putting 4 TB in S3 costs US$94.21/month before any transfer costs. A 4 TB CMR HDD is ~$140. That doesn’t include power costs, but over a year you get $990 to spend on that and other things from the cost difference vs. S3 (plus you can sell your old drives when you’re done with them).

(I use B2 for cloud storage backup since it’s a lot cheaper than S3 at US$20 for 4 TB, but that still is ~$105/year more than local)
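For reference, that S3 number is just the standard-tier rate multiplied out:

  4 TB ≈ 4,096 GB x $0.023/GB-month (S3 Standard) ≈ $94.21/month
  vs. a one-time ~$140 for a 4 TB CMR HDD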


> 4 TB CMR HDD is ~$140

And that's with bad pricing! Last year, 4TB was $120 and 6TB was ~$150.

Building out a NAS for $1000 ($600 in 4x hard drives, $400 for other components) is very reasonable. Last year that was 4x6TB == 12TB storage + 12TB redundancy, but this year prices are worse so you "only" get 8TB + 8TB redundancy.

$400 can afford a Synology or various NAS devices. It can also afford a new desktop that you can install FreeNAS or whatever onto.

-----------

Eventually, when the 8TB is not enough, just buy a new HBA card and shove 4x more hard drives in there as a 2nd storage pool on the same NAS. Maybe 8TB x 4 == 16TB usable + 16TB redundancy.

There's no need to copy everything over: keep the old 8TB pool working and just start writing to the new 16TB storage.


This is the true answer to this entire thread.


> Putting 4 TB in S3 costs US$94.21/month before any transfer costs.

If you have 5TB of photos, chances are you're not looking through 5TB of photos all that often, and S3's IA storage is very appealing at ~$65/mo.

If you only want a cold storage backup, Glacier will keep them for you at $20/mo. If you need storage that's only accessed once or twice a year (and you don't mind waiting a bit to get your file), you can pay less than $5/mo with Glacier Deep Archive.

The cost of S3 that you quoted is for _eleven nines_ of durability and millisecond latency. If you don't need that, you can pay for far cheaper storage that better matches your needs, with the convenience of never needing to buy/transfer/replace/sell drives.

> That doesn’t include power costs, but over a year you get $990 to spend on that and other things from the cost difference vs. S3

If you spend ~days each year worrying about storage and paying for ever-larger disks (and spending time selling your old ones, for whatever you can get), chances are the few hundred dollars you might come out ahead doing it yourself isn't really worth it.
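Back-of-the-envelope, using approximate 2021 list prices (the per-GB rates below are assumptions and exclude retrieval, request, and early-deletion fees, which matter a lot for the cold tiers):

    # Rough monthly cost of keeping a 5 TB photo library in different S3 tiers.
    tiers = {
        "S3 Standard":             0.023,
        "S3 Standard-IA":          0.0125,
        "S3 Glacier":              0.004,
        "S3 Glacier Deep Archive": 0.00099,
    }

    photos_gb = 5 * 1000  # the 5 TB photo library from the comment above

    for name, per_gb in tiers.items():
        print(f"{name:<26} ${photos_gb * per_gb:7.2f}/mo")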


Cloud storage is doing a LOT more than what you describe: for example, to compare with S3 you'd be budgeting for at least 3 disks in geographically separated locations running a secure always-online service with things like guarantees for immutability and bit-rot detection/prevention. For backups, you probably also want to compare things like the infrequent or cold storage tiers.

You can still beat that, of course, but it's assuming you have time and skills to do so and don't mind spending that time playing sysadmin.


Cloud storage is also really different from a hard disk though, mostly in the direction of being really better than a hard disk, so it being more expensive is not surprising.


The advantage of cloud storage is that it gives you a remote backup, which a local NAS or external drive doesn't.


It's not prohibitively expensive; all the large internet/cloud companies upgrade machines and drives on a ~3 year schedule and mostly keep the data online. Storage Moore's Law means that every upgrade lets you put the old data onto fewer new drives. Storage Moore's Law is slowing, but this is still the most affordable option when taking durability into account.

Offline storage is dangerous. Readers go obsolete. Media degrades. Data sets go missing. Protocols and storage formats become obsolete. Migrating data from one online storage to another while upgrading hardware solves a lot of these issues, the only exception being storage format for straight file copies between systems.


it is, and it is over the top.

Moving up to larger drives every 6 months is insane. Maybe every 3-4 years, depending on your actual storage requirements. Not sure why the OP's requirement is to never have more than a few devices.

Local storage is always cheaper than cloud storage unless you're doing it really wrong. Fully (3-2-1) backed up local storage less so, but if you do it smart it doesn't have to cost an arm and a leg.


"Six months" was just a random number. You get the idea.


The real question being, of course, given that your life is finite ... how often will you actually look at any of these pics.

The more pix there are, the less likely it is you'll ever even open any of these.


> The real question being, of course, given that your life is finite ... how often will you actually look at any of these pics.

> The more pix there are, the less likely it is you'll ever even open any of these.

Well, there are also future generations to think about, but their interest will fall off too (until you reach the genealogical profile level, which maxes out at a few portrait-type pictures of any regular individual).

It's pretty essential to aggressively curate and organize data like this.


Dunno, tagging seems worth it, deleting not so much. Storage is getting cheaper every year. I've looked through many albums from parents, grandparents, and great-grandparents. I hope that my pictures make it that far, but it seems like the world needs a "free" (except for storage/bandwidth) photo backup service to have any chance of surviving multiple generations.


> Dunno, tagging seems worth it, deleting not so much. Storage is getting cheaper every year.

The way I think about it is that you need a relatively small "SAVE THIS" drive/disc. Sure, storage is getting cheaper, but you don't want to burden someone with some massive storage array, and you can't count on whatever software you used to tag still working in someone else's hands.


Dunno, can you guess what will be important to the next generation? 2 generations? 3? You might capture a small part of important events, people, etc. whose significance won't be realized until later. Even silly things like the first robin of the season, or a picture at the pool on the hottest day of the year.


For me it's the case that I'll need something from my archive. But I don't know what I'll need, so I just store it all. I may only actually need a few files, but I don't know which few.


I think there is another point worth mentioning. The data we own or create is increasing. We're taking more photos than ever, at higher resolution, with video in 4K or 60 frames per second, and friends sending their media over. So not only does usage increase, the size of each item increases as well.

And yet hard-drive price per GB hasn't fallen much at all from 2013 to 2021, staying around $0.035-$0.04/GB, mostly because HDD platter density has stalled.


I previously had a 4-bay Synology and loved the UI but couldn't deal with the price jump to an 8-bay. I discovered xpenology.org which allows you to bootload Synology software on x86 hardware.

I turned an old desktop PC into an 8-bay storage machine with a PCIe SATA card and now have all the flexibility I want, without the price penalty for Synology hardware.

The best place to find hard drives is https://diskprices.com/ which scrapes Amazon and a few other sites to show price per TB in £/$. It's almost always cheaper to buy a retail-packaged external disk and extract the hard drive.

Note: Some hard drives, once shucked from external cases, won't power on when attached to regular SATA power connectors because of a mismatch in the power connector specification (the 3.3V pin is repurposed as a power-disable signal). This is easily solved by using a SATA power to Molex adapter and connecting that to a Molex to SATA power adapter (SATA->Molex->SATA), as Molex does not carry the 3.3V pin that causes the issue.


Spend a few hundred to a few thousand bucks and build a chunky NAS. Get a bunch of 12TB SSDs and put them in a RAIDZn.

Also, what shitty phone takes 108MP photos? That's guaranteed to be some stupid Android phone gimmick. There's no way having that many pixels with a teeny optical path and a teeny sensor is useful. I'd only want above 100MP on a medium format sensor.


And then what happens WHEN the NAS fails?

And if the answer is to buy 2 NASes, what happens if something physically destroys the location? Fire and theft aren't all that uncommon over a lifetime. Or, probably more likely, what if you accidentally format or delete its contents?


It’s quite unlikely that a RAIDZ2-3 system fails in a way that you lose your data (short of catching on fire), but I also have offsite snapshot backups at rsync.net. 10mbps up is plenty to keep that synced, even if you write hundreds of GB per day.
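For reference, a quick sanity check on that uplink figure (assuming the link runs flat out):

    # How much data a 10 Mbps uplink can push in a day, sustained.
    uplink_mbps = 10
    seconds_per_day = 24 * 60 * 60
    gb_per_day = uplink_mbps / 8 * seconds_per_day / 1000  # Mbps -> MB/s -> GB/day
    print(f"~{gb_per_day:.0f} GB/day")  # ~108 GB/day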


What do folks typically use for storing media you don't access very often, but when you do access it you want it relatively quickly and at high transfer speeds?

I like to back up my UHD Blu-rays losslessly and stream them from my PC via Plex (~100mbps). I'm thinking local storage via a NAS is probably the cheapest option here, right?


NAS, definitely. Even hard drives are a ton faster on a local network than most people's internet connection over a cloud service.


I am using this JPEG recompressor:

https://github.com/danielgtaylor/jpeg-archive

Tried a lot of combinations of quality, and settled for -q medium --max 75% --accurate

Anything higher makes no sense/difference, even if you print it at A3. And my images are 8-18-24Mpix.

I'm not a pro photographer, but I'm somewhat of the old pre-print school - your level of pickiness may vary :/

Effectively it's like a 4-8x reduction - the total went from 150GB to 30GB, and that's about 40,000 images.

Which still doesn't solve the OP's or anyone's problem of too-easily-produced digital "assets", but it is more manageable.
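A minimal batch sketch around jpeg-recompress (from danielgtaylor/jpeg-archive), using the flags the parent comment settled on; the exact flag spellings are assumptions, so check jpeg-recompress --help, and the paths are hypothetical:

    # Recompress every JPEG under SRC into DST, keeping the originals so
    # nothing is lost if the chosen quality turns out to be too aggressive.
    import subprocess
    from pathlib import Path

    SRC = Path("~/Pictures/originals").expanduser()     # hypothetical paths
    DST = Path("~/Pictures/recompressed").expanduser()

    for src in SRC.rglob("*.jpg"):
        dst = DST / src.relative_to(SRC)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["jpeg-recompress", "--quality", "medium", "--accurate",
             "--max", "75", str(src), str(dst)],
            check=True,
        )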


I guess it depends on how much one values "archiving", but WebP has a lossless option [1] that might be a good alternative to the above.

[1] https://imagemagick.org/script/webp.php
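If you'd rather script it than go through ImageMagick, a minimal Pillow sketch (assumes a Pillow build with WebP support; the filename is hypothetical, and lossless WebP is mostly worthwhile for sources that are already lossless, like PNG or TIFF):

    # Re-encode a lossless source image as lossless WebP with Pillow.
    from pathlib import Path
    from PIL import Image

    src = Path("scan.png")              # hypothetical input
    out = src.with_suffix(".webp")
    Image.open(src).save(out, format="WEBP", lossless=True)
    print(src.stat().st_size, "->", out.stat().st_size, "bytes")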


What are the requirements? Large capacity, redundancy, reasonable access speed, always on but maintenance downtime of even a few hours a year tolerable?

Easy but your data isn't yours: sync your data to GDrive or Apple or whatever, and sync a NAS to that.

A little harder but still doable: get a Hetzner server and set that up as your storage, then set up your own access and sync to a local NAS. A Hetzner box is also really useful for running a load of other services, so for 50 bucks or so it seems pretty reasonable.

You could just buy a huge disk and run a server in your house, but it gets annoying in various ways. Kids unplug the power, it creates heat, maybe noise, multiple disks end up needing management, that kind of thing.


How much of the actual information in the high resolution photos is just noise?

So we buy sensors with more megapixels that just save more noise. Then we need to buy more storage capacity, more bandwidth, more processing, more battery etc.

Cameras have been getting closer and closer to physical limits, but something still needs to be sold...

It's like people buying clothes from the mall and taking them straight to a storage building. They don't have enough space at home. And they won't have time to wear so many clothes anyway.


> The best new phones come with the ability to shoot 108 megapixel photos, record 4K video with stereo sound, and pack the results into a terabyte of onboard storage. But what do you do when that storage fills up?

A good starting point for me is to not keep everything I shoot. I have often noticed that for every 50-100 photos I shoot, I only want to keep 5-10. The rest are bad/weird shots, pics I'm not interested in, etc.

I never faced this when I used film.


Almost everything I create doesn’t really need to be saved and I could never find it if I needed to. I just let Google Photos keep them. It’s good enough.


I use borgbase (https://www.borgbase.com) for my online backups. It took a few days to make the initial upload, but now it only takes a couple of minutes every hour. And you can select the pruning strategy you want.

I haven't had to use it yet, but I check the files from time to time, and so far I'm satisfied with the service.


I used Backblaze B2 as an offsite backup for RAW digital film. 1TB took about a month to completely back up, which is not ideal.

B2 worked great and the pricing is indeed excellent, but the volume of raw data to backup over broadband is just not realistic.

I think it would be better to physically mail a drive for archiving raw footage, or use tape drives.


I am faced with this issue of dealing with hundreds of GB of family video and pictures every time I upgrade my wife's phone.

The newer phones always have a higher resolution camera so file sizes keep increasing.

I would love a simple, easy-to-use tape drive that just works. Something I could back up files to for 20-25 years.


I can't fathom how you get to 7TB of photos without being a photographer, or recording video all the time. Granted I haven't been too prolific in the last few years, but my entire collection from 3 DSLRs + iPhones going back to ~2005 totals 300GB. All backed up in B2 + google cloud.


> Granted I haven't been too prolific in the last few years, but my entire collection from 3 DSLRs + iPhones going back to ~2005 totals 300GB.

You don't keep the RAW files from your DSLRs? That'll bring the total size up pretty quick.


The short answer to "can I replace physical storage with cloud storage?" is no.


The solution is either NAS or cloud archive (not backup) such as SmugMug or ExpanDrive.


I'm still happy with my setup of Drobo + 2 rotating Time Machine backup HDs. I've probably been using it for close to 10 years now.


I recommend looking into decentralized cloud storage solutions, which are really the next generation of AWS S3-like services.

Check out storj.io


Decentralization adds a good deal of risk and operational overhead, but it doesn't really change the core problem: people are generating more data than many want to pay for. If you want to store a non-trivial amount of data, someone needs to get paid to maintain storage pools and validate multiple copies.


I've always wondered why nobody does p2p backups.

Take whatever the backups are, say 1TB, split it into 12 pieces, add as many parity pieces as the user wants (say 6 on average), and then you can recover with any 12. When you add another 1TB, add another 18 peers.

Monitor your peers, only trust the ones with a track record of successful challenges, and of course let users whitelist peers they trust (like friends and family). The perfect challenge is just a restore, but a checksum of a range of a blob could be useful as well and would consume less bandwidth.

Drop peers that are unreliable, or that ask for too much bandwidth for their restores.

Encrypt the files before adding Reed-Solomon, and use a unique encryption key for each peer that stores 1/Nth of your backup set.

That way you can "pay" for your storage by just adding that much more local disk space to trade with your peers.
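A minimal sketch of that checksum-challenge idea (function and parameter names are illustrative, not a real protocol; since the data being backed up still exists locally, the owner can compute the expected answer themselves):

    # The owner picks a random byte range and a fresh nonce, asks the peer to
    # hash that slice of the stored (encrypted) block, and compares the answer
    # against one computed from the local original.
    import hashlib
    import os
    import random

    def challenge(blob: bytes, start: int, length: int, nonce: bytes) -> str:
        # What both sides compute: hash of nonce + the requested byte range.
        return hashlib.sha256(nonce + blob[start:start + length]).hexdigest()

    def verify_peer(local_copy: bytes, peer_answer) -> bool:
        # Issue one random-range challenge; peer_answer(start, length, nonce)
        # stands in for whatever network call returns the peer's hash.
        start = random.randrange(0, max(1, len(local_copy) - 4096))
        nonce = os.urandom(16)
        expected = challenge(local_copy, start, 4096, nonce)
        return peer_answer(start, 4096, nonce) == expected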


A couple of problems off the top of my head:

1. Everything is harder to work with: you have to deal with less reliable networks and storage, computers which aren't on all of the time, very slow uplinks, etc. Since these aren't professionally managed systems, you also have less visibility: did that node just drop offline because the hard drive failed (losing everything), because Comcast is having a bad day, because the owner bought a new computer and wiped the old one without unenrolling it, or because it rebooted and is almost back up?

2. People are selfish: I don't want Netflix getting slow because you decided to retrieve your data, I'll complain if I hit a storage limit on my computer due to your stored backup data, etc. This forces you to deal with things like traffic shaping and storage rebalancing more aggressively, and those are hard problems to strike a popular balance on. Consider, for example, what happens when someone uses your service and it goes well but then they run out of space and need to clear some up in a hurry (_especially_ if they put on their cowboy hat and just delete a bunch of large files because they know they aren't the only ones with a copy).

3. The solutions to the previous problems make the cost problem worse: storing more copies can avoid some of the problems but then you need to figure out how to get the network to support, say, 5 copies instead of 2-3.

4. Consider what happens the first time the police bust someone for a major crime and their data is backed up on your computer. Not many people are enthusiastic about going into court to prove the negative assertion that they didn't have a decryption key.

Only peering with trusted systems avoids some of these issues, but not all, and the _big_ problem is that the upper bound on how much this service is worth is basically the cost of iCloud/Dropbox/S3/Backblaze/etc. The margin between the fixed operational costs and what those services charge is probably not enough to support development.


It's been tried plenty of times. P2P software is really complex but the use case is people who aren't willing to pay, so maintaining the software isn't sustainable.


I remember when my life's professional work fit on a CD. It was a strange feeling.


Isn’t this the business model that Dropbox, iCloud and Google Drive, etc. are based upon?


I want to replace those external drives with cloud storage. Is that possible?


Never heard about cloud storage?


Hoarding is a mental illness


> Hoarding is a mental illness

No, hoarding is a behavior which may or may not, in any particular case, be a symptom of one or another mental illness.



