rclone is indeed magical. I recently migrated a customer's business from Box to Dropbox, using rsync.net as an intermediary to save my own bandwidth. rclone and rsync.net work very well together.
Note that the Google Photos API which rclone uses for gphotos sources does not allow downloading original-quality images (only re-compressed versions), and it strips EXIF data. So I personally don't use it for backups; instead I:
1) Sync my Google Drive to a local Synology NAS
2) Periodically request a Google Takeout, which dumps the entire contents of my Google account (including original photos) into Drive.
3) A nightly script unzips the takeout archives so they can be picked up by an incremental backup to Backblaze B2. The NAS also does local snapshots.
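Roughly what step 3 looks like, as a sketch only: the paths, the remote name and the bucket are placeholders, and the B2 leg could just as well be the NAS vendor's own backup app instead of rclone.

    # nightly: unpack any new Takeout archives so the incremental backup sees plain files
    for z in /volume1/takeout/incoming/*.zip; do
        [ -e "$z" ] || continue                       # skip if nothing new arrived
        unzip -n "$z" -d /volume1/takeout/extracted/  # -n: never overwrite existing files
        mv "$z" /volume1/takeout/done/
    done
    # then push the extracted tree to B2 (remote "b2" and bucket are placeholders)
    rclone sync /volume1/takeout/extracted/ b2:my-backup-bucket/takeout/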
This has some nice properties:
- There are 3 distinct copies
- If a file is inadvertently deleted I can restore from either a local snapshot or B2.
- If my Google account is nuked I still have an archive of mail, photos, etc., though that's limited to the frequency of my Takeout requests.
- Since the NAS is the single point for backups, it's a natural place to put non-cloud files (or Dropbox, etc.) and have them automatically picked up for backup without setting up anything new.
Sadly, the Photos part of Takeout breaks if you have too many files. Then you have to set up multiple partial takeouts (each with some subset of albums), which is a huge pain.
I used to use OneDrive as a second backup, but Microsoft started doing the same crap (stripping EXIF).
One thing I'd add is to make sure you're snapshotting via ZFS or something similar, because the most likely cause of data loss is accidental deletion, and if your synced copies are perfect replicas, they'll replicate the deletion, too.
If I could speak directly to the OP I would recommend flipping back to a "rightside up" model: do dumb, mirror, 1:1 backups to rsync.net and then configure a day/week/month schedule of ZFS snapshots on that end.
Those ZFS snapshots are immutable / read-only so they not only serve as retention, they protect against Mallory (and ransomware and malware, etc.)
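On a ZFS box you control yourself, that day/week/month rotation is just a few cron lines. This is only a sketch with a placeholder dataset name; on rsync.net the snapshot schedule is configured on their end rather than by you.

    # crontab sketch: daily/weekly/monthly read-only snapshots of the backup dataset
    0 3 * * *   zfs snapshot tank/backups@daily-$(date +\%Y\%m\%d)
    0 4 * * 0   zfs snapshot tank/backups@weekly-$(date +\%Y\%m\%d)
    0 5 1 * *   zfs snapshot tank/backups@monthly-$(date +\%Y\%m)

(The \% escapes are needed because % is special in crontab.)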
Synchronisation is not backup. Almost every restore that I've performed for myself and others resulted from accidental deletion, and sync propagates accidental deletion. It also doesn't make your data resilient to malware, bit rot, or removal of data for a TOS violation.
> RAID is a waste of your goddamned time and money.
Backups are important, since they protect against most problems, whereas RAID only protects you from exactly one thing:
With RAID, when one disk has a hardware failure, the system keeps running. With hot-swap hardware and appropriate precautions you may even be spared a reboot: unplug the failed disk, plug in a replacement, and carry on. Otherwise you simply shut down, swap the disks, and start up again, and you can put that downtime off to a more convenient time of day. Once the replacement disk is in, the rebuild happens while the system is running, so the only downtime is the physical swap itself.
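As a concrete illustration of that swap, here's roughly what it looks like with Linux md-RAID; the array and device names are hypothetical:

    # mark the failed member and remove it from the array
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1
    # physically swap the disk (hot-swap, or at a convenient shutdown), then:
    mdadm --manage /dev/md0 --add /dev/sdc1
    cat /proc/mdstat    # watch the rebuild while the array stays online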
Without RAID, a hardware failure means that you:
1. Lose all files and changes since your last backup, and
2. Have significant downtime while you restore from your last backup. Over the network this can take a very long time, and even restoring from a local disk means a lot of copying, during which your system is down.
Whether this means that RAID is a waste of your time and money depends on your personal tolerance of points 1 and 2.
This is what I do. I sync to my NAS, and my NAS syncs to Backblaze B2 with file versioning. My current monthly bill is only $4... I cannot believe how cheap Backblaze has been.
I can even get the B2 URL and share a file anytime I want from inside my NAS.
Is it as easy as Dropbox? No, but I haven't missed using dropbox. I just mount my NAS to my RDP sessions when I need files "synced" between my various computers.
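If you want to script the NAS-to-B2 leg yourself rather than use the NAS vendor's backup app, an rclone sketch looks roughly like this; the remote name and bucket are placeholders, and the versioning is something you enable on the B2 bucket itself:

    # push the NAS share to the B2 bucket; bucket versioning keeps old copies
    rclone sync /volume1/shared b2:my-backup-bucket/shared --fast-list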
I've been eyeballing this kind of setup lately, because Dropbox has become so hostile to simple use. Hours of delay between "changed small text file" and "synced", endless "bugs" that push things to cloud-first storage rather than the synced folders I have set up, and using more and more resources with every upgrade. I'm so sick of their business-oriented shift.
</rant>
Got a favorite NAS that works well? Without the risk of the vendor's cloud deciding to delete your data? I haven't yet picked one, beyond "what if I just used a Pi and some USB drives..."
I use Synology. It's not the best, but it Just Works. I got the shell for only $120, plopped in two 8TB spinners, and haven't really touched it other than moving house three times.
If I were doing it over again, I would get one of their commercial-grade NASes. They are a bit more expensive but have a decent CPU and RAM in them, can run VMs/containers/etc., and can be your own little cloud.
They also have some cool features, like the ability to link two remote Synology NASes together and sync back and forth, as well as backing up to any backup service you want.
Not the person you are asking, but I took some leftover computer parts that I had been amassing for years, bought a few drives, and put TrueNAS on it.
It can do anything the big consumer NASes can, and a lot more, and if something fails, it's all off-the-shelf components.
The only thing that could have been done better is ECC memory (for some people that's a dealbreaker). In that case you can get a used server off eBay for a pittance that fits the bill.
I use Borg and Rclone together, to cover both aspects. One unexpectedly nice thing about that is a Borg repo consists of far fewer files than the thing it backs up, so Google is unlikely to rate limit you as described in the article.
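A rough sketch of that combination; the repo path, the remote name and the source directories are all placeholders, not the exact setup described above:

    # local, deduplicated, encrypted backups with borg
    borg init --encryption=repokey /mnt/backup/borg-repo            # one-time setup
    borg create /mnt/backup/borg-repo::home-{now} ~/Documents ~/Photos
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/borg-repo

    # then push the (relatively few, large) repo files to the cloud with rclone
    rclone sync /mnt/backup/borg-repo gdrive:backups/borg-repo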
Agreed - using a sync service can help bring in data to be backed up, or make copies of a backup. The advantage you mentioned is one of many that we get from using a backup system for backups.
Yup, and versions are great for noticing you accidentally deleted a file before you went to lunch. But for anything more than very simple, recent restoration, versioning is not a backup. It can (and does) mangle file names, metadata and folder structure, and it can be a nightmare to perform larger restores with. Versions are a convenience.
Agreed. It can be, but you need to insert version control somewhere to protect against this: either as a software package, via iterative backups, or in the file system.
Another solution: stuff everything that fits into Git repos, fetch --all from each of your different computers regularly.
For things that don't fit nicely into Git (e.g. large collections of large binary files), I use a homegrown git-like system for storing filesystem snapshots: http://github.com/TOGoS/ContentCouch
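A cron-able sketch of the "fetch --all regularly" part; the directory and paths are placeholders, assuming each machine keeps a clone (or bare mirror) of every repo:

    # on each machine: fetch everything from all remotes, for every local repo
    for repo in ~/mirrors/*; do
        git -C "$repo" fetch --all
    done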
I have something akin to this. For the vast majority of my data, a cloud platform is the authoritative storage: Gmail, Google Photos, GitHub, etc. I have a local NAS that does streaming backups of those systems to give me a persistent local copy, and then the NAS also backs up the totality of its local backups to an S3 bucket whose policy enforces deletion protection.
The NAS’s user has ListObject, PutObject, and DeleteObject. The bucket has versioning enabled, and DeleteObject doesn’t allow deleting prior versions. So the NAS can delete what’s immediately visible in the bucket, but it can’t permanently delete things.
The upside of versioning over Object Lock, for my use case, is that the backup scripts can be very simple, because they don’t have to deal with what happens if they want to clean up a file but don’t have permissions to. They just do their thing, and I’m confident that old versions are retained. The downside of this approach is that my S3 usage will increase over time, because I’m retaining all old content. So eventually it’ll cost enough for me to decide to either switch to Object Lock or figure out a safe way to prune old content.
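For anyone wanting to replicate the bucket side of this, here's a sketch with the AWS CLI; the bucket name, user name and policy are illustrative, not a copy of the exact setup described above:

    # turn on versioning so overwrites and deletes keep prior versions around
    aws s3api put-bucket-versioning \
        --bucket my-backup-bucket \
        --versioning-configuration Status=Enabled

    # IAM policy for the NAS user: it can list, upload and "delete" objects,
    # but it is never granted s3:DeleteObjectVersion, so old versions survive
    cat > nas-backup-policy.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": ["arn:aws:s3:::my-backup-bucket", "arn:aws:s3:::my-backup-bucket/*"]
      }]
    }
    EOF
    aws iam put-user-policy --user-name nas-backup \
        --policy-name nas-backup-policy --policy-document file://nas-backup-policy.json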
Even before the current EU energy madness, the price of storing <=10TB in the cloud was less than the cost of a dual-bay NAS with 2x10TB drives plus the power required to run it[1] for 5 years. Currently the power alone costs more than the 10TB of cloud storage.
Considering that the cloud has way more resilience than anything I can reasonably cook up at home, the major threats in the cloud are loss of privacy and/or loss of access.
Loss of privacy can be somewhat countered using Cryptomator (https://cryptomator.org/), which end-to-end encrypts your data and works as a regular drive backed by whatever cloud you choose. It's open source and free for desktop usage, and the mobile versions cost a one-time fee.
Loss of access can be countered by keeping data locally. So, 4-5 years ago I moved everything to the cloud and instead sync data locally in realtime, and from that local copy I make a local versioned backup as well as a versioned remote backup to a different cloud than my primary one.
I do have a couple of bidirectional synchronizations going on, but the newly added "rclone bisync" (https://rclone.org/bisync/) works wonders there.
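For reference, the bisync usage is essentially this; the remote and path names are placeholders:

    # first run establishes the baseline listing on both sides
    rclone bisync gdrive:Documents /mnt/local/Documents --resync
    # subsequent runs (e.g. from cron) reconcile changes in both directions
    rclone bisync gdrive:Documents /mnt/local/Documents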
[1]: You can get around 10TB of cloud storage for €22/month.
The price of a Synology DS-220 2-bay NAS is around €550 (amazon.de), and a 10TB WD Red is €588 (WD Store on amazon.de), so €550 + 2 × €588 = €1726 for the hardware. A DS-220 uses 14.96W by itself (idle) and a WD Red 10TB uses 2.8W idle, so idle power consumption is 20.56W, which is around 180 kWh/year and just over 900 kWh over 5 years. Assuming a €0.5/kWh electricity price, which is currently at the lower end (prices here are above €1/kWh), that's about €450 of power. Adding it all up you get €1726 + €450 = €2176, which means an average cost of €36/month for hardware and power over 5 years.
I’m in a slightly similar place to the author, but would love some other backup nerds to throw in some advice.
I’m aware of the traditional “requirements” for backup (e.g. attribute/metadata preservation, cross-platform support, restore testing), but in modern usage I’ve discovered a split (and lean towards “cloud first”).
It boils down to “active” files vs. “passive” data. I found this when testing rclone around preserving executable bits and trying to back up “active” (`chmod +x`) git working directories. It was a mess, as all shell scripts lost their permissions, which was especially frustrating because backing up Git working directories is kind of useless, but also very useful for those times when something goes wrong.
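One workaround for the lost executable bits, assuming the restored directory is still a valid git checkout (a generic sketch, not something rclone does for you): git records the file mode, so you can re-apply +x to everything it expects to be executable.

    cd ~/restored/project
    # list index entries, keep the ones recorded as mode 100755, chmod them
    # (paths containing spaces need more careful handling than this)
    git ls-files --stage | awk '$1 == "100755" { print $4 }' | xargs -r chmod +x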
However, when backing up my MP3 rips or family videos (or large video editing projects), I really just need passive blobs, not bit-and-attribute-identical restores, especially when the local computer is effectively a moderately thin client compared to the exact duplicate of all possible data that I’ve backed up to a cloud provider (rsync.net in my case).
Also there are tiers of cost and accessibility that aren’t necessarily managed well with most modern backup software.
I’ve settled on rsync.net for bulk “warm” access, plus Syncthing with a few local Raspberry Pis and all local computers/phones as kind of a “hot” replacement for the “Documents” directory.
I had to do a moderately complete backup with “restic“ from one local HD to another on my desktop and that worked well for being able to mount snapshots and pull files out, but it all feels like a lot of overhead compared to what I’m really shooting for.
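For anyone curious, that restic flow is roughly the following; the repo and mount paths are placeholders:

    # one-time: create the repo on the second drive
    restic -r /mnt/backupdrive/restic-repo init
    # back up the home directory (repeat runs are incremental snapshots)
    restic -r /mnt/backupdrive/restic-repo backup /home/me
    # browse all snapshots as a filesystem and copy individual files out
    restic -r /mnt/backupdrive/restic-repo mount /mnt/restic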
Further problems are the general unreliability of fuse/sshfs-mounted filesystems, as “slow” file access for some things is quite acceptable, but unreliable or hanging on individual file access gets real old real quick.
So it boils down towards the far away end of the backup “cone” being S3-glacier-ish (cheap-ish), and mostly blob data with no extended attributes needed. Mid-tier is rsync/warm, mostly blob-ish shared between multiple systems, along with some kinds of per-device system restore capabilities (or home dir restore capabilities), and near/hot is Syncthing, and a local NAS of everything if possible. Modulo local device/hardware management, and attempting to reduce any administrative overhead, especially for casual/non-primary users of the same system.
It doesn’t feel like nirvana. Am I missing any gaps in thought here? Does anyone else have good experiences or suggestions for something reasonably comprehensive between “do nothing, let the cloud sort it out” and “be prepared, have your own local and remote backup/transfer/restore processes”?
It's a good way of doing it, with near-unlimited bandwidth, and if a machine or drive dies you just replace it with no data loss. I save everything important, encrypted, to Seafile on my VPS, and my NAS pulls nightly backups back home. Software/family just see it as a network drive.
> I still want backups. But instead of "backing up files in the cloud", I back them up locally, by redownloading them to a local archival drive. This is upside-down of most people's backup strategy, but it really is quite nice once you get it set up.
https://rsync.net/resources/howto/rclone.html
The example is S3 <--> rsync.net (naturally), but the step-by-step instructions are applicable to setting up any combination of "remotes".
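The gist, once both remotes exist in your rclone config (the remote names here are placeholders, not the ones from the howto):

    # copy/update everything from one remote to the other
    rclone sync s3remote:my-bucket rsyncnet:backups/my-bucket --progress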
rclone is like youtube-dl - a powerful tool that seems almost magical.