rclone is indeed magical. I recently migrated a customer's business from Box to Dropbox, using rsync.net as an intermediary to save my own bandwidth. rclone and rsync.net work very well together.
Note that the Google Photos API which rclone uses for gphotos sources does not allow downloading original-quality images (only re-compressed versions), and it strips EXIF data. So I personally don't use it for backups; instead I:
1) Sync my Google Drive to a local Synology NAS
2) Periodically request a Google Takeout, which dumps the entire contents of my Google account (including original photos) into Drive.
3) A nightly script unzips the takeout archives so they can be picked up by an incremental backup to Backblaze B2. The NAS also does local snapshots.
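Roughly what step 3 looks like, as a sketch only: the paths, the remote name and the bucket are placeholders, and the B2 leg could just as well be the NAS vendor's own backup app instead of rclone.

    # nightly: unpack any new Takeout archives so the incremental backup sees plain files
    for z in /volume1/takeout/incoming/*.zip; do
        [ -e "$z" ] || continue                       # skip if nothing new arrived
        unzip -n "$z" -d /volume1/takeout/extracted/  # -n: never overwrite existing files
        mv "$z" /volume1/takeout/done/
    done
    # then push the extracted tree to B2 (remote "b2" and bucket are placeholders)
    rclone sync /volume1/takeout/extracted/ b2:my-backup-bucket/takeout/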
This has some nice properties:
- There are 3 distinct copies
- If a file is inadvertently deleted I can restore from either a local snapshot or B2.
- If my Google account is nuked I still have an archive of mail, photos, etc., though that's limited to the frequency of my Takeout requests.
- Since the NAS is the single point for backups, it's a natural place to put non-cloud files (or Dropbox, etc.) and have them automatically picked up for backup without setting up anything new.
Sadly, the Photos part of Takeout breaks if you have too many files. Then you have to set up multiple partial takeouts (each with some subset of albums), which is a huge pain.
I used to use OneDrive as a second backup, but Microsoft started doing the same crap (stripping EXIF).
One thing I'd add is to make sure you're snapshotting via ZFS or something similar, because the most likely cause of data loss is accidental deletion, and if your synced copies are perfect replicas, they'll replicate the deletion, too.
If I could speak directly to the OP I would recommend flipping back to a "rightside up" model: do dumb, mirror, 1:1 backups to rsync.net and then configure a day/week/month schedule of ZFS snapshots on that end.
Those ZFS snapshots are immutable / read-only so they not only serve as retention, they protect against Mallory (and ransomware and malware, etc.)
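On a ZFS box you control yourself, that day/week/month rotation is just a few cron lines. This is only a sketch with a placeholder dataset name; on rsync.net the snapshot schedule is configured on their end rather than by you.

    # crontab sketch: daily/weekly/monthly read-only snapshots of the backup dataset
    0 3 * * *   zfs snapshot tank/backups@daily-$(date +\%Y\%m\%d)
    0 4 * * 0   zfs snapshot tank/backups@weekly-$(date +\%Y\%m\%d)
    0 5 1 * *   zfs snapshot tank/backups@monthly-$(date +\%Y\%m)

(The \% escapes are needed because % is special in crontab.)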
Synchronisation is not backup. Almost every restore that I've performed for myself and others resulted from accidental deletion, and sync propagates accidental deletion. It also doesn't make your data resilient to malware, bit rot, or removal of data for a TOS violation.
> RAID is a waste of your goddamned time and money.
Backups are important, since they protect against most problems, whereas RAID only protects you from exactly one thing:
With RAID, when one disk has a hardware failure, the system keeps running. With hot-swap hardware and appropriate precautions you may even be spared a reboot: unplug the failed disk, plug in a replacement, and carry on. Otherwise you simply shut down, swap the disks, and start up again, and you can put that downtime off to a more convenient time of day. Once the replacement disk is in, the rebuild happens while the system is running, so the only downtime is the physical swap itself.
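As a concrete illustration of that swap, here's roughly what it looks like with Linux md-RAID; the array and device names are hypothetical:

    # mark the failed member and remove it from the array
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1
    # physically swap the disk (hot-swap, or at a convenient shutdown), then:
    mdadm --manage /dev/md0 --add /dev/sdc1
    cat /proc/mdstat    # watch the rebuild while the array stays online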
Without RAID, a hardware failure means that you:
1. Lose all files and changes since your last backup, and
2. Have significant downtime while you restore from your last backup. Over the network this can take a very long time, and even restoring from a local disk means a lot of copying, during which your system is down.
Whether this means that RAID is a waste of your time and money depends on your personal tolerance of points 1 and 2.
This is what I do. I sync to my NAS, and my NAS syncs to Backblaze B2 with file versioning. My current monthly bill is only $4... I cannot believe how cheap Backblaze has been.
I can even get the B2 URL and share a file anytime I want from inside my NAS.
Is it as easy as Dropbox? No, but I haven't missed using dropbox. I just mount my NAS to my RDP sessions when I need files "synced" between my various computers.
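If you want to script the NAS-to-B2 leg yourself rather than use the NAS vendor's backup app, an rclone sketch looks roughly like this; the remote name and bucket are placeholders, and the versioning is something you enable on the B2 bucket itself:

    # push the NAS share to the B2 bucket; bucket versioning keeps old copies
    rclone sync /volume1/shared b2:my-backup-bucket/shared --fast-list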
I've been eyeballing this kind of setup lately, because Dropbox has become so hostile to simple use. Hours of delay between "changed small text file" and "synced", endless "bugs" that push things to cloud-first storage rather than the synced folders I have set up, and using more and more resources with every upgrade. I'm so sick of their business-oriented shift.
</rant>
Got a favorite NAS that works well? Without the risk of the vendor's cloud deciding to delete your data? I haven't yet picked one, beyond "what if I just used a Pi and some USB drives..."
I use Synology. It's not the best, but it Just Works. I got the shell for only $120, plopped in two 8TB spinners, and haven't really touched it other than moving house three times.
If I were doing it over again, I would get one of their commercial-grade NASes. They are a bit more expensive but have a decent CPU and RAM in them, can run VMs/containers/etc., and can be your own little cloud.
They also have some cool features, like the ability to link two remote Synology NASes together and sync back and forth, as well as backing up to any backup service you want.
Not the person you are asking, but I took some leftover computer parts that I had been amassing for years, bought a few drives, and put TrueNAS on it.
It can do anything the big consumer NASes can, and a lot more, and if something fails, it's all off-the-shelf components.
The only thing that could have been done better is ECC memory (for some people that's a dealbreaker). In that case you can get a used server off eBay for a pittance that fits the bill.
I use Borg and Rclone together, to cover both aspects. One unexpectedly nice thing about that is a Borg repo consists of far fewer files than the thing it backs up, so Google is unlikely to rate limit you as described in the article.
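A rough sketch of that combination; the repo path, the remote name and the source directories are all placeholders, not the exact setup described above:

    # local, deduplicated, encrypted backups with borg
    borg init --encryption=repokey /mnt/backup/borg-repo            # one-time setup
    borg create /mnt/backup/borg-repo::home-{now} ~/Documents ~/Photos
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/borg-repo

    # then push the (relatively few, large) repo files to the cloud with rclone
    rclone sync /mnt/backup/borg-repo gdrive:backups/borg-repo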
Agreed - using a sync service can help bring in data to be backed up, or make copies of a backup. The advantage you mentioned is one of many that we get from using a backup system for backups.
Yup, and versions are great for noticing you accidentally deleted a file before you went to lunch. But for anything more than very simple, recent restoration, versioning is not a backup. It can (and does) mangle file names, metadata and folder structure, and it can be a nightmare to perform larger restores with. Versions are a convenience.
Agreed. It can be, but you need to insert version control somewhere to protect against this: either as a software package, via iterative backups, or in the file system.
Another solution: stuff everything that fits into Git repos, fetch --all from each of your different computers regularly.
For things that don't fit nicely into Git (e.g. large collections of large binary files), I use a homegrown git-like system for storing filesystem snapshots: http://github.com/TOGoS/ContentCouch
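A cron-able sketch of the "fetch --all regularly" part; the directory and paths are placeholders, assuming each machine keeps a clone (or bare mirror) of every repo:

    # on each machine: fetch everything from all remotes, for every local repo
    for repo in ~/mirrors/*; do
        git -C "$repo" fetch --all
    done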
I have something akin to this. For the vast majority of my data, a cloud platform is the authoritative storage: Gmail, Google Photos, GitHub, etc. I have a local NAS that does streaming backups of those systems to give me a persistent local copy, and then the NAS also backs up the totality of its local backups to an S3 bucket whose policy enforces deletion protection.
The NAS’s user has ListObject, PutObject, and DeleteObject. The bucket has versioning enabled, and DeleteObject doesn’t allow deleting prior versions. So the NAS can delete what’s immediately visible in the bucket, but it can’t permanently delete things.
The upside of versioning over Object Lock, for my use case, is that the backup scripts can be very simple, because they don’t have to deal with what happens if they want to clean up a file but don’t have permissions to. They just do their thing, and I’m confident that old versions are retained. The downside of this approach is that my S3 usage will increase over time, because I’m retaining all old content. So eventually it’ll cost enough for me to decide to either switch to Object Lock or figure out a safe way to prune old content.
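For anyone wanting to replicate the bucket side of this, here's a sketch with the AWS CLI; the bucket name, user name and policy are illustrative, not a copy of the exact setup described above:

    # turn on versioning so overwrites and deletes keep prior versions around
    aws s3api put-bucket-versioning \
        --bucket my-backup-bucket \
        --versioning-configuration Status=Enabled

    # IAM policy for the NAS user: it can list, upload and "delete" objects,
    # but it is never granted s3:DeleteObjectVersion, so old versions survive
    cat > nas-backup-policy.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": ["arn:aws:s3:::my-backup-bucket", "arn:aws:s3:::my-backup-bucket/*"]
      }]
    }
    EOF
    aws iam put-user-policy --user-name nas-backup \
        --policy-name nas-backup-policy --policy-document file://nas-backup-policy.json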
Even before the current EU energy madness, the price of storing <=10TB in the cloud was less than the cost of a dual-bay NAS with 2x10TB drives plus the power required to run it[1] for 5 years. Currently the power alone costs more than the 10TB of cloud storage.
Considering that the cloud has way more resilience than anything I can reasonably cook up at home, the major threats in the cloud are loss of privacy and/or loss of access.
Loss of privacy can be somewhat countered using Cryptomator (https://cryptomator.org/), which end-to-end encrypts your data and works as a regular drive backed by whatever cloud you choose. It's open source and free for desktop usage, and the mobile versions cost a one-time fee.
Loss of access can be countered by keeping data locally. So, 4-5 years ago I moved everything to the cloud and instead sync data locally in realtime, and from that local copy I make a local versioned backup as well as a versioned remote backup to a different cloud than my primary one.
I do have a couple of bidirectional synchronizations going on, but the newly added "rclone bisync" (https://rclone.org/bisync/) works wonders there.
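For reference, the bisync usage is essentially this; the remote and path names are placeholders:

    # first run establishes the baseline listing on both sides
    rclone bisync gdrive:Documents /mnt/local/Documents --resync
    # subsequent runs (e.g. from cron) reconcile changes in both directions
    rclone bisync gdrive:Documents /mnt/local/Documents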
[1]: You can get around 10TB of cloud storage for €22/month.
The price of a Synology DS-220 2-bay NAS is around €550 (amazon.de), and a 10TB WD Red is €588 (WD Store on amazon.de), so €550 + 2 × €588 = €1726 for the hardware. A DS-220 uses 14.96W by itself (idle) and a WD Red 10TB uses 2.8W idle, so idle power consumption is 20.56W, which is around 180 kWh/year and just over 900 kWh over 5 years. Assuming a €0.5/kWh electricity price, which is currently at the lower end (prices here are above €1/kWh), that's about €450 of power. Adding it all up you get €1726 + €450 = €2176, which means an average cost of €36/month for hardware and power over 5 years.
I’m in a slightly similar place to the author, but would love some other backup nerds to throw in some advice.
I’m aware of the traditional “requirements” for backup (e.g. attribute/metadata preservation, cross-platform support, restore testing), but in modern usage I’ve discovered a split (and lean towards “cloud first”).
It boils down to “active” files vs. “passive” data. I found this when testing rclone around preserving executable bits and trying to back up “active” (`chmod +x`) git working directories. It was a mess, as all shell scripts lost their permissions, which was especially frustrating because backing up Git working directories is kind of useless, but also very useful for those times when something goes wrong.
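One workaround for the lost executable bits, assuming the restored directory is still a valid git checkout (a generic sketch, not something rclone does for you): git records the file mode, so you can re-apply +x to everything it expects to be executable.

    cd ~/restored/project
    # list index entries, keep the ones recorded as mode 100755, chmod them
    # (paths containing spaces need more careful handling than this)
    git ls-files --stage | awk '$1 == "100755" { print $4 }' | xargs -r chmod +x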
However, when backing up my MP3 rips or family videos (or large video editing projects), I really just need passive blobs, not bit-and-attribute-identical restores, especially when the local computer is effectively a moderately thin client compared to the exact duplicate of all possible data that I’ve backed up to a cloud provider (rsync.net in my case).
Also there are tiers of cost and accessibility that aren’t necessarily managed well with most modern backup software.
I’ve settled on rsync.net for bulk “warm” access, plus Syncthing with a few local Raspberry Pis and all local computers/phones as kind of a “hot” replacement for the “Documents” directory.
I had to do a moderately complete backup with “restic“ from one local HD to another on my desktop and that worked well for being able to mount snapshots and pull files out, but it all feels like a lot of overhead compared to what I’m really shooting for.
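For anyone curious, that restic flow is roughly the following; the repo and mount paths are placeholders:

    # one-time: create the repo on the second drive
    restic -r /mnt/backupdrive/restic-repo init
    # back up the home directory (repeat runs are incremental snapshots)
    restic -r /mnt/backupdrive/restic-repo backup /home/me
    # browse all snapshots as a filesystem and copy individual files out
    restic -r /mnt/backupdrive/restic-repo mount /mnt/restic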
Further problems are the general unreliability of fuse/sshfs-mounted filesystems, as “slow” file access for some things is quite acceptable, but unreliable or hanging on individual file access gets real old real quick.
So it boils down towards the far away end of the backup “cone” being S3-glacier-ish (cheap-ish), and mostly blob data with no extended attributes needed. Mid-tier is rsync/warm, mostly blob-ish shared between multiple systems, along with some kinds of per-device system restore capabilities (or home dir restore capabilities), and near/hot is Syncthing, and a local NAS of everything if possible. Modulo local device/hardware management, and attempting to reduce any administrative overhead, especially for casual/non-primary users of the same system.
It doesn’t feel like nirvana. Am I missing any gaps in thought here? Does anyone else have good experiences or suggestions for something reasonably comprehensive between “do nothing, let the cloud sort it out” and “be prepared, have your own local and remote backup/transfer/restore processes”?
It's a good way of doing it, with near-unlimited bandwidth, and if a machine or drive dies you just replace it with no data loss. I save everything important, encrypted, to Seafile on my VPS, and my NAS pulls nightly backups back home. Software/family just see it as a network drive.
> I still want backups. But instead of "backing up files in the cloud", I back them up locally, by redownloading them to a local archival drive. This is upside-down of most people's backup strategy, but it really is quite nice once you get it set up.
https://rsync.net/resources/howto/rclone.html
The example is S3 <--> rsync.net (naturally), but the step-by-step instructions are applicable to setting up any combination of "remotes".
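The gist, once both remotes exist in your rclone config (the remote names here are placeholders, not the ones from the howto):

    # copy/update everything from one remote to the other
    rclone sync s3remote:my-bucket rsyncnet:backups/my-bucket --progress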
rclone is like youtube-dl - a powerful tool that seems almost magical.