Just a reminder, as this is sometimes confusing: a plain old rsync.net account runs on a ZFS filesystem and has all of the benefits that entails (point-in-time snapshots, checksummed integrity, CoW efficiency, etc.), but it does not have its own zpool to itself.
If you want to zfs send to an rsync.net account, you need an account enabled for that, which is the same price but has a higher (4TB) minimum account size.
You've discussed in the past why a zpool account has a relatively high minimum [1]. Basically, it's because you need a persistent, isolated bhyve VM with sshd listening on its own IP, plus a separate ZFS filesystem and zpool, for each customer.
It's still fuzzy to me specifically why a dedicated VM is necessary. You don't need a dedicated SSH IP for normal 'shell' accounts, and you certainly still provide security and privacy for those customers. Is it because ZFS doesn't have fine-grained permissions that would allow doing zfs send/recv on one's own datasets/volumes without also granting access to other customers' data? Or is it because send/recv workloads just consume more compute resources in general?
I myself am still fuzzy as to why we need a VM but it is, indeed, due to the permissions that ZFS requires to do a 'zfs destroy' ... which you need in order to manage your own snapshots.
So, basically, you need to be root to fully manage your own zpool ... and if you need to be root, you need to be in a VM. In our case, we use bhyve and we have had very good success.
We have this on very good authority - Allan Jude of Klara Systems has helped us audit the entire setup and there is not, unfortunately, a lighter way to do it.
Thank you for clarifying! From Example 2 of the zfs-allow.8 docs [1] it looks like it should be possible, but I guess there are important details omitted from that document that make such a design nonviable in your case. I'm glad to hear that you are using system designs vetted by ZFS experts.
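For reference, the kind of delegation that example is getting at looks roughly like this (the user and dataset names here are made up):

    # Delegate snapshot management to an unprivileged user
    zfs allow backupuser create,destroy,mount,receive,snapshot tank/backupuser

    # Show what has been delegated on that dataset
    zfs allow tank/backupuser

But per the discussion above, there are evidently caveats that keep this from being sufficient in practice.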
Well, if you're being smart, it's going to be write-once, read-occasionally, because you should be testing your backups every so often. You'd probably want to back up to some kind of service in the middle, then at the end of the month/quarter ship a backup to cold storage.
"4TB? That's $60 per month. Mmkay. Just use duplicati + cloud cold storage. Way cheaper."
This is random access, live storage - which is not the case with cold storage options such as Glacier.
A more fitting comparison would be S3 ...
I think you are, in many cases, correct - offsite backups can, indeed, be write-once ... but if you're looking for the very specific efficiencies and features that zfs-send affords you, that will not be the case.
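Concretely, the efficiency in question is that after the initial full send, each subsequent run ships only the blocks that changed between two snapshots. A rough sketch (the account and dataset names are made up):

    # Initial full send
    zfs send tank/data@monday | ssh user@user.rsync.net zfs recv -u data/backup

    # Subsequent incremental sends ship only the changed blocks
    zfs send -i tank/data@monday tank/data@tuesday | \
        ssh user@user.rsync.net zfs recv -u data/backup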
That's because their zfs account requires a dedicated IP and a VM.
It used to be a 1TB minimum until pretty recently, but they've gone to 4TB now.
If you need less space, just spin up a t3.small on AWS (at least 2GB of memory is better for ZFS) running Ubuntu, attach a Cold HDD (sc1) volume as extra storage (minimum 125GB, and quite cheap at about $0.015/GB-month), install "zfsutils-linux", and you're good to go with ZFS.
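Something like this should do it, assuming the sc1 volume shows up as /dev/xvdf (on Nitro instance types it will appear as an NVMe device such as /dev/nvme1n1 instead):

    sudo apt update
    sudo apt install -y zfsutils-linux

    # Create a pool on the attached Cold HDD volume and a dataset in it
    sudo zpool create backup /dev/xvdf
    sudo zfs create backup/data
    zpool status backup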
I just set up zrepl yesterday and this was going to be my next task. Super glad to find this here. I already have a borg-enabled account with you: Is it possible for a single account to do both borg and zfs send?
To be clear, rsync.net zfs should work with any/all shell- or script-based ZFS replication tools. The basic documentation for zrepl here is good, but you can also use sanoid/syncoid, znapzend, etc.
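At bottom, all of those tools automate some variation of plain send/recv over SSH, so anything that shells out to this pattern should work (the names below are just placeholders):

    zfs send -R tank@weekly | ssh user@user.rsync.net zfs recv -Fu data/backup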
I haven't personally used any tool besides zrepl, but I like the flexibility it offers. It can do both push and pull-based replication. I have my offsite target server (with its own retention policy) pull from the source server. If the source ever gets ransomwared, it can't encrypt/delete all the existing snapshots on the target.
And another huge benefit for my setup is the ability to use an HTTPS connection for the replication instead of SSH. My source and target servers are in different continents so there's pretty big latency. zrepl's HTTPS server manages to transfer data at ~890Mbps while SSH doesn't deal with the high latency as well and only manages ~190Mbps.
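For anyone curious, a pull job over zrepl's TLS transport is sketched below. This is a rough outline based on the shape of zrepl's documented config; every name, address, and path in it is a placeholder:

    jobs:
      - name: pull_offsite
        type: pull
        connect:
          type: tls
          address: "source.example.com:8888"
          ca: /etc/zrepl/ca.crt
          cert: /etc/zrepl/target.crt
          key: /etc/zrepl/target.key
          server_cn: "source"
        root_fs: "backup/zrepl"
        interval: 10m
        pruning:
          keep_sender:
            - type: not_replicated
          keep_receiver:
            - type: grid
              grid: 1x1h(keep=all) | 24x1h | 30x1d
              regex: "^zrepl_"

The source side runs a matching "source" job serving over TLS, with the target's certificate pinned via client_cns.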
I'm curious how folks who use this deal with that first upload on a “normal” ISP connection.
I tried rsync for this purpose in the past. Our main office gets 150 Mbps down / 20 Mbps up. I tried uploading the initial 1TB snapshot, and after a week it had not finished. Meanwhile, a fair amount of that original snapshot had gone stale.
Are you just supposed to start up these hourly snapshots and hope everything catches up with itself eventually?
Not sure about zrepl, but sanoid/syncoid will keep a separate snapshot for each replication target, with a lifetime separate from your usual expiration policy. So say you set up sanoid to keep 24 "hourly" snapshots, but the initial replication w/ syncoid takes 36 hours. You'd be left with a "@syncoid_HOSTNAME_ISO8601" snapshot, a 12-hour gap, and then your 24 hourly snapshots. That snapshot holds those 12 hours' worth of block churn, to allow for incremental sends, until your ISP is able to catch up.
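A sketch of what that looks like in practice (the host and dataset names are made up); by default, syncoid creates its own @syncoid_HOSTNAME_TIMESTAMP snapshot on each run and uses it as the incremental base for the next:

    syncoid -r tank/data user@backup.example.com:backup/tank/data

Sanoid's pruning only touches the autosnap_* snapshots it created itself, so the syncoid snapshot (and the incremental chain) survives even if the hourlies expire mid-transfer.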
The other option, if you're colocating, is to send a seed drive ahead to the datacenter. (I think there was a startup on here a while back where they'd basically colo your drives in their own JBODs, and then charge you a nominal monthly fee for a VPS w/ those drives passed through as a zpool.) You might pay some nominal fee for remote hands, but it beats waiting for terabytes of data to squeeze through your local cableco's wildly asymmetric pipe.
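Roughly, seeding with ZFS looks like this (device and pool names are placeholders): replicate onto a single-disk pool on the drive you're shipping, then continue incrementally once it's imported at the far end:

    # At the source: build the seed on an external drive
    zpool create seed /dev/sdX
    zfs snapshot -r tank@seed
    zfs send -R tank@seed | zfs recv -Fu seed/tank
    zpool export seed    # ship the drive

    # Later, once the drive is imported remotely, send only the changes
    zfs snapshot -r tank@now
    zfs send -R -i @seed tank@now | ssh remote zfs recv -Fu seed/tank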
Back in the day I managed a small office server, backing up via a particularly slow ADSL line. It took about a week to do the initial sync, so I just let it run and in the meantime backed up to an external disk every couple of days.
Once it was up and running, most snapshots took a few minutes to sync; they always finished before morning anyway.
Definite +1 to rsync.net, this was >15 years ago but it was always 100% solid, I don't think I ever had any issues. It's nice to see they're still doing the same thing and haven't bloated it with crap!
zrepl is awesome! I have been using it for some time, with excellent results, to manage a lot (for my org, anyway) of backups. Glad to hear rsync.net is making it available.