Kopia – Fast and Secure Open-Source Backup (kopia.io)
176 points by chetangoti on June 11, 2021 | 76 comments


Another one? Ten years ago I had trouble finding anything, then over time I learned of duplicity (2002?!¹), bup (2010), restic (2015), borg (2010)... all basically solving the same problem of encrypted incremental backups.

The landing page doesn't mention why they made yet another solution. In the comments someone also mentions bvckup2. Is there an overview somewhere of all the different solutions? Any selling points here?

¹ That's way older than I expected, but it's also the first one I found and I stopped using it because it took many gigabytes on my then-250GB SSD for a local cache just to be able to do incremental backups. Maybe that's why I had the feeling I couldn't find anything good at the time.


If you think it's just another addition to a pile of similar tools, you're pretty wrong.

I've only used Kopia briefly, and it had a bug where I couldn't simply read files from its GUI, so I take it it's still working through basic implementation issues. But all the tools you mentioned have varying degrees of reliability and performance as well.

You can easily Google around and see that restic uses huge amounts of memory and is slow to restore on larger repos, and that duplicity is unreliable. You'll find fewer such reports for borg, which I believe is the only open-source tool of this kind that is reliable and doesn't stall on restoration.

I want to use multiple implementations to back up to multiple remote targets, in case one implementation has a corruption bug, but so far my research suggests it's better to just run borg twice against different remote targets than to use another implementation that could be unreliable.

duplicacy also uses so much memory that it froze a machine with 2GB+ of free RAM while processing a medium-sized repo of only about 50GB with many small files. (Not to mention I don't like its per-machine license, which can get costly if you handle many small servers.)

Backup tools are extremely sensitive to reliability. They must work from the very beginning, because it's too late if your backup data was corrupt and you only realize it once your original data is already inaccessible; not many people check backup data integrity frequently.

Reading the feature list doesn't mean much; do yourself a favor and check the GitHub issues, check the forum for bug reports, and maybe even Reddit, and figure out whether they're still fighting primitive bugs or not.

You should rely on a solid tool rather than just one of these remote backup tools, for example by using rsnapshot locally first. It uses rsync and hard links, which are quite hard to break.

Here are a few reports on instability for a few tools.

1. https://forum.rclone.org/t/rclone-as-destination-for-borgbac...

2. https://www.reddit.com/r/unRAID/comments/eg0zpe/duplicati_se...

3. https://forum.duplicati.com/t/is-duplicati-2-ready-for-produ...


It has a GUI and works on Linux/Windows/macOS. It also has both deduplication and compression.


Thanks!

Still a bit strange to me that they started a whole new project rather than contribute patches or fork something else.

The only one I'm fairly familiar with is restic. It also compiles for the major OSes and has deduplication. For compression, I think there are patches available, but alternatively they could just have contributed one. That leaves tying the command-line interface to a few buttons in a GUI.

Edit: Kopia turns out to be from 2016. I guess the author didn't know of the others yet, or the others weren't as mature. This makes a lot more sense; somehow the .io domain and only hearing of it now made me expect it had been written recently.


Restic issues for adding a GUI and compression are 7 years old. If there were enough interest, or if they were easy to do, they would have been implemented by now.


What I meant is that the author of Kopia could have done that and made their life a lot easier compared to starting all the way from scratch. But I posted that before realizing that Kopia is barely a year younger than Restic.

But of course it could also just be what u/poronski said in a sibling comment. His Noodly holiness knows I make a lot of software that already exists just because I enjoy the making and having it customized. In fact I think I also started... let me `stat` that directory... yup, in 2016 I started working on (and abandoned) my own implementation of encrypted backups. Also because online storage prices were through the roof (~40x the hardware cost price with servers and bandwidth included) and I thought I could do that cheaper.


To each their own.

For a lot of people it’s way more fun to make a new thing than to fork and patch someone else’s.


> Is there an overview somewhere of all the different solutions?

https://github.com/restic/others


Which backup solution are you using now?


Easy on the gas, bud.


I have been using Kopia for some time now after switching from Duplicity. Very happy with Kopia. You can just point Kopia at a GCS or S3 bucket and shove files there. Easy to restore files. You can list snapshots and files and do partial restores fairly easily. Old data gets expired on a timeline that you dictate.

Duplicity was a pain by comparison. I think Duplicity has a number of design flaws that become evident once you use it for a while.


The only thing I need (and it is sorely missing from Restic) is for the metadata to be kept separate from the actual data. That way I can store the data in AWS S3 Deep Glacier for next to nothing per year, and still do incremental backups. Currently the architecture of Restic, for instance, requires all data to be quickly and cheaply accessible, which makes this impossible.

I have terabytes of data that I'd be happy to dump encrypted and compressed in Deep Glacier and happy to pay $500 to retrieve if I were to mess up my hard drives, but otherwise don't want to pay for the costs of normal S3.

Does Kopia separate metadata from the actual encrypted/compressed blobs?


Hi, Kopia author here.

Yes, Kopia does segregate metadata from data. Directory listings, snapshot manifests and a few other minor data pieces are stored in pack files starting with "q", while the bulk of the data is stored in pack files starting with "p". Indexes into both kinds of pack files are stored in filenames starting with "n".

This has some interesting properties. For example we can easily cache all metadata files locally, which provides fast directory listing, manifest listing and very fast verification without having to download actual data.
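As an illustration only (this is not Kopia's actual implementation; it just assumes an S3-compatible bucket whose object keys carry the prefixes described above, with a made-up bucket name and cache path), caching just the metadata and index packs could look roughly like this:

    # Rough sketch: mirror the "q" (metadata) and "n" (index) packs locally,
    # skipping the bulky "p" (data) packs. Bucket name and cache path are
    # illustrative assumptions.
    import os
    import boto3

    BUCKET = "my-kopia-repo"  # hypothetical bucket
    CACHE_DIR = os.path.expanduser("~/.cache/repo-metadata")

    def cache_metadata_packs():
        s3 = boto3.client("s3")
        os.makedirs(CACHE_DIR, exist_ok=True)
        paginator = s3.get_paginator("list_objects_v2")
        for prefix in ("q", "n"):  # metadata and index packs only
            for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
                for obj in page.get("Contents", []):
                    dest = os.path.join(CACHE_DIR, obj["Key"])
                    if not os.path.exists(dest):  # naive download-once cache
                        s3.download_file(BUCKET, obj["Key"], dest)

With the metadata mirrored like that, directory and manifest listings can be served from local disk, while restores of actual file contents still hit the "p" packs remotely.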


Thanks! I'll have to investigate Kopia properly.


Would you mind sharing how much you pay for S3? I assume you’ve considered options like Backblaze B2 and Wasabi?


Google Workspace is still around $10/month for unlimited. 50TB here and counting, uploaded a few more TBs just this week. Incredible value proposition! Only some limits on traffic, 750GB/day ingress. Works very well with rclone.

Legally that price is supposed to be for 1TB of storage, but the quota is still not being technically enforced, at least for me with a grandfathered G Suite account. Not sure about new accounts; it seems something may have changed a couple of weeks ago with the new ToS.


Wow... this is insane, nothing can compete with that, I'm pretty sure! 750GB/day ingress is huge.

Surely they've closed this "loophole" on new accounts.


It's been an open secret for many years that they don't enforce quotas. It's not profitable but google has really deep pockets and can afford to not care. Not sure if they really closed it this time - like I said, I'm still able to upload multiple extra terabytes onto my supposedly 1 TB account even now. So it doesn't seem like they started enforcing quotas to me.

Look/ask around in r/DataHoarder for recent experiences of other people, they also discuss other storage services in general a lot.


Oh yeah I know, I just didn't think that people were doing that much with it! I definitely lurk r/datahoarder, absolutely love seeing people that are so excited about storage (and haven't been found by mainstream reddit for the most part yet). r/zfs is also pretty good for nerdy drive stuff from time to time.


Suppose I have ~4 TB of data. If I dump it into Deep Glacier, it'll be ~$50/year (ingress is free); if I ever need to retrieve the data, it's around $370.

Normal S3 would be ~$1.1k/yr, or around half of that for the infrequent access tier, both of which are way too expensive.
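Roughly how that works out, assuming the published rates at the time (about $0.00099/GB-month for Deep Archive storage, ~$0.0025/GB for bulk retrieval, ~$0.09/GB for egress; treat all of these as assumptions, since prices change):

    # Back-of-the-envelope check of the figures above (rates are assumptions).
    data_gb = 4_000                    # ~4 TB

    storage_per_gb_month = 0.00099     # S3 Glacier Deep Archive storage
    bulk_retrieval_per_gb = 0.0025     # bulk retrieval fee
    egress_per_gb = 0.09               # data transfer out to the internet

    yearly_storage = data_gb * storage_per_gb_month * 12                  # ~$48/year
    one_time_restore = data_gb * (bulk_retrieval_per_gb + egress_per_gb)  # ~$370

    print(f"storage: ~${yearly_storage:.0f}/year, restore: ~${one_time_restore:.0f} once")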


Thanks for sharing! This is why I asked... Wasabi prices 4TB of data, with 100% of it downloaded every month @ $287/year according to their price calculator.

Backblaze B2's calculator is a little more sophisticated, and putting in the numbers for an absolutely pathological use case where you start with 4TB, then download, delete and upload that same amount every month puts you at $720/year. A much less pathological use case (I think) that assumes you upload, delete and download 1TB/month comes out to around $360/year.

Hetzner storage boxes offer 10TB for ~$48/month, which is $576 a year -- free ingress/egress, no hidden fees for operations or whatever else, but you do have to set up a node with minio (or use FTP, etc).
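For comparison, here is the same back-of-the-envelope math behind those yearly figures (the per-GB and per-TB rates are my assumptions, based on the providers' published pricing at the time):

    # Rough reconstruction of the yearly figures above (rates are assumptions).
    tb = 1_000  # GB per TB, matching the providers' decimal accounting

    # Wasabi: flat ~$5.99/TB-month, egress included in the sticker price
    wasabi = 4 * 5.99 * 12                                    # ~$287/year

    # Backblaze B2: ~$0.005/GB-month storage + ~$0.01/GB download
    b2_pathological = (4 * tb * 0.005 + 4 * tb * 0.01) * 12   # ~$720/year (4TB down/month)
    b2_moderate = (4 * tb * 0.005 + 1 * tb * 0.01) * 12       # ~$360/year (1TB down/month)

    # Hetzner storage box: flat ~$48/month for 10TB, traffic included
    hetzner = 48 * 12                                         # ~$576/year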

Amortized over the happy time (the time when you don't need to rely on your backups) this does make sense, but I wonder what the percentages look like on that kind of metric. To be fair I haven't had to restore from backup for years, so this probably makes a lot of sense. I guess there's no need to test your backups/restores either if you're using a tool like borg/restic/etc and have tested it with local hardware.

Also, what happens if you have to retrieve data twice from Glacier? You've got access to it for 24 hours so I assume you're planning on just keeping the data on some non-glacier storage medium for a while after the initial post-disaster pull?


This wouldn't be the primary backup, but Deep Glacier is just such a good deal that I'd be happy to pay the $50 per year for a call option on my data; it'd make me sleep better at night!

Part of my calculus is that I have quite strong confidence in AWS in terms of business continuity and reliability/availability. If I dump my files on AWS, I have high confidence in the files (and AWS) being around in 10 years and retrievable for roughly the same price (or at least no more).

Hetzner would have much lower durability. I'm a bit suss on Backblaze, though I do trust them to be more durable than my self-managed disks (and uncorrelated to my failures). I don't know much about Wasabi; but it's not a good sign for me that their landing page touts their latest funding round at the top: seems young and you never know if the price is subsidized with VC money (and won't be in n years) or similar.

> Also, what happens if you have to retrieve data twice from Glacier?

The killer is the egress. I'd just buy a new set of disks and download it straight there.


I suppose you can't check for backup data integrity inside Glacier.


S3 Glacier Deep Archive is $1/TB/month. Super cheap storage costs, but retrieval costs are insane.


My question was more about just how many TBs and how much ingress/egress were making AWS S3 cost prohibitive -- Wasabi's sticker price is $5.99/TB/month (so 6x Glacier but ~0.2x regular S3), and I know that Hetzner will give you a storage box that is 1TB for 9.40EUR (but the kicker there is that 10TB of traffic is included, which is amazing), and there are no API/operation fees when you run your own Minio (or just use FTP/all the other built-in access methods).

Network is one thing, but what am I missing here? Maybe I just think $10/month is reasonable for 1TB (because I don't have enough TBs? or don't use remote storage enough?), and that's different from most people who are interested in this.


I love the new tendency to use Polish (and other languages') names for programs!


The other day I heard someone use the word "żuk" referring to a bug, and I think it fits amazingly well.


Is it pronounced like Russian “Zhuk”?


Yes, exactly.



Indeed! I'm from Sweden, where 'kopia' is Swedish for the English word 'copy'. Same meaning in Polish?


It means both "copy" and "lance" (as in: weapon used for jousting), hence the logo.


I wonder how it compares to restic or borg. Besides the gui anyway…


Restic doesn't support compressing backups, and kopia does. Otherwise, the architectures appear to be very similar.


How much space do you typically save by compressing these days? Even smallish things like documents are already compressed archives, and pictures/audio/movies of course already have heavy purpose-specific compression.

The main things I can still think of that are sparse on purpose are database files and disk images (not very mainstream, but also not uncommon). So like, a few gigabytes per terabyte (a few per mille) unless you're really heavy on either databases or virtual machines?

I can see why one would like to enable it, but deduplication (which breaks if you implement compression naively; iirc that's why restic hasn't implemented it yet) is much more worth it, because it enables incremental backups and you don't, for example, have to worry about making a copy of another system that has many of the same files (think game files or system files).


I assume compression works well for source code and other developer artifacts. Obviously you have to do dedupe, then compression, then encryption.
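A minimal sketch of that ordering, assuming fixed-size chunks for simplicity (real tools like restic/borg/kopia use content-defined chunking, and the key handling below is purely illustrative; it needs the third-party `cryptography` package):

    # Sketch of dedupe -> compress -> encrypt, in that order (illustrative only).
    import hashlib
    import zlib
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # illustrative key handling, not how real tools do it
    enc = Fernet(key)
    store = {}                    # chunk-hash -> compressed, encrypted chunk

    def backup(data: bytes, chunk_size: int = 1 << 20):
        manifest = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()   # dedupe on plaintext hash
            if digest not in store:                      # only new chunks get stored
                store[digest] = enc.encrypt(zlib.compress(chunk))
            manifest.append(digest)
        return manifest                                  # chunk references for this snapshot

Compressing after encryption would gain nothing (encrypted data looks random), and compressing the whole stream before chunking tends to destroy the chunk-boundary stability that deduplication relies on.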


Source code compresses very well indeed, but my hunch is that it's peanuts. Let's see, I've got a projects directory with various projects from the past decade (all custom, there's a separate dir for downloaded repositories). I've mostly written things in Python and PHP (the JS/CSS/HTML stuff is on a server mixed with things like owncloud or SMF or so; harder to isolate).

PHP: 591 KiB, 196 files, 13'813 LOC, 1'236 comment lines.

Python: 345 KiB, 136 files, 7'760 LOC, 930 comment lines.

If someone spends 5 minutes of developer time trying to compress that to save disk space, that's already not worth it. Also, in huge projects the actual code is not going to take gigabytes of space. And if you mean the git history: that is, again, already compressed.

Other developer artifacts: I've got 26 GiB of project directories, this time including downloaded software, and it will also include binaries (hashcat and jtr are in there; I wouldn't be surprised if there's also a medium-sized dictionary or two). Doing `tar c .` does not seem to add much overhead (26.5 GiB). Compressing that stream with pigz -1 (multithreaded gzip) brings it down to 17 GiB.

35% off is better than I thought! I wonder which files compress so well, hmm let me `find -type f | shuf | head -9001 | while read line; do echo "$(($(wc -c <"$line")-$(<"$line" pigz -1 | wc -c))) $line"; done | sort -n`... The largest difference is a huge 121 MiB binary that compresses down to 36 MiB. I didn't know these files were so sparse (not a C(++) dev), interesting!

While I'm looking into this, let's also look at my "documents" directory. It's 43 GiB and compresses down to 33 GiB. Not as good, but still worth it, and more than I thought! (And this compressor isn't great, but it's probably within 10% of what you'd get before compression becomes impractically slow.) It might not quite get a total backup size down by a whole disk size (e.g. not 1T down to 500G), but it definitely lets you keep more history before having to worry about what you want to keep and what you want to toss.


For local backups, compression is less of an issue; for things that are compressible, transparent file system compression seems to get me about 110% of the size I would get with any non-CPU-bound level of compression using a tgz. Since (as others in this thread have noted) compressible files tend to also be smaller files (the only exception I can think of would be if your log rotation doesn't compress old logs), the fact that only a fraction of what I back up is 10% larger is kind of "okay." When you're sending across the network, though, it can be a big deal.


On disk the backups are encrypted, which means no transparent compression.


For people who haven't dealt with this - a good encryption scheme produces output which you can't tell apart from a purely random stream of bits - it has very high entropy, and is therefore not compressible.
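You can see this in a couple of lines (zlib standing in for any compressor; random bytes standing in for ciphertext):

    # High-entropy (encrypted-looking) bytes don't compress; repetitive data does.
    import os
    import zlib

    random_bytes = os.urandom(1_000_000)       # stands in for ciphertext
    text_like = b"the quick brown fox " * 50_000

    print(len(zlib.compress(random_bytes)) / len(random_bytes))  # ~1.0 (slightly above, in fact)
    print(len(zlib.compress(text_like)) / len(text_like))        # a tiny fraction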


Thanks for clarifying this. That must mean then: first encrypt, then compress?


Compress, then encrypt.


And this has some pitfalls as well: ciphertext in general has about the same length as the plaintext, so the compression ratio can be used to infer some information about the plaintext. It's more of a problem with interactive protocols than with backups, but still worth keeping in mind.
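A toy illustration of that leak (this is the idea behind attacks like CRIME on TLS; the secret and guesses below are made up, and none of this targets any backup tool): when attacker-controlled text is compressed together with a secret, the compressed -- and therefore encrypted -- length shrinks when the guess matches the secret.

    # Toy compression side channel: a correct guess compresses (much) better.
    import zlib

    secret = b"secret=correct horse battery staple"

    def observed_length(attacker_text: bytes) -> int:
        # Compress-then-encrypt: length-preserving encryption means the
        # ciphertext length mirrors this compressed length.
        return len(zlib.compress(attacker_text + b"; " + secret))

    print(observed_length(b"secret=correct horse battery staple"))  # shorter: full match
    print(observed_length(b"secret=qwertyuiop asdfghjkl zxcvbnm"))  # longer: only "secret=" repeats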


You can see some of the performance differences here - https://blog.kasten.io/benchmarking-kopia-architecture-scale...


Note this compares an older restic version that doesn't include the order-of-magnitude improvements in cloud communication.


Looks like a polished restic to me.


Polished*


A hard joke to get, but I liked it.


Gotta say kopia does have a very strong restic vibe to it.

Not a bad thing, just means that restic managed to get lots of things right.


It has a GUI and works on Linux/Windows/macOS. It also has both deduplication and compression.


Finally! I've been waiting patiently for an open-source, x-platform solution that ticks those boxes. Edit: ah, never mind. I just tried it and I'm sure it's great for online backups, but it's not so well suited for backups on a plain USB HDD.


Pretty sure it works with local USB storage as well; it's covered in the documentation here: https://kopia.io/docs/repositories/#local-storage.


Any discussion about backup should always include a shout out to one of the most beautiful versions of this: bvckup2

About 10 years ago the founder just decided to start writing the most streamlined, beautiful pieces of software he could - obsessing over NTFS nuances to improve performance and reduce overhead to an unbelievable degree.

10 years later, it's still him (and maybe a few other folks) and the software is unbelievably polished.


I believe you about polish and performance, but that solution is neither open-source nor cross-platform (unless I failed to find it on the website).


Nor free. The download button says a 2-week trial is included.

Which is okay, but licensing is enough of a hassle when restic etc. exist that I'm not going to bother with that for my systems. A design goal of restic (sorry, I'm just familiar with that one; not affiliated with it) is also recoverability: if your repository gets horribly corrupted or you can't easily run the software anymore, the author wanted you to still be able to recover things. One of the early talks (at a local CCC) explains how to decrypt things manually in a few minutes -- obviously he knows what he's doing and will be faster than me, but still. Having closed-source software as an alternative to that... I dunno.


Yeah, I’m not complaining and it’s entirely fair to sell your work, but I’m always wary of relying on a closed-source (+ licensed) tool for things as critical as backups. I may be partial though, having interacted with the Borg author in-person a few times at Congress, which convinced me I could trust it to not shred my data.



On Mac the most polished is Bombich's Carbon Copy Cloner, it's a thing of beauty. I might go back to Mac just for that piece of software alone.


I’m curious if this or other tools can help me with the following use case: I have a Synology network attached storage.

What I want is a way for me to incrementally back up to Synology when I’m on my local network/VPN, while Synology maintains an encrypted cloud backup of my stuff. Alternatively when I’m outside of my network travelling, my laptop is backing up directly to the cloud with a way to sync that up with Synology.

In other words, I want to have a synchronized local and offsite backup that works seamlessly whether I'm on my network or outside of it.

Is that a reasonable expectation? Does a tool like that exist?


You can set up ZeroTier on your NAS and your computer (and phone, etc.) and things will always stay connected via an internal IP no matter where you are. I have been using this setup for several years. The best part is that it works with any software running on the computer, or Docker containers etc. on the NAS.

I bought a second (cheap, slow) NAS and installed it off-site (away from home), and I keep a full off-site mirror of my NAS.

Use any tools you want: syncthing, rsync, SMB, NFS... It's all just internal IPs routed via ZeroTier.


ZeroTier is incredibly good. I have a Synology too and without ZT it would be a lot less useful.


As a bit of trivia, "kopia" means "copy" in Polish, which is the founder's mother tongue.


Also in Swedish (so possibly also in other Nordics).


Interesting. I'd be willing to give it a try after having to abandon Duplicati, as it just seemed like a lost cause. Right now my backup setup consists of a UrBackup server my machines connect to, and a borg repository that is synced against the latest backup and then sent remotely to S3. It works, but could definitely use some streamlining...


Can I get some recent performance benchmarks vs restic? I only recently switched to restic and for the most part it is great (definitely an improvement from CrashPlan IMO), but it seems like this might actually have a few improvements to make it worth considering switching from restic.


I ran some performance benchmarks not too long ago. https://blog.kasten.io/benchmarking-kopia-architecture-scale...


Thanks.

It would have been interesting to see restore times as well.

It's quite important whether it will take you an hour or a day to restore.



How does this compare to Duplicacy in terms of throughput?


This looks really cool, and seems to tick all the boxes for me:

* Multi-platform

* Encrypted

* Compressed

* Deduplication

* Supports S3 Compatible + Backblaze B2

* Performant even with millions of small files

* Supports mounting

* Good usability

Wishlist:

* Automatic exclusion from .gitignore files

* Support nested includes/excludes


Looks interesting. I always liked the borg concept, but never actually used it for backups because it lacks Windows support and a nice GUI.

Will give it a try.


Borg does have a GUI: https://vorta.borgbase.com/

Although whether you find it nice or not is subjective.


It doesn't have native Windows support, but it works pretty well in Cygwin. I attempted to create a minimal Windows/Cygwin setup for it to run in, but admittedly it's in a bit of decay at the moment: https://github.com/nijave/borg-windows-package


Has anyone had experience with Kopia and restic?

If so, any opinions?


Use tarsnap instead: https://www.tarsnap.com/


How would this type of software compare against Dropbox / NextCloud?


They're fundamentally different in the problems they solve. Dropbox is a cloud file storage and syncing service. NextCloud is an open source alternative to something like Dropbox in that it offers file sharing and syncing but also much more on top of that. It's really closer to something like the Google suite of personal cloud services with Google Drive, Photos, Contacts, and Calendar. Kopia is a backup solution for the files on your computer. You can use cloud file storage providers as the destination for these backups but it doesn't handle the storage of the backups itself. You have to provide that storage to it.



