Imap-backup: Backup Gmail or other IMAP accounts to disk (github.com/joeyates)
214 points by miles on Jan 8, 2022 | 79 comments



Imapsync at https://imapsync.lamiral.info/ is an updated CLI tool that effectively syncs big or small IMAP boxes. It is more for moving than backup, but it can serve you well since you can interrupt the process and sync huge boxes without having to pray for the entire box to copy at once. I've used it for years as a migration tool and it keeps being updated.


I've been using this to sync to a local dovecot installation. It's the best solution I could find that would allow me to have a local backup in a standard format (ie. I can see the individual email messages in a Maildir) and also be able to browse/search the backup using a mail client.

It's not that clear from the website but you should find imapsync in your distribution's package repo if you want to try it before supporting the author. I don't doubt the "no questions asked" refund policy, but I'm done fighting for my money back after bad experiences in the app stores.


You can also find the 'GitHub version' (which I'd guess is where distributions get their versions from) at https://github.com/imapsync/imapsync . I used this for an uncomplicated transfer from one IMAP server to another; it felt pretty slow but with the right CLI flags it is interruptible/restartable, and I didn't end up with lost or duplicate mail.

It comes with the author's custom "no limits" licence (looks a lot like the WTFPL), so it probably qualifies as free & open source, but you don't get a support contract of course.

The author's site also has a "give me your IMAP credentials and I'll migrate your mailbox for you, free up to a size limit" service, https://imapsync.lamiral.info/X/ , which boggles the mind. On principle I couldn't trust it.


Did you do anything particular config-wise for accessing your local instance of dovecot? I tried with [0] but the connection times out when trying to connect via Thunderbird…

[0]: https://github.com/antespi/docker-imap-devel


That project looks like it uses SSL/TLS so if your certificate is self signed you'll need to add an exception [1]

Thunderbird doesn't prompt for this while connecting like a web browser does. It fails to connect without a clear warning in this case.

Full shameful disclosure: Since my backup is only locally accessible on a trusted network I used insecure IMAP on port 143

[1]: https://serverfault.com/questions/532172/thunderbird-not-tru...


GitHub link: https://github.com/imapsync/imapsync (it is free software)


Been using this daily. It's much better.


If you do such a gmail backup, I wrote a cross platform Desktop app that can analyze these email backups to provide a visual clustering of the contents of your mails.

https://github.com/terhechte/postsack

It parses 500k mails in < 1 Minute, so it is quite fast. There's a web / wasm build of the UI here: https://terhech.de/web_demo/


I've been using Netviel (a web-based notmuch client) to do this currently and it's annoying to have to convert my mbox files to Maildir to get it working. Thanks for posting!
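
For anyone stuck on the same mbox-to-Maildir step, here is a minimal sketch using only Python's standard `mailbox` module. The function name `mbox_to_maildir` and the paths are made up for illustration, and it is not how Netviel or postsack do it. It copies messages as-is and does not try to map mbox status flags onto Maildir flags.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; return the count."""
    # create=False: fail loudly if the mbox file is missing.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True: make the cur/new/tmp directories if they do not exist yet.
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            # Maildir.add() writes each message to its own file.
            target.add(message)
            copied += 1
    finally:
        source.close()
        target.close()
    return copied
```

Something like `mbox_to_maildir("takeout.mbox", "Maildir")` would then produce a directory that Maildir-based tools can read. For a very large mbox this will be slow, since every message is parsed and rewritten.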


This is great. Thanks!


Google Takeout is fine for me - it just gives you an .mbox file with literally everything and it can be imported into Thunderbird or Outlook.


Solves a different problem. You can't trivially automate takeout. What if I just want an offline record of my emails for the day that google decides to delete me from the internet?


> What if I just want an offline record of my emails for the day that google decides to delete me from the internet?

Well this is what just a normal email client does - it downloads latest emails via IMAP.


What if I don't want an email client? I just want a headless program that downloads my emails. They can be kept on a NAS that is backed up offsite and keeps the emails searchable. I want this to happen even if I don't happen to open an email client on a desktop at my home for months.

That's why imap-backup is useful.
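
To make "a headless program that downloads my emails" concrete, here is a rough sketch with Python's standard `imaplib` and `mailbox` modules. It is not how imap-backup itself works: the function names, host, and credentials are placeholders, and it naively re-downloads the whole folder on every run instead of tracking which UIDs it already has.

```python
import imaplib
import mailbox


def store_message(maildir: mailbox.Maildir, raw: bytes) -> str:
    """Write one raw RFC 822 message into the Maildir and return its key."""
    return maildir.add(raw)


def backup_folder(host: str, user: str, password: str,
                  maildir_path: str, folder: str = "INBOX") -> int:
    """Download every message in one IMAP folder into a local Maildir."""
    maildir = mailbox.Maildir(maildir_path, create=True)
    saved = 0
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True so the backup never changes flags on the server.
        conn.select(folder, readonly=True)
        status, data = conn.uid("SEARCH", None, "ALL")
        for uid in data[0].split():
            # BODY.PEEK[] fetches the full message without marking it as read.
            status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
            if status == "OK" and parts and isinstance(parts[0], tuple):
                store_message(maildir, parts[0][1])
                saved += 1
    return saved
```

Run from cron on a NAS, a call such as `backup_folder("imap.example.com", "me@example.com", app_password, "/backup/Maildir")` would need no desktop client at all. A real tool would also remember the UIDs it has already fetched so that repeated runs do not create duplicates.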


> I just want a headless program that downloads my emails.

Lol that’s an email client! Not all email clients have a manual UI.


I also have my client make a copy in local folders, and then Backblaze scoops both up. It's a little redundant, but it's not that much space use and it gives me something to compare if I ever have reason to worry an email was tampered with on the server and downloaded over the original. More concern toward a hack than government/host meddling.


isync (aka mbsync) works great for this.


Mailpile.is


> What if I just want an offline record of my emails for the day that google decides to delete me from the internet?

I'm confused -- Mbox files can indeed be viewed offline.

(Not to say your point about the ease of automating Takeout is not valid.)


Maybe OP refers to the situation where you lose all data since the last takeout, but if you had a daily imap backup running you could mitigate this loss for your emails.


Thank you for clarifying.


I have used Google Takeout in the past and it seemed to work fine.

But the crucial question is, can you use it after your account was automatically blocked? That's when I would actually have an urgent need for it.


Google now lets you schedule up to 6 automated Takeouts a year, so you get an email every 2 months with links to download all your stuff.

If you do this every year (I do it on Jan 1st as part of some other yearly reminders) then at most you lose 2 months of data (certainly not downplaying the gravity of that but it's better than everything).


That is a bit late. You don't make backups after you lost your data.


Possibly not, so it is best to keep a synced backup of your messages and critical data: mbsync for email and rclone for Drive and Photos.


Whenever I've tried to use Takeout in the past, it's missing a large amount of data. Haven't tried it in a couple of years though.


I prefer Thunderbird. It doesn't need the app password for Gmail. With a bit of effort you can make it auto open and close via scheduled commands to do a daily sync, or just leave it open.


I have used mbsync and it seems to be a great tool. To view those emails I used neomutt, which works pretty OK. Is there a way to access those email archives with a GUI-based email client (e.g. Thunderbird) without converting them to MBOX? My dad is not going to use a MUA for reading old backups of mails. I thought about setting up a local IMAP server (dovecot) and accessing that via Thunderbird, but could not make it work at all... Does anyone have a similar experience or some insights?


I'm assuming that you're using maildir, in which case running an imap server can be pretty easy to do. I've always used courier from the default debian install and it looks in ~/Maildir for the mails and ~/Maildir/.foldername/ for folders under the inbox and it has worked well for me before. I seem to recall the last time i tried dovecot (maybe a decade ago?) it was more difficult to setup but courier has always been nice and simple for me to manage since it's using the default PAM authentication stuff. Probably not what you'd want for an install with lots and lots of users and more complicated authentication needs but for a backup mail server it's worked great for me and also for my parents
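
The layout described above (inbox in ~/Maildir, folders in dot-prefixed subdirectories such as ~/Maildir/.foldername/) is the same convention Python's standard `mailbox.Maildir` class understands, so it is easy to check what an IMAP server would find before pointing it at a backup. A small sketch, with `folder_paths` being a made-up helper name:

```python
import mailbox
import os


def folder_paths(maildir_path: str) -> dict[str, str]:
    """Map each folder name to the on-disk directory that holds it."""
    root = mailbox.Maildir(maildir_path, create=False)
    # The top-level directory itself is the inbox.
    paths = {"INBOX": maildir_path}
    # list_folders() returns names without the leading dot.
    for name in root.list_folders():
        paths[name] = os.path.join(maildir_path, "." + name)
    return paths
```

If a tool wrote its folders as plain subdirectories without the leading dot, they will not show up here, which may be one reason a server appears to serve an empty mailbox.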


Mbsync with the mailu project works really well for me; I'd recommend investigating it.


You can add mbsync to Cron or something like that, then use claws mail or astroid (if you use notmuch) to search on it.

If you make the tools transparent I think he won't notice, or even notice how fast it is to search everything locally.


I am using mbsync with Evolution, and it has worked well for me. Might be worth noting that I'm only syncing in one direction though.


I know it may not be the culturally appropriate message on HN, but what is wrong with using Gmail and syncing the messages with mbsync? If you're looking for better providers then look at Fastmail.


I was on the lookout for a reliable, easy-to-use email backup solution a few months back. I looked at a few such tools and realized I'm very likely to forget how to use them without consulting the documentation.

My solution, though it may not be the best, is to use Thunderbird and let it save a local copy, and not to delete from IMAP. I did a Google Takeout of over 7GB of my primary email account, converted to MBOX, and dumped that in the local folder. Now, I have a searchable archive of my emails from 2005 onwards (I have lost everything before that).

That is the general gist, though the overall setup is a bit long-winded and involved multiple IDs, etc.


Same here (on just using a tweaked Thunderbird). I used, I think it was gvault or gmvault, years ago, and it worked fine... But then I started using Thunderbird for actually managing my email (as a desktop email client), and then learned that one can set up maildir (so it is not a single mbox file), plus there are some offline settings within Thunderbird... and it effectively did the trick of many of these automated methods, but I get the benefit of having a pretty good desktop client too. All I then have to do is ensure that the folder hierarchy of the maildir (as Thunderbird saves the files and directories locally) is copied and backed up to a different location (per the ol' 3-2-1 backup strategy), and I'm good. Thunderbird for the win!


For gmail backup, I use https://github.com/gauteh/lieer which stores the result in a maildir with labels synchronised with notmuch.

It uses Google's API to fetch and send e-mails, and is quite performant.


Other useful tools: https://web.archive.org/web/20170303024153/http://www.athens...

But most interesting, if you don't want to backup to disk but say, migrate to a new IMAP server: https://imapsync.lamiral.info/


Thanks for the tip on imapsync. Never heard of it, might come in handy, and the website looks trustworthy.


I used it a couple of times. It comes in really handy when migrating mail between hosts, and it does a wonderful job with that.


Seconding the recommendation of imapsync. It works wonderfully for me.


This looks good, I've been meaning to backup my emails to the cloud for a while.

For anyone interested, or has suggestions on how to improve my general backup process (DBs, pfsense router config dumps, bitwarden passwords etc), here is the general process I use:

  - Run a backup container as part of the docker-compose file as part of the service definition 
  - This container will always run, and start a cron service
  - The cron is run from an env var set to the cron schedule syntax
  - The cron runs an entrypoint script that calls a mounted script in a consistent location (/cron/run.sh)
  - The script will do some basic integrity checking, like making sure the file contains some data, is over 1MB and a few other things. This could be greatly improved. In the future for databases, I want to actually restore it to a database in a container and query it for data
  - Zip, compress and place the data on a cloud service like S3
  - On completion, the entrypoint script calls healthchecks.io to inform that a run was done. I will get an alert if this task does not run in a set amount of time. The healthchecks.io alerts are created with the terraform provider, but I would love a way to integrate this with my current setup
In the future, I should really be doing some kind of integrity checking of these files once on the cloud.
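
The integrity-check and compress steps above could look roughly like this in Python. This is only a sketch of the idea, not the actual /cron/run.sh: the function names are invented, the upload and the healthchecks.io call are left out, and the SHA-256 digest is an addition meant to make a later check of the uploaded copy possible.

```python
import gzip
import hashlib
import shutil
from pathlib import Path

MIN_BYTES = 1_000_000  # the "over 1MB" threshold from the list above


def check_dump(path: Path, min_bytes: int = MIN_BYTES) -> None:
    """Raise ValueError if the dump is missing or smaller than min_bytes."""
    if not path.is_file():
        raise ValueError(f"{path} does not exist")
    size = path.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{path} is only {size} bytes, expected at least {min_bytes}")


def compress_dump(path: Path) -> tuple[Path, str]:
    """Gzip the dump next to the original and return (archive, sha256 hex)."""
    archive = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Hash the archive in chunks so large dumps are not read into memory at once.
    sha = hashlib.sha256()
    with archive.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return archive, sha.hexdigest()
```

Storing the digest alongside the upload means a later job can download the object, hash it again, and compare, which is one cheap way to notice a corrupted copy before it is needed for a restore.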

I used Duplicati for a long time, but I had a disk failure on my home server recently and when I went to restore from a Duplicati backup on S3, it was corrupted. I should have had alerts set up for this, but I have now sworn off any solution where I don't fully understand what's happening.


I suggest using https://github.com/jay0lee/got-your-back for gmail.

It works better because it can copy rules, labels, stars, etc.


Backing up my gmail has been on my list for a while (as step 1 of a migration away from gmail). Does anyone have any experience with this tool, and could compare it to offline-imap (https://github.com/OfflineIMAP/imapfw) or mbsync? For something important like email, I'd rather use something older with more battle testing. I guess beyond a backup I would hope to use the offline copy to migrate to a new provider.


I've been using mbsync (+ notmuch for indexing) for a good while. I run it nightly from a cron job, and it does what it is supposed to do.


I've used OfflineIMAP for many years, and recently (less than a year) moved to mbsync. It's much faster, and the end result is largely the same (I did some sanity tests, downloading both and comparing).

I'm not exactly sure if you can migrate to a new provider by providing the mailbox yourself, but you can still use things like notmuch to index and search on the mbox.
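
A sanity test like the one mentioned above (download with both tools, then compare) can be approximated with a few lines of Python. This sketch compares by Message-ID header rather than byte-for-byte, on the assumption that a tool might add its own headers to the stored copies. It only looks at the top-level Maildir, not subfolders, and the function names are made up.

```python
import mailbox


def message_ids(maildir_path: str) -> set[str]:
    """Collect the Message-ID header of every message in a Maildir."""
    box = mailbox.Maildir(maildir_path, create=False)
    ids = set()
    for msg in box:
        value = msg["Message-ID"]
        if value:
            ids.add(str(value).strip())
    return ids


def compare_maildirs(path_a: str, path_b: str) -> tuple[set[str], set[str]]:
    """Return (IDs only in A, IDs only in B)."""
    ids_a, ids_b = message_ids(path_a), message_ids(path_b)
    return ids_a - ids_b, ids_b - ids_a
```

Two empty sets mean both tools fetched the same messages, at least as far as Message-IDs go. Messages without a Message-ID header are ignored by this check.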


You can migrate messages from mbsync as it provides push or pull, or two-way. You just wouldn't be able to retain a provider-specific email address. That is why I would recommend everyone to use SimpleLogin or similar for email addresses with a custom domain, and then just have them forwarded to whatever email provider you're using at the time.


Actually, thanks a lot for the recommendation. I'll be migrating my custom domain from Google apps into something else and this will be of great help.


Do you have issues with Google/Gmail's policies etc or are you moving to a different host for other reasons?

I run my business emails on Google Workspace because the price is right, but I've spent a fair amount of time considering a move due to their invasive scanning etc.


I'd avoid OfflineIMAP and instead use mbsync. OfflineIMAP adds headers to the synchronised/copied emails, whereas mbsync makes a verbatim copy which is good for preservation/backup.


I’ve successfully used imapsync (https://imapsync.lamiral.info/)


offlineimap has also been around for a while, and I've been using it for a long time to synchronize my IMAP accounts to my local ~/.mail folder, so I can use email through mutt/neomutt/any other terminal-based local client.


For GMail, why this and not http://gmvault.org/? Been meaning to backup my GMail account and had this one bookmarked for ages, but never got round to using it.

Edit: or https://github.com/djipko/gbackup-rs/ which was a more recent bookmark



I use isync/mbsync (isync is the project name, mbsync is the executable name): https://isync.sourceforge.io/


Note that since IMAP doesn't support second factors, you really want to have a separate, high entropy password for IMAP. This appears to be possible for gmail: https://confluence.atlassian.com/cloudkb/configure-gmail-ima...


How practical is this for a 15GB Gmail account?

And any ideas how to search through it once you have the data on your home computer?


Mine is 30 gigabytes or so. Using mbsync, it takes up to half an hour for the first sync, then 10 seconds or less to sync after that.

I do that once every minute using Cron, and it has been serving me quite well.

I don't usually open up my archive folder, when I do it takes quite a lot of time to open on neomutt at least. Notmuch works really well after indexing though.
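
One thing to watch with a once-a-minute cron schedule and a first sync of up to half an hour is overlapping runs. A small wrapper with a lock file is one way to guard against that. This is a sketch under assumptions: it is Unix-only because it uses `fcntl.flock`, and `run_exclusive` and the lock path are invented names.

```python
import fcntl
import subprocess
from pathlib import Path
from typing import Optional


def run_exclusive(command: list[str], lock_path: Path) -> Optional[int]:
    """Run command unless another run holds the lock; return its exit code.

    Returns None when the lock is already taken and the run was skipped.
    """
    with lock_path.open("w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if a run is active.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed at the end of the block.
        return subprocess.run(command).returncode
```

Cron would then call a script doing `run_exclusive(["mbsync", "-a"], Path("/tmp/mbsync.lock"))`, and any run that starts while the previous one is still going simply exits without syncing.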


Mine is 7GB and the tool has no issues, don't see a reason it wouldn't scale.

On my SSD ripgrep takes about 5 seconds in cold start to search over all of it, then less than a second after that (presumably because drive caches have populated). It may be enough if you only search mail infrequently, otherwise perhaps Recoll [0] indexer might help, I believe it can handle .mbox files (so each message would be indexed as a separate document)

[0] https://www.lesbonscomptes.com/recoll/pages/features.html
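
If ripgrep is not at hand, the same kind of linear scan can be sketched with Python's standard `mailbox` module. Like grep, it searches the raw bytes, so text inside base64 or quoted-printable parts will not match. It is a scan, not an index like Recoll or notmuch, and `search_mbox` is a made-up name.

```python
import mailbox


def search_mbox(mbox_path: str, needle: str) -> list[str]:
    """Return the subjects of messages whose raw text contains needle."""
    needle_bytes = needle.lower().encode("utf-8")
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            # get_bytes() returns the raw message, so no MIME decoding happens.
            if needle_bytes in box.get_bytes(key).lower():
                hits.append(str(box[key]["Subject"] or "(no subject)"))
    finally:
        box.close()
    return hits
```

For occasional searches this may be enough. For anything frequent, an indexer will be far faster than rereading the whole file each time.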


I wrote a cross platform Desktop app that can analyze these email backups to provide a visual clustering of the contents of your mails.

https://github.com/terhechte/postsack

It parses my 13gb Gmail backup in under a minute.


notmuch can index such mail archives and then search them in seconds.


If you end up with a very large mbox file you can open it with NodemailerApp https://nodemailer.com/app/ - it’s a desktop GUI that indexes and displays emails from multiple sources


Excellent, I was thinking of backing up my Gmail (but not with Thunderbird, had issues there in the past).

One suggestion - make it clearer that the backup format is not proprietary and can be consumed by other software. I assume that you're using a standard mbox format...


I recently needed something like that, but didn't want to deal with perl or ruby, so I wrote

https://github.com/kardianos/imapdown


Here’s a Python analogue that is single-file/zero dependencies: https://github.com/rcarmo/imapbackup


I’m curious about why you’re backing up your email from cloud providers?

Presumably their storage will be more durable than yours. I saw one person mention using it to help migrate to another system.

What other reasons do you have?


> Presumably their storage will be more durable than yours.

Backup storage doesn't necessarily need to be more durable than the thing it's backing up, as long as it has different failure causes. (also, "my storage" or a replica of it could easily be rented from a different cloud provider, but generally, if a single event impacts my home storage and my email provider at the same time, I likely have other worries)

> What other reasons do you have?

Companies temporarily or permanently suspend users for a long list of reasons, often without warning. Companies are not safe from bugs or outages. Accounts can get hacked even if you're careful. E-Mail is extremely important and at the same time usually a small amount of data, so easy to back up.


I back up my Gmail in case I'm one day shadow banned for a supposed, obscure violation of the terms of service. Usually there's no recourse in those situations. (I use getmail for this.)


If it's a shadow ban, then everything would look normal for you when logged in, wouldn't it? I'd be more concerned about a regular ban where you may lose access to your content altogether.


You're right, I was thinking of a regular ban / account lock out.


Moving email from one account to another.

Freeing up space in the 'online' store by copying everything with attachments over 6 years old to your machine and deleting it.

keeping a copy for offline access

keeping a copy in case the provider blocks your account for some reason.

Modifying stuff in some way the UI won't let you do (for example modifying emails in the inbox)


Can't speak for the author, but I use mbsync to keep an old Gmail account but self-host the emails from my own server, where everything is encrypted at rest. Syncing the emails to one place allows me to manage the 6 accounts, most of which are work related, through the one dovecot/roundcube interface.


Accounts can be canceled. Payments can be missed.


Cloud email providers like Google and Microsoft offer replication but not backups. It's always good practice to have a backup.


Why not just fetchmail to pull everything? That's what I do.


For GSuite (or Google Workspace) I just use Cubebackup [1]

[1]: https://www.cubebackup.com/

It is like $5 per User.


On a related topic, for Office365 cloud-mdir-sync is useful because IMAP may not be supported.


I guess pop3 doesn't preserve folders?


POP3 doesn't support folders*, period.

* Except for pre-defined ones.


OPs profile pic is amongus



