I wanted to love Thunderbird. I used it for years, then a bug [0] literally deleted all my emails. I regularly see updates from people understandably raging on the ticket :( It's a bug that deletes user data from both the server and the client without warning. It's been open and confirmed for 17 years straight. It could happen to you. How is it not the top priority to fix?
I too think they should drop everything else and work on this bug. But here's an example of why these kinds of bugs are tricky:
I have a similar bug at my job: Sometimes browsers delete our extension's database, or otherwise corrupt it. It's been an issue for years, but no one has been able to reproduce it. It's probably a 1 in 10 million bug.
I think it's a hardware bug. My "fix" was to back up a small, but key, part of the database to a separate storage mechanism that browsers let us access. When the issue arises, we can now try to detect the missing data and restore part of the db.
BUT! If this is actually a hardware bug, there is a chance that this additional database write will cause this bug to occur even more often, as we now have to write to storage twice as often!
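Here is a minimal sketch of that mitigation. It assumes two plain key-value stores, `primary` and `backup`, as hypothetical stand-ins; real browser extension storage APIs are asynchronous and look different. `CRITICAL_KEYS` is a hypothetical name for the "small, but key" subset. Because a later reply clarifies that the keys survive while the data vanishes, the sketch treats an empty value the same as a missing one.

```python
from typing import Any, Dict, List

# Hypothetical: the "small, but key" subset worth mirroring to a second store.
CRITICAL_KEYS = ("account_id", "settings")


def _is_empty(value: Any) -> bool:
    """Treat a missing value and an empty container/string as lost data."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def save(primary: Dict[str, Any], backup: Dict[str, Any], key: str, value: Any) -> None:
    """Write to the primary store; mirror critical keys to the backup store."""
    primary[key] = value
    if key in CRITICAL_KEYS:
        backup[key] = value  # the extra write the comment worries about


def restore_if_missing(primary: Dict[str, Any], backup: Dict[str, Any]) -> List[str]:
    """Restore critical keys whose data has vanished; return the restored keys."""
    restored = []
    for key in CRITICAL_KEYS:
        if _is_empty(primary.get(key)) and not _is_empty(backup.get(key)):
            primary[key] = backup[key]
            restored.append(key)
    return restored
```

For example, if `save(p, b, "settings", {"theme": "dark"})` runs and the data later disappears (`p["settings"] = {}`), then `restore_if_missing(p, b)` puts the saved value back and returns `["settings"]`. Every `save` of a critical key now costs two writes, which is exactly the trade-off raised in the comment.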
> Sometimes browsers delete our extension's database ... but no one has been able to reproduce it.
Are any of the bug reporters using Vivaldi? For some reason, that browser allows (and enables by default) clearing extension storage when you clear history/cookies via Delete Browsing Data.
This is something the Stylus addon dev noted when a user reported that the addon wasn't remembering their settings: the user had unwittingly wiped the extension storage because of the browser defaults.
(I actually use Vivaldi myself, but I virtually never use that feature, so I was unaware of the behavior until I read about the bug report.)
"Delete" was an oversimplification. The database seems to be there, but the data isn't. Over the years, dozens of engineers have tried to find a way we could have accidentally cleared the data but not the keys.
No user who has reported it has been willing to share their database with us (Chrome stores it in a standard LevelDB), as it contains private information.
But good to know that about Vivaldi! I didn't know that before, thanks!
Thanks for linking that. I've tried Thunderbird a couple of times in the past and quite liked it, but that thread has put me off using it forever.
Even if the bug is fiendishly hard to track down and reproduce, you'd think there would be some additional safety checks they could add that would at least let it fail with an error message instead of actual data loss.
People would still complain about them on forums, often ones run by the company who makes the client! I'm often reading threads of issues on Apple's public support forums. Being open or closed source has nothing to do with hearing about problems.
Closed software doesn't have open bug trackers, so there's no systematic way to find out.
An acquaintance of mine was twice hit by a bug that corrupted Word documents stored on iCloud when she edited them on her iPad. Searching online turned up others with the same problem from more than a year earlier...
I was able to find complaints fairly easily. I had them listed but HN ate my comment. Search "Missing emails" instead of "delete all emails" as the latter tends to provide instructions about how to bulk delete.
> Being open or closed source has nothing to do with hearing about problems.
Also, pay attention to observation bias and userbase bias.
If my dad faced this issue, he'd never post online. He'd call me or go to a computer repair shop. That's what your average user will do.
Open Source users tend to be a bit more tech-savvy. There's that famous article about Linux gamers reporting way more bugs than average users, and about how that can be misinterpreted as "why develop for Linux?" These frequency biases are a big part of this. Plus, OSS tends to do better bug tracking.
> you'd think there would be some additional safety checks they could add that would at least let it fail with an error message instead of actual data loss.
My guess is that such checks would exist, and do.
I think you've just made an assumption about a bug that was reported 17 years ago: that nothing has been done since. It looks like they can't reproduce it, *making it impossible to mark as fixed* even if it was. But I wouldn't assume nothing was done.
Also remember that Gmail, Outlook, and others are in play here. They also keep trashed items for 30 days, making them easy to recover. As the provider, they shouldn't make it easy to mass-delete things either, right? TB is just the interface. Frankly, I'm not sure I know how to permanently delete emails with it; I'm not sure I can. But the interaction here should result in multiple lines of defense.
It's not just one report from 17 years ago, it's 194 comments with the most recent one from nine months ago. It doesn't seem like mitigation steps have been implemented.
Well, I do think the Thunderbird team should investigate and fix this. But it is almost impossible to fix a bug you can't reproduce and have no clue why it might be happening.
> But it is almost impossible to fix a bug you can't reproduce and have no clue why it might be happening.
No, not at all. It's very easy.
This bug involves taking an inappropriate action under corrupted conditions. You don't need to know how those conditions arose. All you have to do is check whether they currently obtain, and - if so - refrain from taking the inappropriate action.
For this bug, that looks like this:
1. When we're executing a "move"...
2. Before deleting the original messages...
3. Check whether the copies are identical to the originals...
4. And if not, delete the copies instead of the originals.
At this point, the bug can't occur. The "root cause" bug, where your buggy logic says that you copied a bunch of messages even though you didn't, can still occur, but it can no longer delete any messages.
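The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions. A mailbox is modelled as a hypothetical dict from message ID to raw bytes. `copy_fn` stands in for whatever copy path might be buggy. The destination is assumed not to already hold those IDs. This is not how Thunderbird's move code is actually structured.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Mailbox = Dict[str, bytes]  # hypothetical model: message ID -> raw message bytes


def _sha256(raw: bytes) -> str:
    # Digests stand in for comparing against a copy that lives somewhere else.
    return hashlib.sha256(raw).hexdigest()


def guarded_move(
    source: Mailbox,
    dest: Mailbox,
    ids: List[str],
    copy_fn: Callable[[bytes], bytes] = lambda raw: raw,
) -> Tuple[List[str], List[str]]:
    """Move messages, deleting an original only if its copy verifies.

    Returns (moved, rolled_back): IDs moved successfully, and IDs whose copy
    did not match, so the copy was discarded and the original kept."""
    moved: List[str] = []
    rolled_back: List[str] = []
    for mid in ids:  # 1. executing a "move"
        original = source[mid]
        dest[mid] = copy_fn(original)  # copy first; copy_fn may be buggy
        # 2-3. before deleting the original, check the copy is identical
        if _sha256(dest[mid]) == _sha256(original):
            del source[mid]
            moved.append(mid)
        else:
            del dest[mid]  # 4. delete the copy instead of the original
            rolled_back.append(mid)
    return moved, rolled_back
```

Suppose you pass a deliberately broken `copy_fn`, for example one that returns empty bytes. The originals stay in `source` and their IDs come back in `rolled_back`. That is the point: the root-cause bug can still happen, but it can no longer destroy the only copy.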
So…do it. Sounds like it’d make a great case study that would get a person tons of attention and praise on HN, a real feather to put in one’s cap.
Literally nothing stopping anyone in this thread from opening a PR with this reportedly “very easy” fix that’s eluded developers for nearly two decades, and is so terrible folks swear off Thunderbird forever because I guess for email very basic rules for backing up data don’t apply (or something?) and/or Gmail and Outlook are implicitly trustworthy?
> and is so terrible folks swear off Thunderbird forever because I guess for email very basic rules for backing up data don’t apply (or something?)
Well, this bug literally causes Thunderbird to delete your original copies of data during the backup process, so I'm not sure why backing up your data is supposed to be the solution.
One of the many comments on the issue notes that although the bug has recurred in every version of Windows, it might not get much attention from developers because it is catalogued as something specific to Windows XP.
Nobody in the intervening nine years followed up by updating the bug's metadata, though. It's still "Windows XP only".
I try to never underestimate the incompetence/lack of concern people can have when it comes to addressing major product issues, but if this has been open for 17 years and is so widely known, somebody has surely looked into it and determined it’s not so easy.
And then they simultaneously determined, "Yeah, we might eat your data. Let's not warn anyone about that AT ALL, let's keep the feature activated and let the users lose their data." This behavior ought to be criminal.
> Let's not warn anyone about that AT ALL, let's keep the feature activated and let the users lose their data
How did you conclude this?
IDK why the assumption is that safety measures haven't been created. You wouldn't mark the bug as resolved if you put in safety features, right? You *ONLY MARK AS RESOLVED* after reproducing the bug and *VERIFYING* that it won't happen again. Right? Dear god I hope this is what you do, because otherwise you are prematurely closing bugs.
I'd agree with you that the fact that the bug is still open after 17 years isn't the problem, but the issue is that people are still (as of 10 months ago) running into the issue of their mail being deleted. If they'd secretly implemented "safety measures" as you suggest that wouldn't be happening.
Looking at the timeline, it's possible that they addressed a few of the bugs that result in data loss several years ago. It's also possible that the latest guy who ran into the problem within the last year triggered it in new ways or under new conditions. But it's clear that the problem of Thunderbird deleting messages from the server when copies haven't been successfully saved during a move operation hadn't been solved by any "safety measures" as of 9 months ago, and it's doubtful that it's been solved now.
My guess is that because Thunderbird ultimately doesn't bother to make sure messages are successfully and accurately copied before it removes them from the server, it'll only be a matter of time before someone else stumbles on some other set of circumstances that results in data loss when messages are being moved.
Reading the bug report it is unclear to me if the newest one is the same bug. There are also other bugs referenced that look to be fixed.
> If they'd secretly implemented "safety measures" as you suggest that wouldn't be happening.
But we can *VERIFY* that measures were taken. In fact, easily! We can look at the references at the very top of the bug report!
- Title: move/copying multiple imap messages to local folder bypasses offline store and redownloads messages. Need to preflight the move/copy.
Status: RESOLVED FIXED
https://bugzilla.mozilla.org/show_bug.cgi?id=505456
There are others that need to be hunted for but this one was trivial to find and was implemented pretty quickly. There are also other similar bugs that weren't marked as dupes. Some of these have been marked as resolved and fixed. That leads me to believe that they just don't know what exactly this bug is because they can't reproduce. It may very well have been resolved and new issues might be completely new bugs. I mean... it has been 17 years... and TB has undergone significant rewriting. Don't you think that the software changed quite a bit in that time?
All I'm trying to argue is that they didn't just sit on their asses and do nothing for 17 years.
> But we can *VERIFY* that measures were taken. In fact, easily! We can look at the references at the very top of the bug report!
>> Title: move/copying multiple imap messages to local folder bypasses offline store and redownloads messages. Need to preflight the move/copy.
>> Status: RESOLVED FIXED
This might be a more compelling observation if the bug was related to data loss. This just says "if you have a local copy of something, read from that instead of reading from the remote server".
It addresses the specific observation made in the thread that you can encounter the data loss bug even if you already have local copies of the messages, because Thunderbird ignores those, redownloads from the server, fails, and then deletes everything. Now, if you have local copies, Thunderbird won't try to redownload them from the server and the fact that the data loss bug isn't fixed won't matter to you.
You could apply this same approach to the entire bug, by guarding the "delete all of the user's emails" action instead of the "move emails that already exist locally" action. But they don't.
are you being for real? did you see anything as such in the bug listing? but even IF they did put safeguards in place, the fact that this is SEVENTEEN YEARS, no warning, functionality still enabled without ANY WARNING losing people data. unforgivable.
How can you possibly justify this behavior? I understand they don't owe the world any software, fine, but don't knowingly publish stuff that KILLS PEOPLE'S DATA without at least a warning.
> did you see anything as such in the bug listing?
Yes
> but even IF they did put safeguards in place, the fact that this is SEVENTEEN YEARS, no warning, functionality still enabled without ANY WARNING losing people data. unforgivable.
The software has changed a ton in 17 years. Right? We can agree on this? (I mean, it underwent a major revision in 2018, getting a lot of the codebase rewritten, like Firefox Quantum.)
So let's consider a hypothetical situation. Suppose the problem was resolved in the almost 2 decades of rewriting BUT you still do not know what caused the bug in the first place and, consequently, can't reproduce it.
Do you mark the bug as resolved?
Now let's leave the hypothetical and act as developers. Some safeguards have been put in place (you can verify this by looking at the referenced issues). You've solved similar bugs, but you're unable to determine whether they are the same problem or different ones (again, see the referenced issues or use the search).
Do you mark the bug as resolved?
Your sibling commenter implied they would. Personally, I wouldn't. Marking as resolved is a promise to the user that it is fixed. But I can't make such a promise. I can't make any strong statement until I can reproduce. So yeah, it seems appropriate to me that it is marked as "unresolved" with steps "needs reproduction." That is an entirely appropriate status to me. You try as hard as you can and you implement as many safety features as you can, but you don't mark as resolved until you can verify. Unfortunately, this means issues go stale. Hell, there'll even be some noise like if a hacker or even just your dog deleted everything. We wouldn't want to assume the user is dumb and lull ourselves into a false sense of security, right? But you can only do so much.
*YOU CANNOT CLOSE A BUG REPORT IF YOU CANNOT VERIFY THE BUG*. That's the policy they are using. You may use a different policy, but that's the one they are using.
> YOU CANNOT CLOSE A BUG REPORT IF YOU CANNOT VERIFY THE BUG. That's the policy they are using. You may use a different policy, but that's the one they are using.
And then I would say: YOU DO NOT STOP WARNING PEOPLE UNLESS YOU CAN VERIFY IT'S FIXED
(which, of course, assumes you bothered warning people to begin with)
We must be talking past one another. I'm assuming (as are others) that they can't reproduce the bug, that they aren't lying when they say so, and that they've tried.
I mean let's take the trivial case. Assume user is dumb, deleted the files, made a bug report. Devs will never be able to reproduce unless user tells them they deleted everything 'on purpose'. That ends up with a permanently opened bug report no matter how much time you spend trying to fix the issue and no matter how many safety features you build in, right?
Okay, then yes I misunderstood you. I mostly agree but it's also been 17 years and what are the odds that the offending code still exists? What are the odds that it's TB's fault?
I know people report the issue but googling I can find similar complaints across all major mail clients.
I just don't think there's enough information to make strong conclusions and I don't think California cancer warning labels work. I think they teach people to ignore warnings instead.
Make no mistake - I am not absolving them of leaving this issue unaddressed lol just saying if it was easy they’d likely have handled it. It’s probably difficult or they just don’t know, so they keep putting it off and decided that not enough users are affected for real consequences (which is wrong to do)
Nobody should fault the person who coded the bug, unless someone can prove it was done on purpose. What I am suggesting is that the project as a whole has a responsibility not to just sit on data-losing bugs for 17 years without warning users.
The fact that they choose not to makes me perfectly OK with them being held criminally liable.
I would not want my email client to be relying on such brittle and incorrect heuristics.
A better workaround would be to keep deleted emails around for some time, so users have the option to restore them if the bug triggers. But this has drawbacks. One is potential privacy breakage: you deleted those mails because you don't want anybody to have a chance to see them. Another is free disk space management: your local drive is overloaded and you urgently want to free up space. A third is UX confusion: this is a de facto trash, but Thunderbird already has such a feature.
Ultimately, what needs to be done is make the code robust, make sure there are no race conditions, etc.
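Here is a minimal sketch of that "keep deleted emails around" idea. It assumes a hypothetical local store and an arbitrary 30-day retention period; the number is an assumption, loosely echoing the 30-day webmail trash mentioned earlier in the thread. The `force` flag is one hypothetical way to address the privacy and disk-space drawbacks, because the user can still purge immediately.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

RETENTION_SECONDS = 30 * 24 * 3600  # hypothetical 30-day grace period


@dataclass
class SoftDeleteStore:
    """Hypothetical local mail store where deletes are reversible for a while."""

    live: Dict[str, bytes] = field(default_factory=dict)
    # message ID -> (raw bytes, time of deletion)
    graveyard: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)

    def delete(self, mid: str, now: float) -> None:
        """Hide a message instead of destroying it; remember when it went."""
        self.graveyard[mid] = (self.live.pop(mid), now)

    def restore(self, mid: str) -> None:
        raw, _ = self.graveyard.pop(mid)
        self.live[mid] = raw

    def purge(self, now: float, force: bool = False) -> int:
        """Permanently drop expired entries (or everything, if force=True).

        Returns how many messages were destroyed."""
        expired = [
            mid
            for mid, (_, deleted_at) in self.graveyard.items()
            if force or now - deleted_at >= RETENTION_SECONDS
        ]
        for mid in expired:
            del self.graveyard[mid]
        return len(expired)
```

Passing `now` explicitly keeps the sketch deterministic. A real client would read a clock and run `purge` periodically.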
> Well, would you rather have a brittle heuristic lose all of your mail?
That's not what's happening. I wouldn't expect such a heuristic to be currently present. This is a bug, not something intentional.
> almost certainly better than doing nothing
No, because with such a heuristic you add behavior that's difficult for the user to understand well and to work with. With such a heuristic, you will still lose some mails, and at some point the process stops in the middle. Which mails have you lost? What is "many" mails? 10? 100? What if my computer is fast and deletes hundreds of mails per second, losing all the mails anyway? What if it is slow and never triggers the heuristic?
If the heuristic does trigger, you end up with a mixed situation where you have still lost some stuff, but not all, and it'll be impossible to tell which. It doesn't fix the issue (you still lose email); it just makes things even harder to understand, even for the devs when they inevitably need to track down related issues. You really don't want to willingly add mechanisms that feel non-deterministic: they are hard to debug and hard for users to grasp.
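For concreteness, here is one hypothetical shape such a heuristic could take. It is a sliding-window limit that refuses further deletions once `max_deletes` have happened within `window_s` seconds. The class name and thresholds are invented for illustration. The sketch mainly makes the objection concrete, because the thresholds are arbitrary.

```python
from collections import deque
from typing import Deque


class DeletionBreaker:
    """Hypothetical circuit breaker: once max_deletes deletions have happened
    within window_s seconds, refuse all further deletions until reset."""

    def __init__(self, max_deletes: int = 50, window_s: float = 10.0) -> None:
        self.max_deletes = max_deletes
        self.window_s = window_s
        self._times: Deque[float] = deque()
        self.tripped = False

    def allow(self, now: float) -> bool:
        """Return True if a deletion at time `now` may proceed."""
        if self.tripped:
            return False
        # Forget deletions that have slid out of the window.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) >= self.max_deletes:
            self.tripped = True  # stays tripped until a human intervenes
            return False
        self._times.append(now)
        return True
```

Take `max_deletes=3` and `window_s=10.0`. Three rapid deletions go through before the fourth is refused, which is the partial loss described above. A client deleting one message every five seconds never trips the breaker at all.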
A way better solution is backups anyway: if you care about not losing your emails, you should be backing them up. In the first place, your local TB mails are not a proper backup of your IMAP account, because it's two-way synchronized, so you need a backup somewhere else.
A still better workaround is disabling the move-to-local-folder feature and making people copy and then manually delete mails.
Not saying your heuristic is not a good idea or clever (it is clever and could lead to further good ideas), just that after reflection, it should probably not be implemented. It barely starts to address the issue and adds complexity for everyone involved.
Except the bug was filed in 2008. Back then, Rust was Graydon Hoare's personal project that Mozilla wouldn't start funding until a year later. Rust was written in OCaml and the famed borrow checker wouldn't be in place until 2010. The first public release was v0.1 in 2012 and the first stable release 1.0 wouldn't happen till 2015. The language was very different back then with sigils, garbage collection and green threading as language features. So this bug was already bugging people when Rust was just an embryo that was still years away from birth.
Now even if we neglect the timeline, Rust only guarantees memory safety. If TB is deleting mails on the server too, then the corruption is happening over IMAP connections as well. Does that sound like a memory safety bug to you? Perhaps it is. But how do we eliminate the possibility of a logical bug that Rust won't protect you against, when nobody has any clue even now? And all that aside, if you're going to rewrite it in Rust, you might as well start a new project in Rust instead of porting an old design that may potentially contain a language-agnostic logical Heisenbug.
I'm not trying to be hostile here. I started using Rust in 2013 (I have 12 years of experience in a 10 year old language, and a bunch of repos that I can't compile anymore unless I compile the compiler from old commits somehow!). I wouldn't use C or C++ for any of these applications - I simply don't have enough competence to avoid the kind of bugs that Rust protects me from (despite being a hardware engineer with more knowledge about memory management than about type system theory). Despite all that, statements like this will only cause an unwanted backlash against Rust. Not that you're entirely wrong, but some people are so offended by such suggestions for reasons that are still under investigation, that they start a crusade against Rust [1].
As I understand it (ESL, though), you counter the facetiousness in a way that stops it from spreading and possibly even sparks a constructive discussion. I've certainly observed this phenomenon myself (although as the person being facetious, I often feel like "I was joking, I actually agree, that's indeed what I was implying, but good that you made it clear and explicit, I guess").
I guess you'd disarm the person being facetious rather than the facetiousness, like you'd disarm someone about to cast a magic spell on you.
At this point, answering a "rewrite it in Rust" comment that doesn't go into details is a cultural faux pas; you just smile or roll your eyes and move on :-)
Looking at the various issues reported in this thread, it honestly seems that burning the entire codebase and rewriting it would be the best choice. Bonus points for using a modern systems programming language.
A better approach might be to feed all of this into an LLM to have it figure it out. If it finds a bug and has a fix, reproducing it might be easier and a test could potentially be written.
I don’t think LLMs are the answer to everything, but this would be a good test for newer generations of LLMs as they’re developed.
Worst case- it deletes all of your emails, but that would’ve happen anyway, right? =)
Reproducing bugs is a luxury and not even close to required for analyzing and fixing issues. Even if the issue is external (hardware, antivirus, etc.), the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.
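Here is a minimal sketch of "only ever delete the original when the new data has been successfully written and verified", applied to local files using only the Python standard library. It writes to a temporary file, calls `fsync` on it, and swaps it into place with `os.replace`. It then reads the file back and compares it, and only after that removes the source. The function names are hypothetical. There are two caveats. The read-back may be served from the OS page cache, so it catches logic errors better than hardware faults. A fully durable rename on POSIX would also `fsync` the containing directory, which is omitted here.

```python
import hashlib
import os
import tempfile


def write_verified(path: str, data: bytes) -> None:
    """Write data to path via a temp file, then read it back and verify it.

    Raises OSError on any failure; callers must not delete the original then."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to the device
        os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # don't leave a half-written temp file behind
        raise
    with open(path, "rb") as f:
        if hashlib.sha256(f.read()).digest() != hashlib.sha256(data).digest():
            raise OSError(f"verification failed for {path}")


def move_verified(src: str, dst: str) -> None:
    """Move src to dst, deleting src only after dst is written and verified."""
    with open(src, "rb") as f:
        data = f.read()
    write_verified(dst, data)  # raises before src is touched if anything failed
    os.remove(src)
```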
The problem is you can never close the bug report if you can't reproduce it. I guess you could, as the other commenter suggests, mathematically prove that it can't happen, but otherwise you're prematurely closing it.
How do you differentiate that you solved the bug and not a similar looking bug?
> the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.
But this doesn't solve the problem.
- What if it is an upstream issue? They have to be connected, since they are deleting data. Maybe it is completely a bug on their end? Doesn't matter how defensive you are if the bug was "anytime an email has 'man man' and is pulled between 00:00-00:04 everything deletes" then what can you do?
- What if the user was hacked and the hacker just deleted all the data?
- What if the user was just dumb and deleted the data themselves, either unknowingly or knowingly but too embarrassed to say anything?
- What if it is another program on the user's computer that is deleting the data because of some weird unexpected collision?
I'm sure you can think of more situations where that still won't solve the problem.
How do you close the report if you can not make strong guarantees that it is resolved?
A luxury? Not even close to required? You are not afraid of words! I'm not looking forward to receiving a bug report from you!
Yeah, reproducing is not theoretically, mathematically necessary. In theory you could prove your code is correct with formal methods¹. But nobody does this because it is impractical (borderline impossible), and in practice reproducing is so useful as to be almost essential:
- it lets you study how your code behaves in the problematic case and identify what's causing the exact issue the user is seeing
- it lets you check that your fix does indeed address the bug
I have indeed already fixed trivial bugs without reproduction cases from a vague description of a bug because I'm intimately familiar with the code and it immediately rings a bell: the cause is immediately obvious. But that's not the usual case.
> the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.
What if the code is already designed like this (and I sure do hope it is currently written like that, because that's almost common sense, if not the only sensible way of moving something) but somehow fails for some currently inexplicable reason? It smells like a race condition to me.
In the case of the discussed bug, users have described a reproduction case that's not 100%. But someone will need to find a 100% reproduction case. Users, or devs. It will not be optional. You can't play a guessing game, attempt to fix the code and hope for the best. You might be able to actually fix the bug, but without much confidence. Best case, you'll be able to find a reproduction case after fixing the bug (that you'll probably use as a functional test), to prove you fixed the bug for this specific case you found. You'll not be 100% sure you addressed the user's case.
A bug can hide another one, so you could find and fix a bug, but the issue is still present in the user's case. You can only be sure with their reproduction case.
But I agree that it is hard to reproduce a race condition.
¹ which in practice applies to code of trivial size (static analysis), or consists in checking a model but not the actual implementation (model checking), or does apply useful checks but is not exhaustive and has false positives / negatives (static analysis), or does apply useful exhaustive checks but only on a limited number of executions (runtime verification, and we do have functional tests that serve a similar purpose in practice - and you'll actually need the reproduction case here so you have the right execution to check), or requires you to write your code in a specific language (stuff like Coq) and you cross your fingers that this specific language's implementation is itself correct. In short: not applicable here.
> it is almost impossible to fix a bug you can't reproduce
It's also impossible to mark a bug report as resolved if you can't reproduce it.
You could have fixed the bug (especially since a lot of TB was rewritten) but if you can't reproduce the bug you wouldn't know it was solved only that people stopped reporting it. This is actually a common occurrence with long standing bugs.
I've updated my comment for clarity. The bug (which I've never encountered in more than 20 years as a Thunderbird user) is that users move messages to a local email folder, but the messages are deleted from the server without actually being downloaded. At a minimum, they should disable that operation. The guy who originally reported it worked at Sun and lost hundreds of work messages as a result of this bug. AFAICT the user wouldn't be affected if they made a copy of the messages and then manually deleted them from the server folder after confirming the copy was successful.
A very small number of users have this bug (and tbf, it's a really bad bug), and are unable to consistently reproduce it and it seems none of the developers have been able to (the seemingly random nature of the bug occurring is not helping). How is it supposed to be fixed?
You add more and more diagnostics (e.g. logging) in that area till you manage to track down the bug. Over several years this should be possible.
At that point you can either fix the bug directly or do it properly by first reproducing the bug (in a test) and then fixing it.
Said another way - If they can't reproduce it, they can't close it.
They may well have fixed it already, but without a way to reproduce it the only prudent behavior is to leave it open and wait for the next diagnostic file to be uploaded.
That's not the only prudent behaviour. As the OP said, the prudent behaviour is to add more diagnostics and guards against the conditions that lead up to the bug.
Okay, let's assume more diagnostics and guards were added.
Now re-answer the above questions with these assumptions.
- How do you fix a bug you can't reproduce?
- How do you *close* a bug report when you can't reproduce?
Being generous here, we're assuming there's 17 years worth of diagnostics and safety guards added but through that time the bug still isn't reproducible. Let's try to answer the questions under these assumptions.
If you've added guards and diagnostics, then you close it until someone else files a follow-up, then it can be re-opened. There's no sense keeping it open unless there are ongoing reports of the issue.
> There's no sense keeping it open unless there are ongoing reports of the issue.
I think you've misunderstood. There's other options.
Let's consider this from a failure analysis standpoint. Here are our options:
- You have incorrectly marked issue as solved
- You have incorrectly left the issue marked as unsolved
*Which error case would you rather have?*
The classic example of this design choice is with a safe. Let's imagine you are building a safe. If the safe fails, would you prefer that it fails into a state that is unlocked or into a state that is locked? The answer isn't so obvious, as it actually depends on how it fails, right?
A very common example is when designing skyscrapers. The choice is that when a skyscraper fails, there is a strong preference that it falls in on itself (think 9/11). Why? Because if it falls to the side then it takes out other buildings and can create a chain reaction (a related famous example being housing in Industrial Revolution London and fire...)
Your action is a valid option, but it is not the option I would choose. I think what they did was perfectly fine. They left it open (to avoid tricking anyone into thinking it is solved when its status is actually unknown) and marked it with additional information about the lack of verification/reproducibility. Essentially, it is marked as stale.
So we're back to the earlier question:
- How do you *close* a bug report when you can't reproduce?
Or we can frame differently: "How do you close a bug report if you have no indication that the bug was resolved nor exists?"
I don't think one more user report is going to be the difference that pushes them over the finish line after two decades. Let's not pretend the developers have been taking this bug seriously.
They still have it marked as unreproducible. What do you expect them to do if they can't reproduce?
So yeah, I do think more user reports can help. At worst, more reports will make them take it more seriously.
You're also falling prey to observation bias. You can see, linked in the issue and by searching, that there are similar issues that were resolved and marked as fixed. So I don't think they were just doing nothing, as everyone seems to be assuming.
The way I've dealt with that in the past is to put it into Review (or whatever the equivalent is) and make a note ("cannot repro, but attempted potential fix in version XXXX, moving to review, please reopen if anyone reports this again"). Then, if nobody reports it still happening for x amount of time (e.g. 12 months), I close it. It can always be reopened if it gets reported again after that.
I'm rather surprised by the comments responding to this, and a bit by the comment itself.
Why I'm surprised is... well... this is HN. We know that a bug like this is very rare, right? I mean, otherwise who would ever use TB, right? But if it's rare, it's really hard to track down. There are years of comments without people including system information. The reproduction steps themselves are "sometimes." It's HN, so we can assume users here program, right? How would you solve this issue?
FWIW, I've used TB on Linux and OSX for years and never faced an issue like this. The only one I've faced is sometimes not being able to connect to the server and having to resend an email.
On the other hand, when using Apple Mail:
- It routinely doesn't show me messages I can see in TB.
- Messages I send from my phone frequently don't go through or end up being sent twice.
- Searching will pull up emails from a year ago, prioritizing them over the email I got this week and was actually looking for (e.g. searching foo@bar.com).
- I can't even tag emails!?!?
- Do filters even work? Holy cow, how do people live without filters!? How do you deal with spam? How do you deal with all those noisy, needless messages and newsletter-type stuff that won't let you unsubscribe, or that comes from domains or addresses you can't block because emails you need come from the same addresses?
- It straight up renders PDFs inline with no warning, helping spammers.
- There are no folders, and everything is just jumbled together in a mess. How does anyone find anything?
Idk, this is an annoying problem, but I'd be surprised if I lost all my emails. I can recover deleted emails in Gmail and Outlook. Annoying, but recovering them (go to trash, click "restore") takes far less time than what I'm saving on a weekly basis with TB.
I know these problems aren't on all platforms but TB IME has saved me a ton of time compared to using Gmail, Outlook, or Apple Mail. Hell, fucking Neomutt is a better experience than those three, which is insane. Trying to use them is like trying to use the internet without an ad blocker. How are we so bad at email?
I had email service expire on a domain and moved it to fastmail. Fastmail, obviously, did not have copies of the email I received before the move. But my phone did.
When I updated the configuration in K-9, it contacted the fastmail server, found that the mail it had locally wasn't also present there, and immediately deleted all my local copies.
That's not a "bug" in the sense of unintended behavior of the software, but it certainly seems like the software designer's goal is to hurt the user. I obviously didn't want that to happen. There is no scenario where I would want that to happen.
Email clients make some strange assumptions about what kind of actions make sense under what conditions.
I'd wager that this is marked as "by design" because it technically follows some IMAP spec. What the app does, purely on a technical level, is correct.
I've so often had debates and threads in issues where a developer entrenched in a domain has so much domain knowledge (i.e. tunnel vision) that "technically correct" or "by the spec" is the only correct way. I've been that developer in many cases too, in hindsight.
I also got bitten by this. While IMAP would allow for syncing, most MUAs don't have a local mailbox. Instead they have a cache, and offline mode means putting as much as possible into the cache. But if you want actual local mail storage, you will eventually have merge conflicts, because how do you deal with a message that is present locally and was on the server, but no longer is? Do you assume the user wants it deleted, or that it should be reuploaded to the server? Either way will be wrong. (Also, if you reupload it, it will get a new UID, so another MUA will reupload it again, i.e. you will get a copy per MUA.)
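To make the ambiguity concrete, here is a minimal sketch of a three-way comparison, assuming messages are identified by a stable ID (such as Message-ID, since UIDs change on reupload) and that the client records a hypothetical `last_synced` set after each successful sync. A message that was synced before and is now missing on one side could have been deleted on purpose or lost, so this sketch refuses to guess and never deletes anything automatically.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    upload: set[str] = field(default_factory=set)    # new locally, never seen on the server
    download: set[str] = field(default_factory=set)  # new on the server, never seen locally
    ask_user: set[str] = field(default_factory=set)  # synced before, now missing on one side


def plan_sync(local: set[str], server: set[str], last_synced: set[str]) -> SyncPlan:
    """Compare both sides against the state recorded at the last sync."""
    plan = SyncPlan()
    for msg_id in local - server:
        # Previously synced but gone from the server: deleted, or lost?
        (plan.ask_user if msg_id in last_synced else plan.upload).add(msg_id)
    for msg_id in server - local:
        # Previously synced but gone locally: same question in reverse.
        (plan.ask_user if msg_id in last_synced else plan.download).add(msg_id)
    return plan
```

Real sync tools make different trade-offs here (propagating deletions is the usual default); the point is only that the "missing on one side" case needs a policy, not an assumption.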
So if you want a mailbox sync tool, use a proper program for that like mbsync/isync. (But this can't cope with cross mailbox moves besides (not) propagating them.)
Yes, a spec is a technicality that, ideally, should be abstracted away completely (and alas, all abstractions are somewhat leaky).
It's another thing that got me interested in DDD, where the user, domain, business and such define what things are called, how they operate, what events they undergo or emit, etc., and not a spec, language or framework.
And where e.g. "the IMAP spec" is a clearly bounded domain, probably a service, an adapter or even a library. Within that domain, "The Spec" dictates all the naming, conventions, logic etc., but at the border of this domain another language takes over. E.g. in a "MessageAdapterImap" something is called "EXPUNGE", with all the intricacies of what that means in IMAP. But on the outside it's e.g. part of a "cleanup()" interface, or whatever the domain calls it when it removes messages.
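A minimal sketch of that boundary in Python, reusing the "MessageAdapterImap" and "cleanup()" names from the comment (both hypothetical). The adapter only relies on two methods that `imaplib.IMAP4` does provide, `store()` and `expunge()`, and assumes the IDs passed in are sequence numbers in the currently selected mailbox; a real adapter would more likely use UID commands.

```python
from typing import Iterable, Protocol


class ImapClient(Protocol):
    """The two imaplib.IMAP4 methods this adapter uses."""
    def store(self, message_set: str, command: str, flags: str): ...
    def expunge(self): ...


class MessageStore(Protocol):
    """Domain-side interface: speaks the domain's language, not IMAP's."""
    def cleanup(self, message_numbers: Iterable[str]) -> None: ...


class MessageAdapterImap:
    """Bounded context where IMAP vocabulary (\\Deleted, EXPUNGE) is allowed."""

    def __init__(self, client: ImapClient) -> None:
        self._client = client

    def cleanup(self, message_numbers: Iterable[str]) -> None:
        message_set = ",".join(message_numbers)
        if not message_set:
            return
        # In IMAP, removal is two steps: flag the messages \Deleted, then EXPUNGE.
        self._client.store(message_set, "+FLAGS", r"(\Deleted)")
        self._client.expunge()
```

Code outside the adapter only ever sees `MessageStore.cleanup()`, so the IMAP-specific two-step deletion stays in one place.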
You make a good point but this is not a "strange assumption" by any stretch. You see Thunderbird has a "move" action. It allows you to move emails from a folder to another. Now somehow, this is apparently NOT implemented as:
1. Copy from source to destination
2. Verify copy has completed without issue
3. Delete from source
but... some other way? So when you try to move from a server folder to a local client folder with an innocent-looking drag and drop, combined with a poor network connection, a garbage-tier legacy protocol like IMAP and decade-old C++ spaghetti code, you get this textbook 17-year-old severity 1 bug that will never get fixed.
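The three steps above can be sketched as follows. This is not Thunderbird's code; the `Folder` interface and `DictFolder` class are hypothetical stand-ins. Verification reads the copy back and compares content hashes instead of trusting a "write finished" signal, so a truncated or empty copy (like the empty 1 KB messages reported in the bug) aborts the move and leaves the original in place.

```python
import hashlib
from typing import Optional, Protocol


class Folder(Protocol):
    def read(self, msg_id: str) -> bytes: ...
    def write(self, msg_id: str, raw: bytes) -> None: ...
    def delete(self, msg_id: str) -> None: ...


class MoveVerificationError(Exception):
    """The copy could not be verified, so the source was left untouched."""


def sha256(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


def safe_move(src: Folder, dst: Folder, msg_id: str) -> None:
    raw = src.read(msg_id)
    # 1. Copy from source to destination.
    dst.write(msg_id, raw)
    # 2. Verify by reading the copy back and comparing content hashes,
    #    rather than trusting a proxy signal that the copy completed.
    if sha256(dst.read(msg_id)) != sha256(raw):
        raise MoveVerificationError(f"copy of {msg_id} does not match the source")
    # 3. Only a verified copy allows deleting the original.
    src.delete(msg_id)


class DictFolder:
    """In-memory stand-in for a mail folder, for illustration only."""

    def __init__(self, messages: Optional[dict[str, bytes]] = None) -> None:
        self.messages = dict(messages or {})

    def read(self, msg_id: str) -> bytes:
        return self.messages[msg_id]

    def write(self, msg_id: str, raw: bytes) -> None:
        self.messages[msg_id] = raw

    def delete(self, msg_id: str) -> None:
        del self.messages[msg_id]
```

The cost is one extra read of the destination per message, which is the performance trade-off other commenters mention.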
Of course, it isn't; they're relying on an unstable proxy for "copy has completed without issue" in preference to just checking whether that happened or not. Several rather angry users have pointed this out. It doesn't seem to have sunk in.
It's like a car whose engine randomly shuts down with a very low reproduction rate. Except that when this happened with cars, GM recalled 30 million cars and paid billions in fines.
Emails are to an email client what the engine is to a car. The client is pointless without them; they are THE purpose of it. All the rest of the functionality, like a fancy UI, filters, notifications and the editor, is meaningless if your emails were deleted without recovery. Even a car without an engine is more useful than an email client with an empty DB.
The big difference is a car with an engine that randomly shuts down is a life-and-limb safety issue. An email client that corrupts the database is extremely unlikely to cause a loss of life, even if the consequences are costly.
That said, even if the bug is impossible to isolate, it sounds like the chain of events that leads to it is known. They probably should disable the feature until someone is motivated to fix or replace the code. I'm sure that would anger a lot of people, but someone angry about the loss of a feature is probably better than someone who is angry at the loss of data. Especially given that the feature seems to be something someone would use to archive their mail.
That may be, but I immediately uninstalled Thunderbird from all my devices upon seeing that it's low priority and unassigned.
I won't be using any email client that can break and delete all my emails, both locally and on the server. Why would I? It may be a lottery, but it isn't one I want to play.
The fact that they see this as low priority shows they're morons.
Who would say 'yes please' to an email client that might permanently destroy some of your most important data at random?
When I say they're morons, I mean it in terms of them not understanding the reputational and trust damage this can cause, via the thread, the low priority, the lack of assignment, or word of mouth.
Far too focused on the engineering POV rather than the optics and trust/reputation damage. My kind of moronic, but still moronic.
It happens to me regularly. You can fix it by redownloading the message from the server using the "repair folder" feature, and I have backup, but it IS infuriating.
I have no good alternative to Thunderbird; it does so much of what I want. But this bug is awful.
Note that this is why you copy, check, then delete, instead of just moving data, whenever it's important that the process works correctly.
Even if the software doesn't have known bugs, I do it if the data is important enough, and especially if I don't have my own backup (for example, because the storage provider takes care of backups and redundancy; I personally like to have another copy that I manage myself, but how many people have their IMAP emails or Spotify playlist data backed up? I do, but not many people, I think).
I have been using TB on all operating systems with 8 or 9 users since 2006 and I never even once encountered this issue.
As a software developer fixing stuff like this is only possible if you can reproduce it or otherwise get logs, telemetry and similar things, otherwise it is pretty much just guesswork.
Granted given the severity of the consequences I would've chosen a more defensive move-strategy (e.g. one that deletes mails only once they have been copied verifiably), but that would have significant performance impacts in the 99.99% of cases where it works, so finding the real problem is preferable.
The truth is that if this happens to you regularly, you are probably the prime person to gather more data on it. Call it giving back to Open Source software.
> As a software developer fixing stuff like this is only possible if you can reproduce it or otherwise get logs, telemetry and similar things, otherwise it is pretty much just guesswork.
As a software developer you should be able to reason about your code and work backwards from the observed result to investigate possible causes.
When a plane crashes or bridge collapses the engineers tasked with finding the cause don't just throw up their hands if they can't make it happen again.
Technical users reporting bugs like this with a mail client should endeavour to find out if they are serious about the issue. The initial reporter worked for Sun Microsystems.
Oh, wow. When I have to use Thunderbird, I never move emails. I manually copy emails to a folder and delete the emails from the old folder after that. I forgot why I have to do this. Now I know again. A lot of people here are speculating that this bug must be very rare. I maintained like 30 Thunderbird installations for other people. This bug bit me at least twice. It can't be that rare.
> On a several occasions (most recently today), Thunderbird has "lost" my mail messages that are in my inbox when I move them to a local folder. Effectively it corrupts the messages - they appear in the local folder as 1 KB messages with no subject or sender info. They are empty messages.
Apple's Mail app has had a virtually identical bug since Catalina; Michael Tsai's article on the issue currently has 636 comments:
After witnessing the bug myself, I migrated to Thunderbird with Maildir enabled[1] for long-term storage; I have yet to experience the issue despite a large database (>300,000 emails) and daily IMAP imports to local folders.
I've been amazed over the years that this has never been fixed. It's very hard not to make jokes about the standard lifestyle of open-source programmers, that they don't consider this a priority (note: that's a joke, I consider myself an open-source programmer. I would hope that's obvious, but someone just bothered to send me a mean anonymous message).
Indeed most of the time I already know the exact word I'm looking for; and most of the time I get non-exact hits making it so much more difficult to find the actual message.
Perhaps there could be an option to disable stemming completely in the inverted index, which would probably be easier to implement than a post-search filter (which in itself doesn't sound very complicated...).
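For illustration, here is what a post-search exact-match filter on top of a stemmed inverted index could look like. The suffix-stripping stemmer and the `Index` class are toy stand-ins I made up, not Thunderbird's tokenizer or search code.

```python
import re
from collections import defaultdict

SUFFIXES = ("ing", "ed", "es", "s")  # toy stemmer, stand-in for a real one


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class Index:
    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = docs
        self.postings: dict[str, set[str]] = defaultdict(set)
        for doc_id, text in docs.items():
            for word in tokenize(text):
                self.postings[stem(word)].add(doc_id)  # index stems only

    def search(self, word: str, exact: bool = False) -> set[str]:
        word = word.lower()
        hits = set(self.postings.get(stem(word), set()))
        if exact:
            # Post-search filter: keep only documents containing the literal token.
            hits = {d for d in hits if word in tokenize(self.docs[d])}
        return hits
```

With docs "Meeting notes attached", "We meet tomorrow" and "Meets expectations", a search for "meeting" matches all three through the stem "meet", while `exact=True` narrows it to the first one.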
But of course, it's open source, anyone could contribute :D.
The search in Thunderbird is terrible compared to what we are used to today. Same for K-9 (now Thunderbird for Android?).
Not only is the UX a horror, its results rely on all sorts of technicalities. I, as a software engineer, can understand that a mail has to be downloaded in entirety locally in order to have it indexed and then show up in the results. And I understand that sub-sub directories aren't really a thing in IMAP, so searching this-dir-and-everything below is hard/impossible and so on.
But mostly the search algorithms are poor. So much so that I often rely on (rip)grep to find mails. On a few occasions I even wrote a quick bash/python horror to push all my mail into a Meilisearch instance and then used that search engine to get the lists and filters that I would expect Thunderbird to have.
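A rough stdlib-only version of the grep approach, assuming the mail is stored locally in Maildir format. It uses Python's `mailbox` module rather than ripgrep or Meilisearch, and only looks at the From and Subject headers plus decoded `text/plain` parts, so HTML-only messages and attachments are skipped.

```python
import mailbox
import re


def plain_text_parts(msg) -> list[str]:
    """Decode every text/plain part; other parts (HTML, attachments) are skipped."""
    texts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            texts.append(payload.decode("latin-1"))
    return texts


def grep_maildir(path: str, pattern: str) -> list[str]:
    """Return the Subject of every message whose From, Subject or body matches."""
    regex = re.compile(pattern)
    hits = []
    for msg in mailbox.Maildir(path, create=False):
        fields = [str(msg.get("From", "")), str(msg.get("Subject", ""))]
        if any(regex.search(text) for text in fields + plain_text_parts(msg)):
            hits.append(str(msg.get("Subject", "")))
    return hits
```

Because the pattern is a regular expression, exact-word searches like `r"\bmeet\b"` work without any stemming getting in the way.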
Yes. It's open source. So "go fix it" is a proper reply. But that doesn't make a complaint about the state of the search feature invalid.
I don't know why you're getting downvoted for this. That seems like a pretty frustrating bug when generalised to other word stems. It's also pretty standard to prioritise exact matches when ordering search results so, again, frustrating.
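Prioritising exact matches is cheap once each hit knows whether it matched literally. A minimal sketch, assuming a hypothetical `Hit` record produced by the search layer: exact matches first, newest first within each group. It relies on Python's sort being stable, so the second sort keeps the date order from the first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    subject: str
    received: datetime
    exact: bool  # True if the literal query term appears, not just a stem match


def rank(hits: list[Hit]) -> list[Hit]:
    newest_first = sorted(hits, key=lambda h: h.received, reverse=True)
    # Stable sort: exact matches move to the front, date order is kept within groups.
    return sorted(newest_first, key=lambda h: not h.exact)
```

This also addresses the "email from three years ago shows up first" complaint, as long as recency is the tie-breaker.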
One of my biggest bugbears with Microsoft Outlook has always been that its search function is terrible. If you can't find an email then it may as well not exist, and that's been a real problem on a regular basis during my career - particularly latterly when I was in leadership and necessarily lived in my email and calendar.
It's disappointing that Thunderbird has similar issues with such a fundamental function.
That is annoying. I wish I could do an advanced search or, better, use regex. Luckily there are plugins.
Worse though, in Apple Mail I'll search an email address because I got an email earlier in the week and the first thing it'll show me is an email I got from that person 3 years ago with the correct result a few down. I really need a better email client for my phone...
The filter/search feature in Thunderbird does not appear to have a way to search for exact matches. You want exact matches. First, identify if it can do exact matches and if so, expose to the user. Else, who wants to touch that code?
Nitpick: not all bugs have to be reproducible to be taken seriously. Defensive programming, and adding extra logging could be a mitigation to avoid future problems, or to help fixing them in the future.
Imagine you're writing trading software, you have an algo go haywire and it machine guns the whole order book, and then you refuse to put a "max order size" outside of the algo to stop it from happening again because you can't figure out why it happened in the first place.
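The "max order size outside of the algo" idea, sketched with hypothetical names (`Order`, `RiskGate`, `OrderRejected`). The gate doesn't need to know why the strategy misbehaved; it enforces hard limits on size and on the number of orders before anything reaches the exchange.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    symbol: str
    qty: int


class OrderRejected(Exception):
    pass


class RiskGate:
    """Sits between the strategy and the exchange and trusts neither."""

    def __init__(self, send: Callable[[Order], None], max_qty: int, max_orders: int):
        self._send = send
        self.max_qty = max_qty
        self.max_orders = max_orders
        self.sent = 0

    def submit(self, order: Order) -> None:
        # Hard limits that hold no matter why the algo misbehaves.
        if not 0 < order.qty <= self.max_qty:
            raise OrderRejected(f"qty {order.qty} outside (0, {self.max_qty}]")
        if self.sent >= self.max_orders:
            raise OrderRejected("order count limit hit; halting until a human reviews")
        self._send(order)
        self.sent += 1
```

The mail-client equivalent would be a guard in front of the delete path, independent of whatever code decided to delete.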
Try telling a regulator or your boss that was your reasoning.
How many one-off band aids do you think should be applied for rare, never reproduced problems before you slap a “100% safe” label on it and ship it with the confidence of a bloated, cruft-ridden job well done?
Are you arguing in bad faith, or do you just not have any practical experience dealing with complex systems?
Even if the bug can't be reproduced, on the basis of multiple user reports the first step absolutely should be to add some assertions and logging around email deletion.
The point is not to give it a "100% safe" label, the point is to start narrowing down possible root causes. If the problem recurs, you'll have assertions ruling out certain possible culprit code paths as well as logs displaying the values of relevant variables.
Kinda off topic, but I've been looking for a good introduction to defensive programming and its best practices, and never really found much. Any recommendations?
I don’t know of any real posts on it; it just ends up being a kind of “assume it’ll go wrong” mindset: figure out how you would know something has gone wrong, then track it down. Your starting point, after an issue is reported, is to add a load of logs around places that seem like candidates for the flow. Over time, you get a sense of where things can break and you add that telemetry ahead of time.
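A sketch of what "assertions and logging around email deletion" might look like, with a hypothetical `checked_delete` helper and in-memory folder. It raises an exception instead of using `assert`, because Python strips `assert` statements when run with `-O`. The 50% threshold and the `"user_bulk_delete"` reason string are arbitrary assumptions for the example.

```python
import logging

log = logging.getLogger("mailstore.delete")


class InvariantViolation(RuntimeError):
    """Raised instead of deleting when a sanity check fails."""


def checked_delete(folder: dict[str, bytes], ids: list[str], reason: str,
                   max_fraction: float = 0.5) -> None:
    requested = set(ids)
    missing = sorted(requested - folder.keys())
    # Log the values that would matter in a post-mortem, before acting.
    log.info("delete reason=%s folder_size=%d requested=%d missing=%d",
             reason, len(folder), len(requested), len(missing))
    if missing:
        raise InvariantViolation(f"asked to delete unknown ids: {missing}")
    # A routine operation such as a move should never wipe most of a folder at once.
    if reason != "user_bulk_delete" and len(requested) > max_fraction * len(folder):
        raise InvariantViolation(
            f"{reason} would delete {len(requested)} of {len(folder)} messages")
    for msg_id in requested:
        del folder[msg_id]
```

Even if the check never fires, the log lines give the next bug report something more useful than "sometimes".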
I feel like this is sort of like reading a book to get better at self defense. Yeah, you'll probably pick up a few interesting things that may be of questionable use. But when you train in martial arts, you often get to go through the motions and put the moves into practice. Even then "real" fights will feel quite different and a lot of the stuff you've learned will likely fly out the window. If you've been in real fights a lot, you've begun to internalize your training and your moves become more like instinct. It's quite difficult to go from book knowledge to instinct without getting beat up a lot in between I think. The real valuable lessons come from building something that breaks and getting to fix it yourself.
This issue comes up in my role a lot, where I am often dealing with various environmental conditions and human factors, plus multiple integration points between various software and hardware systems.
The answer is that you keep working at it iteratively using a combination of logging, reporting, and defensive programming to systematically narrow down the possible causes. Sometimes you never arrive at a true root cause, but you get close enough that you can mitigate the problem and finally close the ticket out. At the end of the day, the customer/user doesn't care as long as it works.
However, what will really piss them off is telling them your hands are tied until they can reliably reproduce the issue for you. It's important they understand that you are working on it, and typically they will go out of their way to help solve the problem when they feel taken care of.
Why do you think it needs reproducible steps? It is obvious that the bug is still active, so in a way it is reproducible, just not in a systematic way.
This happens more often, for example when many services work together in an asynchronous way, and in some very rare situation, unwanted behavior occurs. To fix that, it is often easier to reason through the entire process, and to identify weak spots. It might even be a good idea to switch to a different paradigm to avoid certain bugs altogether.
For this particular bug, I would start by reading a lot, and ensuring that the bug is indeed not easily reproducible (by trying to make it reproducible, of course). If that fails, I would continue to think about root causes for the bug, and possible workarounds that would work in theory. Then I would try to estimate the amount of work required and the risk of breaking other things, and report that to those who decide on further actions.
And of course, as I know very little about the inner workings of Thunderbird, I would simply ask ChatGPT o3 or similar for advice. It comes up with a plan that seems reasonable.
No it doesn't. If I waited every time for an obscure production issue to be reproduced in lower envs, I would be fired... many times over, for clear and obvious incompetence.
Sometimes, you can add some additional steps, logging, change behavior in corner case a bit, either to get more understanding next time it happens or even mitigate it. Sometimes, you have plenty of tools and ways to act. In my experience, that sometimes is basically always if one cares enough.
What developers should do on such a critical and long standing issue is to offer an extension that victims can install to volunteer to track the bug. So they can click a button when things are fine to take a snapshot, and click another one when they encounter it.
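The two-button extension could record something as simple as a per-folder fingerprint of every message, with no message content, so victims could attach it to the bug report. A minimal sketch with hypothetical `snapshot` and `diff` functions over in-memory folders; a real extension would have to read the client's actual storage.

```python
import hashlib

Snapshot = dict[str, dict[str, list]]  # folder -> message id -> [size, sha256]


def snapshot(folders: dict[str, dict[str, bytes]]) -> Snapshot:
    """Fingerprint every message; stores sizes and hashes, not message content."""
    return {
        name: {mid: [len(raw), hashlib.sha256(raw).hexdigest()] for mid, raw in msgs.items()}
        for name, msgs in folders.items()
    }


def diff(before: Snapshot, after: Snapshot) -> dict[str, list[str]]:
    report = {"vanished": [], "changed": [], "appeared": []}
    for folder, msgs in before.items():
        now = after.get(folder, {})
        for mid, fingerprint in msgs.items():
            if mid not in now:
                report["vanished"].append(f"{folder}/{mid}")
            elif now[mid] != fingerprint:
                report["changed"].append(f"{folder}/{mid}")  # e.g. truncated to empty
    for folder, msgs in after.items():
        for mid in msgs:
            if mid not in before.get(folder, {}):
                report["appeared"].append(f"{folder}/{mid}")
    return {kind: sorted(paths) for kind, paths in report.items()}
```

Snapshots are plain dicts of lists and strings, so they serialize to JSON as-is. A move that went wrong would show up as the same ID vanishing from one folder and appearing elsewhere with a different size.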
If it was me running the project there's enough information in that thread to piece together an exploratory testing plan around the issue that might allow us to isolate it, and I'd set aside some time for the team to do that.
Whilst obviously not lethal, this Thunderbird bug sort of reminds me of the Therac-25 incidents in the 1980s. Very occasionally the machine would give patients massive overdoses of radiation. This bug wasn't easy to reproduce (thankfully) and turned out to be due to a race condition.
But of course, you can't find a problem if you don't investigate, and if it's a serious problem that's been documented then, as engineers, we can't just hide behind non-reproducibility as if it's some sort of magic shield. We have a responsibility to investigate and isolate the problem ourselves. If we don't do that we are effectively washing our hands of our own creations.
Not only that. Often mitigations can be placed even if the actual bug cannot be reproduced. Like many others in the thread suggested.
I've encountered several impossible-to-reproduce bugs in the past. What I (or my team) would then do is re-architect (refactor) some pieces of the software so that we could reproduce them: e.g. better logging, specialized layers/adapters/services, simpler logic and, above all, better testability.
I explicitly do not, which is why my first response spells out that it’s hard to reproduce a bug when all your data is gone. If it is the case that the bug can be reproduced without user data (as suggested by the person I responded to, not me), then the developers should be able to do that better than users can.
You’re assuming users affected by this bug have control over the VM running their mail server. I won’t argue that it can’t be done, but it’s probably harder than we think.
I've been using TB for personal and work use since its first day and have never hit the bug. I've never known any co-workers to hit the bug either.
I'm sure plenty of other software that we use every day has similar severe bugs that occur just as infrequently.
I'll keep using TB. I'll also make sure to look both ways when crossing streets and won't assume that cars are going to stop at red lights, because my chance of getting hit crossing a street is higher than my chance of getting hit by this bug.
Yikes! I recently tried Thunderbird again because I was annoyed with gmail pushing AI subscription crap in the web UI. Guess I'll take that over this bug for now.
I was a thunderbird user many years ago. Seeing the article made me want to reinstall, and then this comment instantly changed my mind. Insane bug for an email client to have…
A useful use of AI might be to simulate thousands/millions of user sessions (via generating mouse and keyboard inputs), with instrumentation/logging switched on. Run them in various shapes of VMs until you hit the problem. Fuzz testing, basically.
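The core loop doesn't even need AI: plain seeded random action sequences plus an invariant check are already a fuzzer. A minimal sketch against a hypothetical `ToyStore` (not Thunderbird's code) with a deliberately planted delete-then-copy bug. The invariant is that moves never change the total message count, and because each run is seeded, any failing session can be replayed exactly from its seed.

```python
import random

FOLDERS = ("Inbox", "Local")


class ToyStore:
    """Hypothetical toy mail store. move() is deliberately delete-then-copy,
    so a dropped connection between the two steps loses the message."""

    def __init__(self) -> None:
        self.folders = {"Inbox": {f"m{i}" for i in range(5)}, "Local": set()}

    def move(self, msg: str, src: str, dst: str, network_ok: bool) -> None:
        self.folders[src].discard(msg)   # planted bug: delete first...
        if network_ok:
            self.folders[dst].add(msg)   # ...so the copy may never happen

    def total(self) -> int:
        return sum(len(msgs) for msgs in self.folders.values())


def fuzz(seed: int, steps: int = 200, drop_rate: float = 0.01):
    """Run one random session; return the action trace if the invariant breaks."""
    rng = random.Random(seed)        # seeded, so any failure replays exactly
    store = ToyStore()
    expected = store.total()         # invariant: moves never change the count
    trace = []
    for _ in range(steps):
        src, dst = rng.sample(FOLDERS, 2)
        if not store.folders[src]:
            continue
        msg = rng.choice(sorted(store.folders[src]))
        network_ok = rng.random() >= drop_rate
        trace.append((src, dst, msg, network_ok))
        store.move(msg, src, dst, network_ok)
        if store.total() != expected:
            return trace
    return None


def find_failing_seed(max_seeds: int = 1000):
    for seed in range(max_seeds):
        trace = fuzz(seed)
        if trace is not None:
            return seed, trace
    return None
```

An AI-driven version would replace the random choices with generated UI input, but the seed-plus-invariant structure is what turns "sometimes" into a reproducible trace.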
[0] https://bugzilla.mozilla.org/show_bug.cgi?id=462156