So have a checkbox for "also persist mailbox in an interoperable format" that then lets you choose mbox or maildir; save the data for the app's own use in SQLite either way; write that "interoperable format" only asynchronously in the background and on quit (just like e.g. a Redis RDB file); and, if enabled, also scan the interoperable backing store on startup for changes made externally, to apply them to the internal, canonical store.
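Roughly, a sketch of what that could look like, using Python's stdlib mailbox and sqlite3 modules; the messages(id, raw, maildir_key) schema and the timer are assumptions for illustration, not anything Thunderbird actually does:

    import mailbox, sqlite3, threading

    def export_to_maildir(db_path, maildir_path):
        # Background/on-quit mirror: copy anything not yet exported from the
        # canonical SQLite store into the interoperable Maildir.
        db = sqlite3.connect(db_path)
        md = mailbox.Maildir(maildir_path, create=True)
        for msg_id, raw in db.execute(
                "SELECT id, raw FROM messages WHERE maildir_key IS NULL").fetchall():
            key = md.add(mailbox.MaildirMessage(raw))  # add() returns the Maildir key
            db.execute("UPDATE messages SET maildir_key = ? WHERE id = ?", (key, msg_id))
        md.flush()
        db.commit()
        db.close()

    def import_external_changes(db_path, maildir_path):
        # Startup scan: fold in messages that other clients dropped into the Maildir.
        db = sqlite3.connect(db_path)
        md = mailbox.Maildir(maildir_path, create=True)
        known = {row[0] for row in db.execute(
            "SELECT maildir_key FROM messages WHERE maildir_key IS NOT NULL")}
        for key in md.keys():
            if key not in known:
                db.execute("INSERT INTO messages(raw, maildir_key) VALUES (?, ?)",
                           (md.get_bytes(key), key))
        db.commit()
        db.close()

    # Snapshot-style persistence rather than per-write: schedule the export in the
    # background (one-shot timer here; a real app would reschedule it) and run it
    # once more on quit, like an RDB dump.
    threading.Timer(300, export_to_maildir, args=("mail.db", "Mail/INBOX")).start()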
FYI, iTunes.app (or whatever it's called now) for a long time had a legacy XML-file representation of the music library "for interoperability", that you could enable to be persisted to disk alongside its newer, binary DB file; when enabled, it worked exactly like this.
Maildir is fairly performant, and Thunderbird already has its own index DBs for performance in both mbox and maildir formats. That's why the compact option exists for mbox: deleting a message only removes its index key until you trigger a compact, which rewrites the file to actually reclaim the space.
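For anyone unfamiliar with what compact actually does, a rough sketch using Python's stdlib mailbox module (the deleted_keys set standing in for Thunderbird's index tombstones is made up for the example):

    import mailbox

    def compact_mbox(path, deleted_keys):
        box = mailbox.mbox(path)
        box.lock()
        try:
            for key in deleted_keys:   # until now these were only tombstones in the index
                box.remove(key)
            box.flush()                # rewrites the mbox file without the removed messages
        finally:
            box.unlock()
            box.close()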
Making the sqlite db the primary would mean that unless there was constant synchronisation I would be missing emails in the other clients.
I feel like just switching to maildir across the board is a pretty good solution performance-wise. Although I do understand that folders with large numbers of files are a problem under Windows (many projects have had to rework their design because of this). So perhaps a sqlite solution for Windows would be a good idea... or just a maildir with more nested folders to reduce directory size, linked to the Thunderbird indexing.
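Something like this for the nesting, i.e. shard by a hash prefix so no single directory gets huge (the two-level, 256-way fan-out is just an assumption, not anything the Maildir spec or Thunderbird defines):

    import hashlib, os

    def sharded_path(maildir_root, unique_name):
        digest = hashlib.sha1(unique_name.encode()).hexdigest()
        # e.g. cur/3f/a2/<unique_name> instead of cur/<unique_name>
        return os.path.join(maildir_root, "cur", digest[:2], digest[2:4], unique_name)

    def store(maildir_root, unique_name, raw_bytes):
        path = sharded_path(maildir_root, unique_name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(raw_bytes)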
> Making the sqlite db the primary would mean that unless there was constant synchronisation I would be missing emails in the other clients.
This is a perfect example of complicating what should be a simple thing to support a very, very niche use case. Thunderbird should just use sqlite so all normal operations including search are fast across all platforms, and if you have a use case like wanting to synchronize with other mail clients using maildir, then write a plugin that will duplicate the sqlite db to a user-specified maildir.
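The "search is fast" part mostly falls out of SQLite's built-in full-text indexing. A minimal sketch, assuming an FTS5-enabled build (most stock Python/SQLite builds have it) and a made-up schema:

    import sqlite3

    db = sqlite3.connect("mail.db")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS mail_fts "
               "USING fts5(subject, body, sender)")
    db.execute("INSERT INTO mail_fts(subject, body, sender) VALUES (?, ?, ?)",
               ("Quarterly report", "Numbers attached as discussed.", "alice@example.com"))
    db.commit()

    # Ranked full-text query across all indexed columns:
    for subject, sender in db.execute(
            "SELECT subject, sender FROM mail_fts WHERE mail_fts MATCH ? ORDER BY rank",
            ("report",)):
        print(subject, sender)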
Modern NTFS doesn't have as much of a problem with folders full of files as its reputation suggests, though File Explorer still tends to make it seem slower/worse than it is. (Most of that is things like populating thumbnail caches, though, more than actual disk performance.)
The bigger issue with Maildir on Windows is that the standard uses colons in filenames, which aren't allowed on Windows.
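For reference, the standard Maildir filename looks like unique:2,FLAGS, and it's that ":" info separator NTFS rejects. One workaround is to substitute a separator that is legal on Windows; the "!" below is an assumption for the sketch, not part of the spec:

    import sys

    SEP = "!" if sys.platform == "win32" else ":"

    def maildir_filename(unique, flags):
        # "2" is the standard info-section version; flags are sorted ASCII letters
        return f"{unique}{SEP}2,{''.join(sorted(flags))}"

    print(maildir_filename("1700000000.M42P123.host", {"R", "S"}))
    # -> 1700000000.M42P123.host:2,RS  (or ...host!2,RS on Windows)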
(ETA: The obvious idea here to me would be to do something like a bare git repo as a Maildir-like store with content-addressed storage.)
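Sketch of that content-addressed layout, i.e. store each message under the hash of its own bytes, git-objects style; this is only the directory layout, not an actual git object format:

    import hashlib, os

    def store_object(root, raw_bytes):
        digest = hashlib.sha1(raw_bytes).hexdigest()
        path = os.path.join(root, "objects", digest[:2], digest[2:])
        if not os.path.exists(path):       # content-addressed: identical messages collapse
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(raw_bytes)
        return digest                      # the key you'd record in the index/db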
AFAIU the problem isn't so much NTFS but the (pluggable) Win32 filesystem layers above. It gets even worse with AV software plugging itself into those layers.
Packing individual resources into archives is the norm for Games for a reason.
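i.e. the pak/wad approach; with a plain zip standing in for the archive format, the sketch is basically:

    import zipfile

    def pack(archive_path, resources):
        # One OS-level file holds many small resources.
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as z:
            for name, data in resources.items():
                z.writestr(name, data)

    def read_resource(archive_path, name):
        # Only the single archive file is opened on disk; members are read by offset.
        with zipfile.ZipFile(archive_path) as z:
            return z.read(name)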
Hm... Not sure how modern that NTFS would have to be. Firefox and Minecraft both had to make modifications to avoid slow file access and slow reads of folders full of small files. Hedgewars too.
I feel these weren't the only cases. And none of those had anything to do with Explorer.
But, it might have improved, and might be "good enough" for email.
So much depends on the specific APIs, of course, and how many versions of Windows you are expecting to support, and what your seek patterns and locking expectations/behavior are. (In my experience, it has been misunderstandings of the Windows file lock model/ACL lookups that seem more often the problem than directory size, but obviously everyone's benchmarks are different. File locks are super "slow", especially if you are not opting out of locks you don't need.)
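By "opting out of locks you don't need" I mean things like the share-mode flags on CreateFileW. A rough ctypes sketch of a read handle that asks for full sharing so it doesn't serialize against other handles on the same file (error handling kept minimal):

    import ctypes
    from ctypes import wintypes

    GENERIC_READ          = 0x80000000
    FILE_SHARE_READ       = 0x00000001
    FILE_SHARE_WRITE      = 0x00000002
    FILE_SHARE_DELETE     = 0x00000004
    OPEN_EXISTING         = 3
    FILE_ATTRIBUTE_NORMAL = 0x80

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = wintypes.HANDLE
    INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

    def open_shared_read(path):
        # Asking for all three share modes tells the OS we don't need exclusivity,
        # so this open doesn't contend with other readers/writers of the file.
        handle = kernel32.CreateFileW(
            path, GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
            None, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None)
        if handle == INVALID_HANDLE_VALUE:
            raise ctypes.WinError(ctypes.get_last_error())
        return handle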
I'm not suggesting that lots of small files in a single folder is now the best architecture on Windows, just that for Maildir specifically on Windows it is among the least of the problems.
'k. take your word for it. Esp in relation to Maildir.
I know very little about Windows development.
But... just, FWIW, this particular subject has come up a lot on HN over the years with various explanations.
https://news.ycombinator.com/item?id=18783525
and many many others easily searchable on hn.algolia.com
My fav comment was by an MS dev: "NTFS code is a purple opium-fueled Victorian horror novel that uses global recursive locks and SEH for flow control."
I definitely understand it gets talked about a lot, endlessly. It's not an unearned reputation. I just think that, especially in light of things like that last comment you liked, so much of that reputation at this point is folklore more than benchmarks. People take "Windows is bad at lots of files in a single folder" on faith from some bible of Operating Systems Allegories rather than as something they've worked with directly or seen tested first-hand.
Part of what certainly doesn't help is that most of the "lots of files in a single folder" applications make other POSIX-based assumptions (for instance, locks and concurrency-related consistency are generally opt-in and eventually consistent by default in POSIX, whereas they are opt-out and aggressively consistent by default on Windows). If you are trying to use POSIX-based assumptions on Windows, it doesn't matter what you are doing, "lots of files in a single folder" included: you are going to have a bad time.

I can easily presume that is what happened in most of your anecdotal counter-examples (Java Minecraft, Firefox, and Hedgewars will all have different, plausible POSIX biases), though I can't know for certain without benchmarks and performance data in front of me, and none of those are currently my job. "Lots of files in a single folder" at that point, under that presumption, is a symptom rather than the root cause. It's very easy to blame the symptom, especially when that sort of performance debugging/fixing is getting in the way of your real goals and that symptom sometimes has such an easy fix (use more folders, bundle more zips, what have you).
Again, I can't say that with too much certainty without specific performance data, it's just I do think people need to question the "Orthodoxy" of "well, Windows is just bad at that" more than they do sometimes.
Hm. Did you read the comment by the Microsoft dev in the linked thread? He gives the following reasons:
"We've long since gotten all the low-hanging fruit and are left with what is essentially "death by a thousand cuts," with no single component responsible for our (lack of) performance, but with lots of different things contributing"
* Linux has a top-level directory entry cache that means that certain queries (most notably stat calls) can be serviced without calling into the file system at all once an item is in the cache. Windows has no such cache, and leaves much more up to the file systems... [snip]
* Windows's IO stack is extensible, allowing filter drivers to attach to volumes and intercept IO requests before the file system sees them. ... [snip] .. Even a clean install of Windows will have a number of filters present, particularly on the system volume (so if you have a D: drive or partition, I recommend using that instead, since it likely has fewer filters attached). Filters are involved in many IO operations, most notably creating/opening files.
* The NT file system API is designed around handles, not paths. Almost any operation requires opening the file first, which can be expensive. ... [snip]
"Whether we like it or not (and we don't), file operations in Windows are more expensive than in Linux, even more so for those operations that only touch file metadata (such as stat)."
I can say my personal experience under Windows has been that compiling the same project was twice as fast in a Linux VirtualBox VM running inside Windows as on the host itself. :)
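If anyone wants to make that comparison concrete, a trivial metadata microbenchmark along these lines, run on both the host and the VM, shows the stat-call overhead the MS dev is describing (numbers obviously vary with whatever filters/AV are installed):

    import os, time

    def stat_walk(root):
        # Pure metadata workload: stat every file under root, read no contents.
        start, count = time.perf_counter(), 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                os.stat(os.path.join(dirpath, name))
                count += 1
        return count, time.perf_counter() - start

    files, seconds = stat_walk(".")
    print(f"{files} stat calls in {seconds:.2f}s")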