Ah, a fellow packrat! I have every command I have typed into a shell since around 2005, and my history weighs in at 1 CD, or 650MB (as of a couple of years ago).
I'm probably being wasteful of space because I store each session in a separate file. I used to do a lot of data analysis at the shell back in the day, and found it useful to audit sequences of commands afterwards for mistakes, or to turn them into scripts.
As a lot of people mentioned, this is an FTS index, so it is definitely way more blown up. Plus I save a lot of additional information with it: pwd, session id, shell used, exit code, and obviously the whole command. To support iCloud, there is also the iCloud entity id. And now that you point it out, 5k per entry is a lot of data. But I am OK with that. This information is really important to me.
Maybe I have some sort of disease, but while reading "find words out of order or support features like stemming" the regexes for that immediately flashed before my eyes, so I think "necessary" is a little strong there.
I don't think I said it was. I was addressing the specific use cases mentioned. If there's another use case you think is important in searching command line history, feel free to describe it.
Most stemming use cases are trivially solved with a regex. That's the point he was making. The difference between a beginner and expert with regexes is quite a lot.
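For anyone following along, here is a minimal sketch of the kind of regex being talked about: one pattern that covers the variants of a word by hand, and one that finds two words in either order using lookaheads. The history lines are made up for illustration, and the variant list is hand-picked rather than real stemming.

```python
import re

# Hypothetical history lines, made up for illustration.
history = [
    "git commit -m 'fix typo'",
    "git log --grep=committed",
    "echo committing now",
    "docker run --rm -it ubuntu bash",
    "bash -c 'docker ps'",
    "ls -la",
]

# "Stemming" by hand: match commit, commits, committed, committing.
commit_variants = re.compile(r"\bcommit(?:s|ted|ting)?\b")

# Words in any order: one lookahead per required word.
docker_and_bash = re.compile(r"^(?=.*\bdocker\b)(?=.*\bbash\b)")

stem_hits = [line for line in history if commit_variants.search(line)]
order_hits = [line for line in history if docker_and_bash.search(line)]

print(stem_hits)
print(order_hits)
```

The catch is that you have to enumerate the suffixes yourself, and every extra required word adds another lookahead.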
Maybe! Full-text search is great for text. Command lines have some things in common with text, but they definitely aren't normal text. E.g., punctuation is much more significant. Stemming may not be appropriate. Case matters. Word boundaries are different, and many of the significant lumps aren't really words.
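To make the "not normal text" point concrete, here is a small sketch comparing a naive prose-style tokenizer with shell-style splitting on the same command. The command is made up, the `\w+` tokenizer is a deliberately crude stand-in for what a text indexer does, and `shlex` only approximates real shell parsing.

```python
import re
import shlex

# Hypothetical command, made up for illustration.
cmd = "grep -rn 'Foo bar' ./src | wc -l"

# Prose-style tokenization: lowercase, keep only runs of word characters.
text_tokens = re.findall(r"\w+", cmd.lower())

# Shell-style tokenization: quoting, flags, paths and case all survive.
shell_tokens = shlex.split(cmd)

print(text_tokens)   # ['grep', 'rn', 'foo', 'bar', 'src', 'wc', 'l']
print(shell_tokens)  # ['grep', '-rn', 'Foo bar', './src', '|', 'wc', '-l']
```

In the first list the flags, the pipe, the quoted phrase, and the capital F are all gone, which is exactly the information you often want to search on.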
Sometimes it's nice to not manually write a regexp to find all of the variants of every word or deal with arbitrary ordering of substrings. And if you're using SQLite and fts5 is installed, why not just create a virtual full text search table with one command and use that? With a small enough corpus, it's a meaningless distinction to bikeshed about the implementation: the easiest solution to build is the best. 500MB of disk space for a pet project that gives you convenience is a terrifically small amount of storage. I have videos that I recorded on my phone that take up more than double that.
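A rough sketch of what that one command looks like, assuming your Python's SQLite was built with FTS5 (most are, but it is not guaranteed). The table name, column names, and rows are made up for illustration; `porter` is FTS5's built-in stemming tokenizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One statement creates the full-text index. pwd is stored but not indexed.
conn.execute(
    "CREATE VIRTUAL TABLE history "
    "USING fts5(command, pwd UNINDEXED, tokenize='porter')"
)

# Hypothetical history rows.
conn.executemany(
    "INSERT INTO history (command, pwd) VALUES (?, ?)",
    [
        ("docker run --rm -it ubuntu bash", "/home/me"),
        ("bash -c 'docker ps'", "/tmp"),
        ("echo committing now", "/home/me/repo"),
        ("ls -la", "/home/me"),
    ],
)

def search(query):
    """Return matching commands in insertion order."""
    rows = conn.execute(
        "SELECT command FROM history WHERE history MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]

any_order = search("bash docker")  # implicit AND, word order is irrelevant
stemmed = search("commit")         # porter reduces "committing" to "commit"

print(any_order)
print(stemmed)
```

No suffix lists and no lookaheads, at the cost of the default tokenizer throwing away punctuation such as the `-c` and `--rm` flags.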
For comparison, my (non-work) history since 2012 (plain text) is 181k entries, and takes 25MB. I store the command along with when and where you ran it. (https://www.jefftk.com/p/logging-shell-history-in-zsh)