I'm a little naive on this subject, but I'm wondering: what are the use cases for in-memory columnar stores? I was under the impression that columnar stores are good for OLAP workloads over massive amounts of data. For datasets that fit in memory, are there still benefits to organizing data by column, and are the performance gains appreciable?
See also my comment above, but in short: you commonly find this kind of storage in game development [0], where you are optimizing for batch access to specific columns to minimize cache misses. It's usually used as the storage layer for Entity Component Systems (ECS). The broader approach is known as data-oriented design [1].
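To make the ECS layout concrete, here is a minimal Python sketch of component storage in struct-of-arrays form, assuming a toy world with position, velocity, and health components (`World`, `spawn`, and `move` are hypothetical names, not any engine's API). In the array-of-structs alternative, each entity would be one object holding all five fields, so a movement pass would also pull health data through the cache. CPython boxes values on access, so this shows the layout rather than the speedup you'd get in C++ or Rust.

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class World:
    """Struct-of-arrays storage: one contiguous typed array per component field.

    An entity is just an index shared by every array (hypothetical toy ECS).
    """
    x: array = field(default_factory=lambda: array("d"))
    y: array = field(default_factory=lambda: array("d"))
    vx: array = field(default_factory=lambda: array("d"))
    vy: array = field(default_factory=lambda: array("d"))
    health: array = field(default_factory=lambda: array("i"))

    def spawn(self, x, y, vx, vy, health):
        """Append one entity's components and return its id (the shared index)."""
        for col, value in ((self.x, x), (self.y, y), (self.vx, vx),
                           (self.vy, vy), (self.health, health)):
            col.append(value)
        return len(self.x) - 1

    def move(self, dt):
        """Movement system: a batch pass over only the position/velocity columns.

        The health column is never read, so it never competes for cache space.
        """
        for i in range(len(self.x)):
            self.x[i] += self.vx[i] * dt
            self.y[i] += self.vy[i] * dt


world = World()
world.spawn(0.0, 0.0, 1.0, 2.0, 100)
world.spawn(5.0, 5.0, -1.0, 0.0, 50)
world.move(0.5)
```

Each system iterates only over the columns it needs, which is the batch-access pattern described above.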
I’m not sure about any performance gains or about working with large datasets, but the ancient Metakit [1] was just a really pleasant relational algebra library (≠ an SQL-data-model library; it could do, e.g., relations as values, which are difficult for row-oriented databases). I’d say that Metakit & OOMK in Tcl are strictly better than the relational part of Pandas in Python, except the documentation is somewhere between bad and nonexistent.
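As a rough illustration of what "relations as values" means, the following plain-Python sketch stores a table column-wise and lets one column hold an entire nested table per row. It does not use Metakit's API; `orders`, `select`, and `total_qty` are hypothetical names chosen for the example.

```python
# A hypothetical column-oriented "view": column name -> list of values.
# The "items" column holds a whole nested view per row: a relation as a value.
orders = {
    "order_id": [1, 2],
    "customer": ["ann", "bob"],
    "items": [
        {"sku": ["A", "B"], "qty": [2, 1]},
        {"sku": ["C"], "qty": [5]},
    ],
}


def select(view, pred):
    """Restrict a columnar view to the rows where pred(row_dict) is true."""
    n = len(next(iter(view.values())))
    keep = [i for i in range(n) if pred({c: view[c][i] for c in view})]
    return {c: [view[c][i] for i in keep] for c in view}


def total_qty(view):
    """Aggregate over the 'qty' column of a (possibly nested) view."""
    return sum(view["qty"])


# Query through the nested relation without flattening it into a join table.
big = select(orders, lambda row: total_qty(row["items"]) > 3)
```

In a row-oriented SQL schema, the line items would normally live in a separate table and be reassembled with a join.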
Not a subject-matter expert, but a few come to mind: memory can become a bottleneck; reading sequential data, instead of chasing pointers, reading useless data, and thrashing caches, gives much better throughput; and compression applied to columnar data is more efficient, which can boost throughput when memory bandwidth becomes the bottleneck on systems with a high number of CPUs.
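The compression point is easiest to see with run-length encoding, which is one of several encodings columnar formats use (dictionary encoding and bit-packing are others). This sketch, with hypothetical helper names, encodes the same five rows once in row-major order and once column by column, then aggregates directly on the compressed column without decoding it.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


def rle_sum(runs):
    """Sum a numeric column straight from its runs: value * run_length per run."""
    return sum(value * count for value, count in runs)


rows = [("US", 1), ("US", 1), ("US", 2), ("DE", 2), ("DE", 2)]

# Row-major: country and qty alternate, so adjacent values almost never repeat.
row_major = [v for row in rows for v in row]

# Column-major: each column groups similar values together, so runs form.
country = [r[0] for r in rows]
qty = [r[1] for r in rows]
```

Here `rle_encode(row_major)` yields 10 runs for 10 values (no savings), while `country` shrinks to 2 runs and `qty` to 2 runs, and `rle_sum` reads only those 2 runs instead of 5 values.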
I am merely a dabbler in this area and definitely not an expert, but my understanding is that columnar stores tend to be substantially more efficient for analytical operations over large in-memory datasets, because data laid out contiguously by column is easier to process with vectorized (SIMD) instructions.
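A minimal sketch of the columnar scan shape being described, assuming a table with hypothetical `price` and `qty` columns: a selection mask is built from one column, then the columns are combined element-wise. Python will not emit SIMD instructions here; the point is that each pass is a uniform loop over a contiguous typed buffer, which is the shape a compiled engine can vectorize.

```python
from array import array

# Columnar layout: each attribute is one contiguous, homogeneously typed buffer.
price = array("d", [9.5, 20.0, 3.25, 41.0, 15.0])
qty = array("i", [3, 1, 10, 2, 4])


def revenue_where_price_above(price, qty, threshold):
    """Sum price * qty over rows with price > threshold, one column pass at a time.

    Pass 1 builds a 0/1 selection mask from the price column alone; pass 2
    multiplies the columns element-wise and applies the mask instead of
    branching per row. In a compiled engine, both passes are tight loops over
    contiguous arrays that map well onto SIMD lanes. CPython itself does not
    vectorize this.
    """
    mask = array("b", (1 if p > threshold else 0 for p in price))
    return sum(p * q * m for p, q, m in zip(price, qty, mask))


total = revenue_where_price_above(price, qty, 10.0)
```

With a threshold of 10.0, only the rows priced 20.0, 41.0, and 15.0 contribute, giving 20.0 + 82.0 + 60.0.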