I'm a little naive on this subject, but just wondering what are the use cases fo...

pepemysurprised · on June 21, 2021

Also see my comment above, but you find this kind of storage commonly in game development [0] where you are optimizing for batch access on specific columns to minimize cache misses. It's usually used as the storage layer for Entity Component Systems. It's also called data-oriented design [1]

[0] http://cowboyprogramming.com/2007/01/05/evolve-your-heirachy...

[1] https://en.wikipedia.org/wiki/Data-oriented_design

potamic · on June 22, 2021

thanks!

mananaysiempre · on June 21, 2021

I’m not sure about any performance gains or working with large datasets, but the ancient Metakit[1] was just a really pleasant relational algebra library ( ≠ SQL data model library, it could do e.g. relations as values which are difficult for row-oriented databases). I’d say that Metakit & OOMK in Tcl is strictly better than the relational part of Pandas in Python, except the documentation is somewhere between bad and nonexistent.

[1]: https://git.jeelabs.org/jcw/metakit

nvartolomei · on June 21, 2021

Not subject matter expert but few that come to mind: memory can become a bottleneck, reading sequential data instead of jumping pointers/reading useless data and trashing caches gives much better throughout, compression applied to columnar data is more efficient and can give a throughput boost when memory bw becomes a bottleneck on systems with high number of CPUs.

dafelst · on June 21, 2021

I am merely a dabbler in this area and definitely not an expert, but my understanding is that columnar stores tend to be substantially more efficient for analytical operations over large sets of in memory data by virtue of the data being easier to operate on with vectorized instructions like SIMD.