We presented using Parquet formats for bioinformatics 2012/13-ish at the Bioinfo...

jltsiren · on Oct 5, 2022

Maybe column-oriented formats like Parquet never became popular in bioinformatics because new file formats usually come from people developing tools for upstream tasks such as read mapping, variant calling, and genome assembly. They are the ones who work with new kinds of data first.

Those upstream tasks tend to be row-oriented. You often iterate over all rows, do something with them, and output new rows in another format. Alternatively, you read the entire input into in-memory data structures, do something, and later serialize the data structures. Using column-oriented formats for such tasks does not feel natural.
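To make the contrast concrete, here is a minimal sketch in plain Python. The record layout (name, chromosome, position, mapping quality) and all function names are made up for illustration; this is not any real tool's format or API. The first function streams rows in and rows out. The second pair stores the same data as columns and then has to walk every column in lockstep to emit whole records again.

```python
import io

# Hypothetical, simplified tab-separated alignment records:
# name, chrom, pos, mapq. Not a real file format.
ROWS = (
    "read1\tchr1\t100\t60\n"
    "read2\tchr1\t150\t10\n"
    "read3\tchr2\t300\t42\n"
)


def filter_rows(src, dst, min_mapq):
    """Row-oriented streaming: read one record, decide, write one record."""
    kept = 0
    for line in src:
        name, chrom, pos, mapq = line.rstrip("\n").split("\t")
        if int(mapq) >= min_mapq:
            # Output is a new row in a different (reordered) layout.
            dst.write(f"{chrom}\t{pos}\t{name}\n")
            kept += 1
    return kept


def to_columns(src):
    """Column-oriented layout: one list per field instead of one record per line."""
    cols = {"name": [], "chrom": [], "pos": [], "mapq": []}
    for line in src:
        name, chrom, pos, mapq = line.rstrip("\n").split("\t")
        cols["name"].append(name)
        cols["chrom"].append(chrom)
        cols["pos"].append(int(pos))
        cols["mapq"].append(int(mapq))
    return cols


def filter_columns(cols, min_mapq):
    """Emitting whole records means touching every column in lockstep."""
    return [
        f"{chrom}\t{pos}\t{name}"
        for name, chrom, pos, mapq in zip(
            cols["name"], cols["chrom"], cols["pos"], cols["mapq"]
        )
        if mapq >= min_mapq
    ]


out = io.StringIO()
kept = filter_rows(io.StringIO(ROWS), out, min_mapq=30)

cols = to_columns(io.StringIO(ROWS))
column_result = filter_columns(cols, min_mapq=30)
```

The column layout only pays off when a task needs a few fields across many records, for example `max(cols["mapq"])`. A read mapper or variant caller that consumes and produces full records gets little from it.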