Parquet is still relatively poorly supported in the JVM world, unless this has changed in the last year or so. Yes, you can use Spark, but that's an enormous dependency just to read a file representing a table of data. The alternative - wrestling with poorly documented Hadoop libraries - was only marginally better.
The other problem with Parquet is that it's overly flexible: it supports arbitrary application-specific metadata. That's fine when a single tool/library does all the reading and writing, but it's problematic cross-platform. Saving a Pandas dataframe to Parquet, for example, embeds a bunch of Pandas-specific metadata that other libraries ignore or skip.
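You can see this for yourself with pyarrow (a quick sketch; assumes pandas and pyarrow are installed, and the file name is made up):

```python
import pandas as pd
import pyarrow.parquet as pq

# Write a trivial dataframe to Parquet via the default pyarrow engine.
df = pd.DataFrame({"ts": pd.to_datetime(["2021-01-01", "2021-01-02"])})
df.to_parquet("example.parquet")

# Inspect the file's key-value metadata: alongside the schema itself,
# there's a b'pandas' entry carrying a JSON blob of Pandas-specific
# details (index columns, dtypes, library versions) that non-Pandas
# readers simply skip over.
meta = pq.read_metadata("example.parquet")
print(meta.metadata.keys())  # typically includes b'pandas' and b'ARROW:schema'
```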
In my case, it meant converting timestamp/datetime columns to a plain int64 nanosecond representation before writing the data from Pandas; otherwise those columns could not be read by anything that wasn't Pandas.
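A minimal sketch of that workaround (column and file names are made up; the consumer is assumed to handle the int64-to-timestamp conversion itself):

```python
import pandas as pd

df = pd.DataFrame({"ts": pd.to_datetime(["2021-01-01 12:00", "2021-01-02 13:30"])})

# Store the timestamps as plain int64 nanoseconds since the Unix epoch -
# a physical type every Parquet reader understands - instead of relying
# on timestamp logical types plus Pandas metadata.
df["ts"] = df["ts"].astype("int64")
df.to_parquet("portable.parquet")

# Any consumer then reads an ordinary int64 column and converts it back:
restored = pd.read_parquet("portable.parquet")
restored["ts"] = pd.to_datetime(restored["ts"], unit="ns")
```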
But maybe this has changed since I last used Parquet as a table format?