Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Segmenting data is exactly what writing it into non-overlapping Parquet files is. My point is that many tools can read a bucket full of these segments, and most of them can handle a scheme where each file corresponds to a single value of a column, but none of them can agree on how to efficiently segment the data where each segment contains a range, unless a new column is invented for the purpose and all queries add complexity to map onto this type of segmentation.

There’s nothing conceptually or algorithmically difficult about what I want to do. All that’s needed is to encode a range of times into the path of a segment. But Hive didn’t do this, and everyone implemented Hive’s naming scheme, and that’s the status quo now.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: