
From the introduction: "On the other end of the spectrum, formats such as HDF5 and BLZ address problems with large data sets and distributed computing, but don’t really address the metadata needs of an interchange format. ASDF aims to exist in the same middle ground that made FITS so successful, by being a hybrid text and binary format: containing human editable metadata for interchange, and raw binary data that is fast to load and use. Unlike FITS, the metadata is highly structured and is designed up-front for extensibility." [0]

Frankly, I feel like making the metadata of a scientific data structure human-editable is something of a mis-feature, or at best a non-feature. I use metadata in HDF5 files as a form of provenance tracking and I'd rather there be some friction to editing it.

[0] https://asdf-standard.readthedocs.io/en/1.0.3/intro.html
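For what it's worth, attribute-based provenance in HDF5 looks something like this with h5py (a minimal sketch; the file, dataset, and attribute names are made up):

    import h5py
    import numpy as np

    # Write a dataset and attach provenance as HDF5 attributes.
    with h5py.File("results.h5", "w") as f:
        dset = f.create_dataset("spectrum", data=np.random.rand(1024))
        dset.attrs["instrument"] = "spectrograph-1"   # hypothetical values
        dset.attrs["pipeline_version"] = "2.3.1"

    # Reading it back goes through the library, which is exactly the
    # friction you don't get when the metadata is a plain text header.
    with h5py.File("results.h5", "r") as f:
        print(dict(f["spectrum"].attrs))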



In the limitations section it states:

> While there is no hard limit on the size of the Tree, in most practical implementations it will need to be read entirely into main memory in order to interpret it, particularly to support forward references. This imposes a practical limit on its size relative to the system memory on the machine. It is not recommended to store large data sets in the tree directly, instead it should reference blocks.

I would guess that HDF5 would be the better choice for large datasets. However, I don't quite understand what the capitalized 'Tree' refers to here, or what it implies for practical data sets.
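If it helps, in the Python asdf library the tree is the YAML header at the top of the file, and ndarrays placed in it are written as binary blocks that the tree references rather than being serialized into the YAML itself. A sketch (filenames and metadata values are made up):

    import asdf
    import numpy as np

    # Small metadata lives in the YAML tree; the array below is stored
    # as a binary block the tree points to, so the header stays small
    # enough to load entirely into memory.
    tree = {
        "telescope": "example-scope",   # hypothetical metadata
        "exposure_s": 30.0,
        "image": np.zeros((4096, 4096), dtype=np.float32),
    }
    asdf.AsdfFile(tree).write_to("observation.asdf")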


Metadata needs that, e.g., h5ad solves? I think there's quite a bit to improve on for HDF5 (it's very slow), but h5ad adds great ways of managing indices and metadata.
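For anyone unfamiliar, h5ad is the AnnData container on top of HDF5: the data matrix travels with indexed per-row and per-column metadata. A rough sketch with the anndata package (shapes and names are illustrative):

    import anndata
    import numpy as np
    import pandas as pd

    # obs/var are indexed DataFrames that ride along with the matrix,
    # and the whole thing round-trips through a single .h5ad file.
    adata = anndata.AnnData(
        X=np.random.rand(100, 50),
        obs=pd.DataFrame(index=[f"cell_{i}" for i in range(100)]),
        var=pd.DataFrame(index=[f"gene_{j}" for j in range(50)]),
    )
    adata.write_h5ad("example.h5ad")
    back = anndata.read_h5ad("example.h5ad")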


Good thing someone already invented NetCDF to address the metadata needs too...
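e.g. with xarray's netCDF backend, global and per-variable attributes plus named dimensions come for free (a sketch; all names are made up):

    import numpy as np
    import xarray as xr

    # NetCDF stores metadata as attributes on the dataset and on each
    # variable, alongside named, self-describing dimensions.
    ds = xr.Dataset(
        {"temperature": (("time", "station"), np.random.rand(24, 3))},
        attrs={"institution": "example-lab", "source": "simulated"},
    )
    ds["temperature"].attrs["units"] = "K"
    ds.to_netcdf("obs.nc")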



