There is a decent file format for tabular data, and the author dismisses it: parquet.
It's compact, encodes all the common data types well, makes the int/float distinction (thanks for teaching us how important that is, JSON), stores nulls with a validity mask instead of a sentinel value, uses column-major storage, has compression, speedy random access... it has it all. And it isn't bogged down with legacy cruft (yet).
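To make the null-mask point concrete, here's a toy pure-Python sketch of the idea. This is not Parquet's actual encoding (Parquet uses definition levels; Arrow uses validity bitmaps), it just shows why a separate mask beats an in-band sentinel like `""` or `NaN`:

```python
def pack_column(values):
    """Split a column into (non-null values, validity mask)."""
    data = [v for v in values if v is not None]
    validity = [v is not None for v in values]
    return data, validity

def unpack_column(data, validity):
    """Reassemble the column, re-inserting nulls from the mask."""
    it = iter(data)
    return [next(it) if valid else None for valid in validity]

col = [7, None, 42, None, 0]
data, validity = pack_column(col)
assert data == [7, 42, 0]  # 0 survives: it's a real value, not a null
assert unpack_column(data, validity) == col
```

Because nullness lives in the mask, every value in the data array is unambiguously a real value, and no magic number has to be reserved.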
Since you need to look at tabular data outside a text editor anyway[0], I don't see a ton of benefit in making it a plaintext format. Especially not with the author's suggestion of un-typeable ASCII delimiters: if I can't type it on my keyboard, I may as well be looking at a hex dump of a binary file, because I can't really edit it either way.
[0] Who among us hasn't experienced the joy of a pull request updating a checked-in CSV file? A mess of ,,,,,,,"Birmingham",,,AL, etc.
parquet is great but it's not particularly easy to read or write. the libraries that exist for working with it are few and far between, and those that do exist either pull in a hundred dependencies or depend on native code (e.g. libarrow). ease of parsing and writing is certainly an important dimension of an ideal file format, and parquet scores extremely low on that front imo
Parquet is also column-major, which is great for many use cases but bad for others where row-major is more useful: for example, fetching just the first N rows.
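A toy illustration of that trade-off (a hypothetical flat layout, not Parquet's actual on-disk format): the same 2-column table serialized both ways, then asked for its first two rows.

```python
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

row_major = [x for row in rows for x in row]             # 1, a, 2, b, 3, c, 4, d
col_major = [row[i] for i in range(2) for row in rows]   # 1, 2, 3, 4, a, b, c, d

# Row-major: the first two rows are one contiguous prefix read.
assert row_major[:4] == [1, "a", 2, "b"]

# Column-major: one slice *per column*, scattered across the file
# (and in real Parquet, across separately compressed column chunks).
n = len(rows)
first_two_per_col = [col_major[c * n : c * n + 2] for c in range(2)]
assert first_two_per_col == [[1, 2], ["a", "b"]]
```

The flip side, of course, is that scanning a single column over all rows is the contiguous read in the column-major layout, which is exactly the analytics workload Parquet targets.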
Sure, but any new format is going to have the same problems. I think you're right that implementation complexity needs to be considered, but it's not like Word or Excel files, where you need to replicate, bug for bug, a format accreted over decades.
Parquet isn't trivial to parse or write, but that's probably good imo. CSV is really easy to write, and... that just means everybody writes it slightly differently. Being somewhat difficult to interact with encourages people to use a library and centralizes things a bit, but it's not so complex that someone motivated couldn't write a new implementation in a reasonable amount of time.
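The "everybody writes it slightly differently" problem doesn't even require different tools. Python's stdlib `csv` module alone can emit two different, equally "valid" files for the same row, depending on quoting settings a writer happens to pick:

```python
import csv
import io

row = ["Birmingham", "AL", ""]

minimal = io.StringIO()
csv.writer(minimal, quoting=csv.QUOTE_MINIMAL).writerow(row)

quoted = io.StringIO()
csv.writer(quoted, quoting=csv.QUOTE_ALL).writerow(row)

print(minimal.getvalue())  # Birmingham,AL,
print(quoted.getvalue())   # "Birmingham","AL",""
```

Note the trailing empty field: in the minimal dialect it's indistinguishable from a quoted-empty-string field in the other, and whether that means "empty string" or "null" is up to whatever reads the file next. Add in delimiter, line-ending, and escaping choices and you get the CSV dialect zoo.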