I prefer commas because I can see them over tabs. I spend zero time escaping com...

mbreese · on May 4, 2022

I prefer tabs because my data can often have commas in it. Seeing tabs isn’t an issue as I also have invisible characters visible in all of my editors.

But not allowing the delimiter inside a value (here, \t is simply disallowed) makes parsing so much easier. CSV is a bear to parse because you have to scan each value from a buffer to handle the quoting, rather than just splitting the line.

    # strip only the line ending; a bare strip() would also eat leading/trailing empty fields
    line.rstrip('\r\n').split('\t')

Is so handy. Batteries included are one thing, but I don’t want to pull in a library just to parse CSV files.

tech2 · on May 4, 2022

Given that your example looks very much like Python, you could use the csv module, which is built into the standard library and handles all of this for you in a standards-compliant manner.
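A minimal sketch of that suggestion, using only the standard-library `csv` module. The sample data is made up, and `io.StringIO` stands in for a real file (which the csv docs recommend opening with `newline=''`):

```python
import csv
import io

# Made-up sample: one field contains a comma, another contains doubled quotes.
data = 'name,comment\r\nalice,"hello, world"\r\nbob,"she said ""hi"""\r\n'

# csv.reader accepts any iterable of lines and undoes the quoting for us.
rows = list(csv.reader(io.StringIO(data)))
print(rows)
```

The quoted comma and the escaped `""` come back as plain field text, with no manual unquoting.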

You really do want to use a library to parse CSV, since there are a number of corner cases. For example, your code may not read a whole row, because a quoted field can contain a newline, so one row can span several physical lines.
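To illustrate the newline corner case, here is a small comparison on made-up data: a naive line-by-line split versus `csv.reader`, which keeps reading until a quoted field is closed.

```python
import csv
import io

# Made-up CSV whose second record has a newline inside a quoted field.
data = 'id,note\n1,"line one\nline two"\n2,plain\n'

# Naive approach: treat each physical line as one record.
naive = [line.rstrip('\n').split(',') for line in io.StringIO(data)]

# csv.reader continues past the embedded newline until the closing quote.
proper = list(csv.reader(io.StringIO(data)))

print(naive)   # the quoted field is torn across two "records", quotes left in
print(proper)  # three records, with the newline preserved inside the field
```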

mbreese · on May 4, 2022

> You really do want to use a library to parse CSV

Right… you absolutely need to use a library. I’ve written CSV parsers that handle the edge cases (at least the RFC 4180 edge cases). But I don’t want to have to include a library just to do it. Most of my scripts are small, and adding libraries makes them harder to move around, so I shy away from CSV as a format.
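For a sense of what handling those edge cases involves, below is a hypothetical minimal parser in the spirit of RFC 4180: quoted fields, doubled quotes, and delimiters or newlines inside quotes. It is an illustration, not mbreese's code, and it is lenient rather than validating (text after a closing quote is simply kept).

```python
def parse_csv(text, delimiter=','):
    """Hypothetical sketch: split CSV text into a list of rows of strings.

    Accepts \n, \r\n, or \r line endings. Unlike csv.reader, a blank
    line yields [''] instead of being skipped.
    """
    rows, row, field = [], [], []
    in_quotes = False
    pending = False  # True once the current record has consumed any character
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif c == '"':
            in_quotes = True
            pending = True
        elif c == delimiter:
            row.append(''.join(field))
            field = []
            pending = True
        elif c in '\r\n':
            if c == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1  # treat \r\n as a single line ending
            row.append(''.join(field))
            rows.append(row)
            row, field, pending = [], [], False
        else:
            field.append(c)
            pending = True
        i += 1
    if pending:  # flush a final record that has no trailing newline
        row.append(''.join(field))
        rows.append(row)
    return rows


print(parse_csv('a,"b,c"\r\n"x ""y""",z\n1,"two\nlines"'))
```

Even this small version needs a quote state and look-ahead, which is the bookkeeping a plain split avoids.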

I’m okay with not allowing newlines and tabs in my fields. It’s a valid trade-off for my data.
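One way to make that trade-off explicit is to reject offending values when writing, so the simple split on read stays safe. A minimal sketch with a hypothetical `format_tsv_row` helper:

```python
FORBIDDEN = ('\t', '\n', '\r')


def format_tsv_row(fields):
    """Hypothetical helper: join fields with tabs, refusing any value
    that would break a later line.rstrip('\r\n').split('\t')."""
    out = []
    for value in fields:
        s = str(value)
        if any(ch in s for ch in FORBIDDEN):
            raise ValueError(f"field {s!r} contains a tab or newline")
        out.append(s)
    return '\t'.join(out) + '\n'


line = format_tsv_row(['a', 1, 'b c'])
print(line.rstrip('\r\n').split('\t'))  # round-trips with the plain split
```

Failing loudly at write time keeps bad data from silently shifting columns later.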

Yes, many of my scripts are Python, but not all, and remembering a different CSV library for each language is harder than just splitting each line on the field separator.

Also, many of my files need comments in the header. That’s another reason CSV can’t easily be used, since comments aren’t part of the RFC.
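A minimal sketch of reading such a file, assuming comment lines begin with `#` and appear only before the data. The `read_tsv_with_comments` name and the sample text are hypothetical:

```python
import io


def read_tsv_with_comments(lines, comment='#'):
    """Hypothetical helper: return (header_comments, rows).

    Assumes comments precede the data and fields never contain
    tabs or newlines.
    """
    comments, rows = [], []
    for line in lines:
        line = line.rstrip('\r\n')
        if not rows and line.startswith(comment):
            comments.append(line[len(comment):].strip())
        elif line:  # skip blank lines
            rows.append(line.split('\t'))
    return comments, rows


sample = "# source: example run\n# units: ms\nname\ttime\nfoo\t12\nbar\t7\n"
comments, rows = read_tsv_with_comments(io.StringIO(sample))
print(comments)
print(rows)
```

Once the first data row is seen, later lines starting with `#` are treated as data rather than comments.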

tech2 · on May 9, 2022

I appreciate that, as you describe, some of your needs may differ from the standard format (I'd probably still use it to make interoperability with other languages/people/systems easier, though, and keep my comments etc. in documentation). However, with all that in mind, I may not have communicated clearly enough that the CSV module is a built-in.

> Most of my scripts are small, and adding libraries makes them more difficult to move around. So, I shy away from CSV as a format as a result.

The CSV library I speak of is part[0] of the Python standard library. Unless you're working with a restricted subset of the language or something, it should have no impact on the size of your script or its portability between platforms/locations.
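To illustrate that point: `import csv` adds no third-party dependency, and the same module also handles tab-delimited files via `delimiter='\t'`. A minimal round-trip sketch on made-up rows, where two values contain characters that would break a plain `split('\t')`:

```python
import csv
import io

# Made-up rows: one value contains a tab, another a newline.
rows = [['id', 'note'], ['1', 'has\ta tab'], ['2', 'has\na newline']]

# Write tab-delimited output; by default the writer quotes any field
# containing the delimiter, the quote character, or a line-ending character.
buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t')
writer.writerows(rows)
text = buf.getvalue()

# Read it back with the same standard-library module.
back = list(csv.reader(io.StringIO(text), delimiter='\t'))
print(back == rows)
```

The trade-off is that the output is no longer safe for a naive split, so every consumer needs a quote-aware reader.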

[0] https://docs.python.org/3/library/csv.html

thayne · on May 4, 2022

Newlines are pretty common in text too, depending on your data. What do you do about those?