Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm looking forward to trying this, learned the hard way that Excel will make unwanted changes without notice to CSV files. Like SSNs that start with zero having there leading zeros silently removed, breaking things.


Yeah excel silently mangling csv files has always infuriated me

Just gave this program a try and it lets me save changes without removing any of the double quotes, yay!


I'm convinced that 90% of the people who rail against CSV for "not being a real format" are the ones that got burned by Excel. It has always shocked me how terrible Excel is at handling CSV given the sheer number of people who have this use case.

The crazy part is that it wouldn't be all that hard to handle it properly. Excel could examine each column and apply a uniform transform on each column instead of applying transforms on a cell by cell basis. They could even put in real effort and let the user choose the format for each column as part of the import process. You know, like being able to specify "text" for columns like SSNs or Credit cards that you aren't going to do math on anyway.


And dates! And ISSNs (I work with library data)! Ugh.


Optimist: the glass is 1/2 full

Pessimist: the glass is 1/2 empty

Excel: the glass is 1900-02-01

---

We've had bugs raised against our software because the clients use meeting titles like 1-2-1 for supervisory reviews which when extracted for reporting and opened in Excel (they love to dump data into Excel no matter what report functions you include directly in the application) get interpreted as a date even when output as a quoted string in the CSV file. Try explaining to a client that we have formatted it correctly and Excel is reading it wrong…

(We could of course output Excel files directly, but some of them can't download office documents from web apps because of security policy at their end.)

Given how badly common tools mangle unambiguously correct CSV data, how many variations there are which make “unambiguously correct CSV data” a somewhat small proportion of what is out there, and how many tools not only expect but require mis-formatted data and/or output it, it is scary how much the format is relied upon in major industries.


"Given how badly common tools mangle unambiguously correct CSV data, how many variations there are which make “unambiguously correct CSV data” a somewhat small proportion of what is out there, and how many tools not only expect but require mis-formatted data and/or output it, it is scary how much the format is relied upon in major industries."

In a nutshell, CSV isn't a format. It's a family of formats, and it's not even a well-specified family of formats.

At least in semi-technical circles, I've had some success in using this to push back against CSV suggestions and get them to use better things. I'm sure that in non-technical circles I'd have zero success with this, though. It sure ain't a magic talisman you can use.

JSON isn't exactly a rigidly specified format, but it's got a lot less flex in it and I've not had as much trouble with it. Biggest problem I have is just getting people using dynamic scripting languages to please output either a string or a number, but don't just output "whatever the scripting language happened to decide based on what code paths I happened to run" when you don't even realize your code ends up casting it back and forth without you knowing and what comes out is effectively random from my point of view.


> In a nutshell, CSV isn't a format. It's a family of formats, and it's not even a well-specified family of formats.

There is RFC4180. Though by 2005 when that came about there were already so many different cases around that it became just one of a great many possible variants.

I try not to push back too hard about CSV, for fear of “well, there is this XML format that is supported”! (bad enough in itself, but sometimes the “XML format” is even more poorly specified than the client's CSV edge cases which we are expected to guess).

JSON is nice as long, as you say, that strings are real strings and numbers are real numbers.

Oh, and dates/times are in an RFC3339 (or ISO8601) numeric (no localised month names, etc.) format either in UTC or with the timezone always specified, as strings (though at a pinch I'll accept a posix time_t for datetime if based on UTC). Not specifying how to handle dates/times/both is the major problem with JSON in my experience.


Bug-for-bug compatible is a direct consequence of Postel's law: https://en.wikipedia.org/wiki/Robustness_principle


And credit card numbers.

string -> interpreted as a number -> displayed in scientific notation -> saved to disk with the last 3 or 4 digits zeroed




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: