> Rust is one of those languages where I need to work to make the compiler happy, but once I manage that, the code generally works on the first try.
I can confirm this. I have a CSV parser[1] that is maybe twice as fast as Python's CSV parser (which is written in C)+. There's nothing magical going on: with Rust, I can expose a safe iterator over fields in a record without allocating.

[1] - https://github.com/BurntSushi/rust-csv

The docs explain the different access patterns (start with convenience and move toward performance): http://burntsushi.net/rustdoc/csv/#iteratoring-over-records

+ - Still working on gathering evidence...
These are all things that a proper understanding of regex and a minimal understanding of streaming file I/O can cover. The whole "if you think regex is the solution to your problem, now you have two problems" thing has gotten out of hand. Regex is not that hard.
It's simpler to handle quotes and backslash-escaped commas with a custom parser. And then there's the domain knowledge baked into the library. Does your regex solution produce Excel-compatible CSV files when you have leading zeros? That's important to some people.
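For illustration, here's a minimal sketch (field names made up) of how little code a quote-aware custom splitter takes; handling the same cases in a regex gets hairy fast:

    def split_csv_line(line):
        # Minimal sketch of a custom parser: split one line on commas,
        # honoring double quotes. It ignores escaped quotes, embedded
        # newlines, and everything else a real-world file might contain.
        fields, field, in_quotes = [], [], False
        for ch in line.rstrip('\n'):
            if ch == '"':
                in_quotes = not in_quotes
            elif ch == ',' and not in_quotes:
                fields.append(''.join(field))
                field = []
            else:
                field.append(ch)
        fields.append(''.join(field))
        return fields

    print(split_csv_line('name,"Smith, John",42'))
    # ['name', 'Smith, John', '42']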
I think the people downvoting me don't understand how ludicrously sloppy most people who output CSV are. CSV is not RFC 4180. It's whatever bullshit text file your client has handed you and convinced your project manager is your problem to parse, not their problem to generate even remotely correctly. There is no CSV library capable of handling "CSV". Every time someone asks you for it, you'd better kick and scream or expect to do a custom job.
I think people are being a bit unfair downvoting you right now (I bumped you up), but I also disagree with you.
When I'm working with a file that purports to be CSV/TSV in Python, I reach for the CSV module, specify the dialect that created it, and instantly get the power of being able to identify and refer to all the fields and rows without otherwise having to worry about parsing them.
Is it 100% bulletproof? Definitely not. But, then again, I'm not writing a life-safety system. And I've also never had the Python CSV parser break on any reasonable file I've sent it.
I'm truly thankful for robust CSV/TSV parsers. Throwaway code like this just works; in particular, it handles parsing the column headers to automatically build the dict for me:
    import csv

    sitesFN = ['gateway.tsv', 'relay.tsv']
    dsites = {}
    for fn in sitesFN:
        with open(fn, 'r', newline='') as f:
            reader = csv.DictReader(f, dialect='excel-tab')
            for row in reader:
                # Map each NIC serial number to its device name.
                dsites[row['NIC_Serial_No']] = row['Device_Name']
Perhaps what you are trying to say, and what people are failing to hear, is that you can't rely on a CSV parser to handle, a priori, all possible files that purport to be "TSV/CSV". On that I agree with you: you will always need to examine the file and determine whether the built-in parser will handle it.
But what if it turns out the standard library CSV parser handles the "CSV" file just fine? In that case it seems to make a lot of sense to use it, rather than taking the time to write your own (along with the bugs that come from rewriting anything).
And, speaking just for myself again: I've never seen a CSV/TSV file that Python's parser didn't handle just fine. Not to say they aren't out there; you just have to go out of your way to create them.
> And, speaking just for myself again: I've never seen a CSV/TSV file that Python's parser didn't handle just fine. Not to say they aren't out there; you just have to go out of your way to create them.
Indeed. Python's CSV module supports a "strict" mode that will yell at you more often, but it is disabled by default. When disabled, the parser greatly prefers producing some parse over producing a strictly correct one. I took the same route with my CSV parser in Rust (with the intention of adding a strict mode later), because that's by far the most useful default. There's nothing more annoying than trying to slurp in a CSV file from somewhere and having your CSV library choke on it.
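To make the difference concrete, here's a minimal sketch in Python (the malformed line is made up):

    import csv
    import io

    data = 'a,"b"x,c\n'  # stray character after a closing quote

    # Default (non-strict): the parser recovers and produces *a* parse.
    print(list(csv.reader(io.StringIO(data))))
    # [['a', 'bx', 'c']]

    # With strict=True, the same input raises csv.Error instead.
    try:
        list(csv.reader(io.StringIO(data), strict=True))
    except csv.Error as e:
        print('strict parse failed:', e)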
Python's csv library has handled every CSV I've ever thrown at it; CSV is "standardized" enough for that. Just set two things, the delimiter and the quoting/escaping method, and be done with it. The output is a list of dictionaries with the column headers as keys, which is very elegant. The best part is that you can reuse the same settings you used to read a file to save or modify it, and be sure it will look the same when your client re-opens it. A regexp would take twice the time to write, wouldn't give you half of those features, and would probably fail at escaping sooner or later, for the same reason XML can't be parsed with regexps.
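A minimal sketch of that round trip (the filename, delimiter, and column name are all made up):

    import csv

    # Read with explicit settings...
    with open('clients.csv', newline='') as f:
        reader = csv.DictReader(f, delimiter=';', quotechar='"')
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        row['name'] = row['name'].strip()  # hypothetical tweak

    # ...and write back with the same settings, so the file keeps its shape.
    with open('clients.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames,
                                delimiter=';', quotechar='"')
        writer.writeheader()
        writer.writerows(rows)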
Well, if you look at, e.g., Python's CSV parsing library, it's been more than enough to cover my needs so far, and it handles different CSV flavours. It is much nicer and less error-prone than regexps.
You are getting downvoted to hell, but I can see what you mean. Generally, if someone hands you a CSV file, there is no guarantee that something mental isn't happening, as there is no single "CSV" standard in practice. So you're saying that when your task is "process the client's CSV file", you might not necessarily be able to rely on a library handling it correctly, and you should be prepared to get your hands dirty (perhaps hacking together something with a regex or two).
He actually has a point there. There are so many different versions of "CSV" floating around that I'm not at all sure I'd want to deal with a parser that could handle most of them. Ever generated a CSV file from a spreadsheet or DB interface program? Did it have a big list of options on how the CSV would be formatted, so you could easily read the generated file into whatever downstream you were using?
> I'm not at all sure I'd want to deal with a parser that could handle most of them.
Python's CSV parser will handle almost anything you throw at it and it is widely used to great success.
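It can even take a guess at an unfamiliar dialect for you; a minimal sketch (the sample data is made up):

    import csv
    import io

    sample = 'a;b;c\n1;2;3\n'  # made-up file with an unknown delimiter

    # csv.Sniffer inspects a sample of the file and guesses the dialect.
    dialect = csv.Sniffer().sniff(sample)
    print(dialect.delimiter)  # ';'
    print(list(csv.reader(io.StringIO(sample), dialect=dialect)))
    # [['a', 'b', 'c'], ['1', '2', '3']]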
> Ever generated a CSV file from a spreadsheet or DB interface program? Did it have a big list of options on how the CSV would be formatted, so you could easily read the generated file into whatever downstream you were using?
Just about every single CSV file that I've ever had to read was generated by someone other than me. Frequently (but not always), they come from a non-technical person.
Sometimes those CSV files even have NUL bytes in them. Yeah. Really. I swear. It's awful and Python's CSV parser fell over when trying to read them. (You can bet that my parser won't.)
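For anyone stuck with such files, one workaround (a sketch; the byte string is made up) is to filter the NULs out before the csv module ever sees them:

    import csv
    import io

    raw = b'a,b\x00,c\n1,2,3\n'  # made-up data with an embedded NUL

    # The csv module raises "line contains NUL" on input like this,
    # so strip NULs from the stream before handing it over.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8')
    cleaned = (line.replace('\x00', '') for line in text)
    print(list(csv.reader(cleaned)))
    # [['a', 'b', 'c'], ['1', '2', '3']]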
> He actually has a point there.
His point is to use regexes instead of a proper CSV parser. I'm hard-pressed to think of a reason to ever do such a thing:
1. A regex is much harder to get correct than a standard CSV parser (see the sketch after this list).
2. A regex will probably be slower than a fast CSV parser.
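A minimal sketch of point 1, with a made-up record:

    import csv
    import io
    import re

    line = 'name,"Smith, John",42\n'

    # A naive regex split breaks on the comma inside the quoted field:
    print(re.split(r',', line.strip()))
    # ['name', '"Smith', ' John"', '42']

    # A real CSV parser gets the quoting right:
    print(next(csv.reader(io.StringIO(line))))
    # ['name', 'Smith, John', '42']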
A regexp would work, but it's usually the wrong level of abstraction to operate at. One wants to say "for the next 2,000 rows, retrieve columns 2 and 4, and the column labeled 'foo'", not write a regexp.
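In Python, that intent maps almost directly onto the csv module plus itertools; a sketch with made-up data:

    import csv
    import io
    from itertools import islice

    f = io.StringIO('x,y,foo,z\n' + '1,2,3,4\n' * 5000)  # made-up data
    reader = csv.reader(f)
    header = next(reader)
    foo_idx = header.index('foo')

    # "For the next 2,000 rows, retrieve columns 2 and 4, and the
    # column labeled 'foo'" -- expressed directly, no regexp involved.
    for row in islice(reader, 2000):
        col2, col4, foo = row[1], row[3], row[foo_idx]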