Pandas' type conversions irk me. I get why columns with NaN convert from integer to float, but I very rarely have data that is complete for every column, and converting columns that were intentionally integer has caused headaches when that data then goes to other systems, such as a SQL database.
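For anyone who hasn't run into it, a minimal illustration of the coercion (toy values):

    import pandas as pd

    s = pd.Series([1, 2, 3])
    print(s.dtype)            # int64

    # One missing value and the whole column is upcast to float64,
    # with NaN standing in for the gap.
    s = pd.Series([1, 2, None])
    print(s.dtype)            # float64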
It is currently sitting at the center of an ETL system at my work (not my decision) and causes headaches.
I mean, whenever I have this problem, I fall back on my (terrible) SPSS habits and just recode NaN to -999. With integer datasets that mostly works fine. You could come up with some alternative solution, too.
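In pandas that workaround is basically a one-liner (sketch with a made-up column name):

    import pandas as pd

    df = pd.DataFrame({"count": [1.0, None, 3.0]})   # NaN already forced float64

    # Recode missing values to a sentinel and cast back to integer.
    df["count"] = df["count"].fillna(-999).astype("int64")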
Personally, I'm an amateur at best, but Pandas has made it possible for me to do some really handy things.
Personal favorite feature: multi-indexing on rows AND columns. A MultiIndex is such a common pattern, and so poorly handled in things like Excel. Pandas really saves a lot of time when re-indexing, pivoting, stacking, or transposing data with MultiIndexes.
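Something like this (toy data, names made up) is what I mean:

    import numpy as np
    import pandas as pd

    rows = pd.MultiIndex.from_product([["north", "south"], ["A", "B"]],
                                      names=["region", "store"])
    cols = pd.MultiIndex.from_product([[2023, 2024], ["Q1", "Q2"]],
                                      names=["year", "quarter"])
    df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

    # Move the 'quarter' level from the columns into the row index and back.
    long = df.stack("quarter")
    wide = long.unstack("quarter")

    # Or just swap which level is outermost on the columns.
    print(df.swaplevel(axis=1).sort_index(axis=1))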
As mentioned elsewhere in this thread, it's opt-in to avoid breaking existing behavior. But given that ingestion points are easy to identify, it's pretty straightforward to turn on (especially if you have a schema for your inputs): https://pandas.pydata.org/pandas-docs/stable/user_guide/inte...
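If your ingestion point is read_csv, opting in can be as simple as asking for the nullable dtypes on the columns you care about (column names here are invented):

    import io
    import pandas as pd

    csv = io.StringIO("user_id,score\n1,10\n2,\n3,7\n")

    # Nullable Int64 keeps the missing score as <NA> instead of
    # upcasting the whole column to float64.
    df = pd.read_csv(csv, dtype={"user_id": "Int64", "score": "Int64"})
    print(df.dtypes)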
I saw an implementation (a CSV parser in Julia) where the sentinel value was randomly assigned at read time (if a value in the input happened to equal the sentinel, the sentinel was changed to a new random value). After parsing, the sentinel values were converted to the appropriate data type (Julia's Missing).
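Roughly the same idea in Python, as a sketch only (this is not the Julia code, and a real parser would be far more careful about performance):

    import random
    import pandas as pd

    def parse_ints_with_sentinel(raw_values):
        # Use a random integer sentinel for missing cells while parsing;
        # if a real value collides with it, re-roll the sentinel and rewrite
        # the placeholders emitted so far; convert sentinels to pd.NA at the end.
        sentinel = random.randint(-2**31, 2**31 - 1)
        parsed = []
        for v in raw_values:
            if v in ("", None):
                parsed.append(sentinel)
                continue
            n = int(v)
            if n == sentinel:
                old = sentinel
                while sentinel == old or sentinel in parsed:
                    sentinel = random.randint(-2**31, 2**31 - 1)
                parsed = [sentinel if p == old else p for p in parsed]
            parsed.append(n)
        s = pd.Series(parsed, dtype="int64")
        return s.mask(s == sentinel).astype("Int64")

    print(parse_ints_with_sentinel(["1", "", "3"]))   # 1, <NA>, 3 as nullable Int64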