Pandas' type conversions irk me. I get why columns with NaN convert from integer to float, but I very rarely have data that is complete for every column, and converting columns that were intentionally integer has caused headaches when that data then goes to other systems, such as a SQL database.
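For anyone who hasn't run into it, a minimal illustration of the coercion (toy values):

    import pandas as pd

    s = pd.Series([1, 2, 3])
    print(s.dtype)            # int64

    # One missing value and the whole column is upcast to float64,
    # with NaN standing in for the gap.
    s = pd.Series([1, 2, None])
    print(s.dtype)            # float64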
It is currently sitting at the center of an ETL system at my work (not my decision) and causes headaches.
I mean, whenever I have this problem, I fall back on my (terrible) SPSS habits and just recode NaN to -999. With integer datasets that mostly works fine. You could come up with some alternative solution, too.
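In pandas that workaround is basically a one-liner (sketch with a made-up column name):

    import pandas as pd

    df = pd.DataFrame({"count": [1.0, None, 3.0]})   # NaN already forced float64

    # Recode missing values to a sentinel and cast back to integer.
    df["count"] = df["count"].fillna(-999).astype("int64")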
Personally, I'm an amateur at best, but Pandas has made it possible for me to do some really handy things.
Personal favorite feature: multi-indexing on rows AND columns. A MultiIndex is such a common pattern, and so poorly handled in things like Excel. Pandas really saves a lot of time when re-indexing, pivoting, stacking, or transposing data with MultiIndexes.
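Something like this (toy data, names made up) is what I mean:

    import numpy as np
    import pandas as pd

    rows = pd.MultiIndex.from_product([["north", "south"], ["A", "B"]],
                                      names=["region", "store"])
    cols = pd.MultiIndex.from_product([[2023, 2024], ["Q1", "Q2"]],
                                      names=["year", "quarter"])
    df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

    # Move the 'quarter' level from the columns into the row index and back.
    long = df.stack("quarter")
    wide = long.unstack("quarter")

    # Or just swap which level is outermost on the columns.
    print(df.swaplevel(axis=1).sort_index(axis=1))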
As mentioned elsewhere in this thread, it's opt-in to avoid breaking existing behavior. But given that ingestion points are easy to identify, it's pretty straightforward to turn on (especially if you have a schema for your inputs): https://pandas.pydata.org/pandas-docs/stable/user_guide/inte...
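If your ingestion point is read_csv, opting in can be as simple as asking for the nullable dtypes on the columns you care about (column names here are invented):

    import io
    import pandas as pd

    csv = io.StringIO("user_id,score\n1,10\n2,\n3,7\n")

    # Nullable Int64 keeps the missing score as <NA> instead of
    # upcasting the whole column to float64.
    df = pd.read_csv(csv, dtype={"user_id": "Int64", "score": "Int64"})
    print(df.dtypes)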
I saw an implementation (a CSV parser in Julia) where the sentinel value was randomly assigned at read time (if a value in the input happened to equal the sentinel, the sentinel was changed to a new random value). After parsing, the sentinel values were converted to the appropriate data type (Julia's Missing).
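Roughly the same idea in Python, as a sketch only (this is not the Julia code, and a real parser would be far more careful about performance):

    import random
    import pandas as pd

    def parse_ints_with_sentinel(raw_values):
        # Use a random integer sentinel for missing cells while parsing;
        # if a real value collides with it, re-roll the sentinel and rewrite
        # the placeholders emitted so far; convert sentinels to pd.NA at the end.
        sentinel = random.randint(-2**31, 2**31 - 1)
        parsed = []
        for v in raw_values:
            if v in ("", None):
                parsed.append(sentinel)
                continue
            n = int(v)
            if n == sentinel:
                old = sentinel
                while sentinel == old or sentinel in parsed:
                    sentinel = random.randint(-2**31, 2**31 - 1)
                parsed = [sentinel if p == old else p for p in parsed]
            parsed.append(n)
        s = pd.Series(parsed, dtype="int64")
        return s.mask(s == sentinel).astype("Int64")

    print(parse_ints_with_sentinel(["1", "", "3"]))   # 1, <NA>, 3 as nullable Int64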