
> As mentioned earlier, one of our top priorities is not breaking existing code or APIs

This is doomed then. The Pandas API is already extremely bloated.



This is one of my largest pet peeves with Pandas. There are (or were) something like three APIs. Half the stuff on Stack Overflow or blogs is from 2013-2015 and deprecated. I feel like I have to relearn Pandas every four years since Wes launched it almost a decade ago.


How so? Pandas is one of the most popular tools among folks doing data.

I admit that the API has issues (if/else? being the most glaring to me). Nevertheless, Pandas has mass adoption because the benefits outweigh the warts.

(I happen to wish that 2.0 deprecated some of the API, but Python 3 burned a deep scar that many don't wish to relive.)


There are too many ways of doing the same thing (which I assume is itself a relic of maintaining backward compatibility), there are inconsistencies within the API, there's "deprecated" stuff which isn't really deprecated, et cetera.

  dataframe.column

vs

  dataframe['column']

is one example that comes to mind, but there are surely many more.

I am of the philosophy of 'The Zen of Python'

  There should be one-- and preferably only one --obvious way to do it.

Pandas is a powerful library, but when I have to use it in a workplace it usually gives me a feeling of dread, knowing I am soon to face various hacks and dataframes full of NaNs without them being handled properly, etc.


Which column format would you prefer? You need the latter to address random loaded data which may contain illegal identifiers. You need the former to stifle the rumblings about Pandas verbosity.
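
To make the trade-off concrete, here is a minimal sketch. The column names are made up for illustration; they stand in for whatever arrives from a loaded file.

```python
import pandas as pd

# Hypothetical data: column labels as they might arrive from a CSV export.
df = pd.DataFrame({
    "price": [9.5, 3.0, 4.25],
    "unit cost": [4.0, 1.5, 2.0],   # contains a space
    "2023": [10, 20, 30],           # starts with a digit
})

# Bracket access works for any column label.
unit_cost = df["unit cost"]
year_total = int(df["2023"].sum())

# Attribute access is only writable when the label is a valid Python
# identifier; `df.unit cost` or `df.2023` would be a syntax error.
same_price = df.price.equals(df["price"])

# Labels that cannot be written as `df.<name>` at all.
not_identifiers = [c for c in df.columns if not c.isidentifier()]
print(not_identifiers)
```

So the bracket form is the only one that covers every label, while the dot form is just a shortcut for the well-behaved ones.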

I would get rid of the .column accessor, but you will see a lot of pushback. Notably from the R camp.


I would also get rid of the .column accessor, it has the potential to collide with function members of the DataFrame so shouldn't have been added in the first place IMO
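
A small sketch of the collision, assuming a frame whose column labels (invented here) happen to match existing DataFrame members:

```python
import pandas as pd

# Hypothetical frame: "count" and "size" are also names of DataFrame members.
df = pd.DataFrame({"count": [3, 1, 2], "size": [10, 20, 30], "qty": [5, 6, 7]})

# `df.count` resolves to the DataFrame.count method, not the column.
count_is_method = callable(df.count)

# `df.size` is the built-in property (number of cells), not the column.
size_attr = df.size

# Bracket access always returns the column.
count_col = df["count"].tolist()
size_col = df["size"].tolist()

# A name that does not collide behaves the same either way.
qty_same = df.qty.equals(df["qty"])

print(count_is_method, size_attr, count_col, size_col, qty_same)
```

Whether the dot form gives you the column therefore depends on which names the library itself already uses, which is what makes it fragile.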


Um nah. I can't imagine R folks being against .column disappearing forever. There's no equivalent in R.


I mean that R users would insist it remains. The R $ accessor is terse and enables fast ad hoc analyses.


The $ accessor does not function exactly like the .column accessor. In fact, I think the [''] accessor functions more like $ does.


The only reason is that it's the de facto default for Python, not because of any cost-benefit analysis. You see the same thing with matplotlib and numpy.



