Hacker News new | past | comments | ask | show | jobs | submit login

At a previous job, I regularly worked with dfs of millions to hundreds of millions of rows, and many columns. It was not uncommon for the objects I was working with to use 100+ GB ram. I coded initially in Python, but moved to Julia when the performance issues became to painful (10+ minute operations in Python that took < 10s in Julia).

DataFrames.jl, DataFramesMeta.jl, and the rest of the ecosystem are outstanding. Very similar to pandas, and much ... much faster. If you are dealing with small (obviously subjective as to the definition of small) dfs of around 1000-10000 rows, sticking with pandas and python is fine. If you are dealing with large amounts of real world time series data, with missing values, with a need for data cleanup as well as analytics, it is very hard to beat Julia.

FWIW, I'm amazed by DuckDB, and have played with it. The DuckDB Julia connector gives you the best of both worlds. I don't need DuckDB at the moment (though I can see this changing), and use Julia for my large scale analytics. Python's regex support is fairly crappy, so my data extraction is done using Perl. Python is left for small scripts that don't need to process lots of information, and can fit within a single terminal window (due to its semantic space handicap).




That's a nice endorsement, I've always liked the idea of Julia as an R replacement. I'll definitely give that a shot when I have a chance.

Is there any kind of decent support for plotting with data frames? Or does Plots.jl work with it out of the box?


Plots.jl can work. You may also want to try https://makie.org/ and https://github.com/TidierOrg/TidierPlots.jl




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: