Polars can do a lot of useful processing while streaming a very large dataset, without ever holding much more than one row in memory at a time. Are there any simple ways to achieve such map/reduce tasks with pandas on datasets that may vastly exceed the available RAM?
Not currently. But I imagine that, if pandas adopts Arrow as its backing store in a future version, it should be able to do something like that through proper use of the Arrow API. Arrow is built with this kind of processing in mind and is continually adding more compute kernels that can work in a streaming fashion where possible. The Dataset abstraction in Arrow allows for defining complex column "projections" that can execute in a single pass like this. Polars may be leveraging this functionality in Arrow.
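In the meantime, for simple map/reduce-style aggregations, a common partial workaround is chunked reading with the `chunksize` parameter of `pd.read_csv`, which keeps only one chunk in memory at a time. A minimal sketch, assuming a CSV file with a numeric column (the file name `data.csv` and column `x` are made up for illustration):

```python
import pandas as pd

# Create a small example file so the sketch is self-contained.
pd.DataFrame({"x": range(10)}).to_csv("data.csv", index=False)

total = 0.0
count = 0
# Read the file in fixed-size chunks instead of all at once;
# only the current chunk is ever held in memory.
for chunk in pd.read_csv("data.csv", chunksize=4):
    total += chunk["x"].sum()   # per-chunk "map" + partial "reduce"
    count += len(chunk)

mean = total / count            # final "reduce" across chunks
print(mean)                     # 4.5 for the values 0..9
```

This only covers reductions that can be expressed as running accumulators (sums, counts, min/max, etc.); anything requiring a global view, such as a sort or a median, needs a different approach.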