Yes, I agree with your assessment that technologies that can handle larger-than-memory datasets (e.g. Polars) can be used to filter data down to fewer rows for technologies that can only handle datasets a fraction of that size (e.g. pandas).
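Here's a minimal sketch of that workflow, assuming a Parquet file at a hypothetical path with an `amount` column; Polars scans lazily and pushes the predicate into the scan, so only the matching rows are ever materialized:

```python
import polars as pl

# Lazily scan the Parquet file; nothing is loaded into memory yet.
lf = pl.scan_parquet("data.parquet")  # hypothetical path

# Filter down to the rows of interest; Polars pushes this predicate
# into the scan, so only matching rows are read and materialized.
small = lf.filter(pl.col("amount") > 100).collect()

# Hand the much smaller result to pandas for downstream work.
pdf = small.to_pandas()
```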
Another way to solve this problem is using a Lakehouse storage format like Delta Lake, so you only read a fraction of the data into the pandas DataFrame (disclosure: I am on the Delta Lake team). I've blogged about this and think predicate pushdown filtering / Z-Ordering the data is more straightforward than adding an entirely new tech like Polars / DuckDB to the stack.
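A minimal sketch of what that looks like, assuming the `deltalake` Python package and a hypothetical table path; the `filters` argument pushes the predicate down into the read, so pandas only ever sees the matching data:

```python
from deltalake import DeltaTable

# Open the Delta table (hypothetical path).
dt = DeltaTable("path/to/delta_table")

# Prune columns and push the filter down into the read,
# so only the relevant slice of the table reaches pandas.
pdf = dt.to_pandas(
    columns=["id", "amount"],          # column pruning
    filters=[("amount", ">", 100)],    # predicate pushdown
)
```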
If you're already using Polars, of course, it's probably best to just keep using it rather than switching to pandas. I suppose there are some instances where you need to switch to pandas (perhaps to access a library), but normally I think it's better to stick with the more efficient tech.