Once you've "productionized" part of your notebook into a Python module, refactor the notebook to use the module instead. Typically the notebook code shrinks by around 80% and shifts toward documenting and demoing the model.
I create my own classes for this. (Essentially they do the same thing as sklearn Pipelines, but I like writing my own classes precisely for this reason: easier debugging and the ability to slowly expand functionality.) Something like:
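A minimal sketch of what such a class might look like (class and method names are hypothetical; assumes scikit-learn and a pandas DataFrame as input):

```python
from sklearn.ensemble import RandomForestClassifier

class ModelPipeline:
    """Wraps feature engineering and model fitting so each step
    can be run, inspected, and debugged in isolation."""

    def __init__(self, estimator=None):
        # Parameterize the model: swap in any sklearn-compatible
        # estimator, e.g. xgboost.XGBClassifier
        self.estimator = estimator or RandomForestClassifier()

    def engineer_features(self, X):
        # Kept as its own method so it can be stepped through
        # and tested on its own (toy feature for illustration)
        X = X.copy()
        X["ratio"] = X["a"] / (X["b"] + 1e-9)
        return X

    def fit(self, X, y):
        # Keep the engineered training frame around for inspection
        self.X_train_ = self.engineer_features(X)
        self.estimator.fit(self.X_train_, y)
        return self

    def predict(self, X):
        return self.estimator.predict(self.engineer_features(X))
```

With this shape, testing on new data is just `ModelPipeline().fit(X, y).predict(X_new)`, and each stage is a plain method you can call directly in a debugger.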
Then it is pretty trivial to test with new data. You can parameterize whatever you want, e.g. initialize the fit method with a random forest or xgboost, or stack on different feature engineering, etc. And for debugging you can extract and step through the individual methods.
This is a blind guess, but if you need to inspect a function's internal data after writing it, that may mean its scope is too broad and the function could be split.
This is where standard SWE guidelines can help (function interfaces, contract definitions, etc.).
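A toy illustration of that idea (function names are made up): instead of one broad function whose intermediate state you have to poke at in a debugger, split it so the intermediate result is a first-class return value with a narrow, documented contract:

```python
def clean(raw):
    """First stage: drop missing values.
    Contract: takes a list of numbers-or-None, returns a list of numbers."""
    return [x for x in raw if x is not None]

def summarize(cleaned):
    """Second stage: compute the mean.
    Contract: takes a non-empty list of numbers, returns a float."""
    return sum(cleaned) / len(cleaned)

# The intermediate data is now inspectable between the two calls,
# no debugger required
cleaned = clean([1, 2, None, 3])
print(summarize(cleaned))  # → 2.0
```

Each stage can then be unit-tested against its contract instead of being exercised only through the combined function.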