> How big a machine do you need to run notebooks effectively?
- Not too big. I've run tests comparing model performance against sampling rate, and for the models we use we can subsample heavily (i.e. train on a fraction of the full training data) with little loss.
- If we did need a bigger machine, we're using ensemble models that parallel train nicely.
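The two points above (train on a sampled fraction, and parallel-train ensemble members) can be sketched roughly as follows. All the names here (`full_data`, `sample_fraction`, `train_model`) are hypothetical placeholders, since the thread doesn't say which libraries or models are actually used:

```python
import random
from concurrent.futures import ThreadPoolExecutor

full_data = list(range(100_000))  # stands in for the full training set

def sample_fraction(data, fraction, seed=0):
    """Draw a reproducible random subsample of the training data."""
    rng = random.Random(seed)
    return rng.sample(data, int(len(data) * fraction))

def train_model(data, member_seed):
    """Placeholder for fitting one ensemble member on the subsample."""
    return {"seed": member_seed, "n_rows": len(data)}

# Train on 10% of the data; ensemble members fit independently,
# so they can be dispatched in parallel.
subsample = sample_fraction(full_data, fraction=0.10)
with ThreadPoolExecutor() as pool:
    ensemble = list(pool.map(lambda s: train_model(subsample, s), range(4)))
```

Because each ensemble member is independent, a bigger ensemble scales out across cores (or machines) rather than requiring a single bigger box.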
> Why a notebook...Why not a Python script
I was originally going to use a Python script, but I found it useful to have the notebook render performance charts and metrics inline. It's easier to keep them in the notebook than to output a pile of image artifacts that have to be added to version control. This way I can open the notebook and scroll through to check all my visual metrics.
I'm not opposed to ditching the notebook for training entirely, but for now it works just fine.
So you build a notebook and play around with it... and then run this notebook in an automated way? So you can open the notebook anytime and work with it?
I really love this quick iterative way of working (at least in the early days). Could you talk about your production training setup? I'm just concerned about performance, etc. Is it OK to train manually each time (by opening the notebook, etc.)?
So far our models remain fairly stable in the ~weeks timeframe. If we needed to train daily or similar I would invest time in something other than a notebook. But, right now, it's easier to have the training steps documented in the notebook that does the actual training than to build a separate system and document it.
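For reference, one common way to run a notebook "in an automated way" (assuming standard Jupyter tooling; the thread doesn't name a specific tool) is to execute it headlessly and save the executed copy, so the inline charts and metrics are regenerated on each run:

```shell
# Execute the notebook top-to-bottom and write the completed run (with
# fresh inline charts/metrics) to a new file. "train.ipynb" is a
# hypothetical filename used for illustration.
jupyter nbconvert --to notebook --execute train.ipynb --output train-run.ipynb

# papermill is an alternative that also lets you parameterize the run:
# papermill train.ipynb train-run.ipynb -p sample_fraction 0.1
```

The executed copy can then be opened and scrolled through exactly as described above, while the source notebook stays the documented, re-runnable record of the training steps.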
Not claiming it's a good way of doing this, but just how it is right now.