Scoring:
Essentially, raw logs go to Kafka. Java processes then read the raw logs and aggregate per-user state in real time. State is stored in Redis. When a user's state is updated, it is sent into another Kafka topic, the "features" topic. Features are simultaneously saved to HDFS as future training data and consumed by a Storm cluster for prediction. Storm runs pickled scikit-learn models in bolts that read in the features and output scores. Scores are sent into a "score" Kafka topic, and downstream systems read the scores from there.
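Roughly, the scoring step boils down to something like this. This is a stripped-down stand-in using a plain kafka-python loop rather than an actual Storm bolt; the topic names, broker address, and JSON message format here are illustrative assumptions, not the real schema.

```python
# Stand-in sketch for the scoring bolt, assuming kafka-python and a model
# pickled with scikit-learn. Topic names, the broker address, and the JSON
# message layout are illustrative assumptions.
import json
import pickle

from kafka import KafkaConsumer, KafkaProducer

# In production the pickled model is pulled from HDFS when the topology
# starts; a local file keeps the sketch simple.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

consumer = KafkaConsumer("features", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for msg in consumer:
    record = json.loads(msg.value)            # e.g. {"user_id": ..., "features": [...]}
    score = model.predict_proba([record["features"]])[0][1]
    producer.send("score", json.dumps(
        {"user_id": record["user_id"], "score": float(score)}).encode("utf-8"))
```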
Time from receiving a log to producing a score is ~x seconds.
Training:
Training data is stored in HDFS by the real-time system, so our training and production data is identical. We have IPython notebooks that pull in the training data and build a model plus a scikit-learn feature transformation pipeline. This is pickled and saved to HDFS for versioning. When the Storm cluster starts a topology, it loads the appropriate model from HDFS for classifying data.
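The notebook's training step looks roughly like the sketch below. The estimator choice, column names, and local paths are placeholders; in practice the training data comes from HDFS and the pickled pipeline goes back there.

```python
# Sketch of the notebook training step: build a scikit-learn feature
# transformation + model pipeline, fit it, and pickle the result.
# Column names, the estimator, and local file paths are placeholders.
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

train = pd.read_csv("training_data.csv")        # pulled from HDFS in practice
X, y = train.drop(columns=["label"]), train["label"]

pipeline = Pipeline([
    ("scale", StandardScaler()),                # feature transformation step
    ("clf", RandomForestClassifier(n_jobs=-1)), # ensemble model, trains in parallel
])
pipeline.fit(X, y)

with open("model-v1.pkl", "wb") as f:           # versioned copy is saved to HDFS
    pickle.dump(pipeline, f)
```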
> How big a machine do you need to run notebooks effectively?
- Not too big. I've run tests of model performance vs. how much of the data we sample, and for the models we use we can downsample quite a lot (i.e. train on only a fraction of the full training data). See the sketch after this list.
- If we did need a bigger machine, we're using ensemble models that parallelize training nicely.
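The sampling check mentioned in the first point looks roughly like this, reusing the `X`, `y`, and `pipeline` names from the training sketch above; the fractions and the AUC metric are arbitrary choices for illustration.

```python
# Rough version of the performance-vs-sampling test: fit on growing
# fractions of the training data and watch a holdout metric.
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for frac in (0.1, 0.25, 0.5, 1.0):
    n = int(len(X_train) * frac)
    pipeline.fit(X_train[:n], y_train[:n])
    auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1])
    print("fraction=%.2f  holdout AUC=%.3f" % (frac, auc))
```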
> Why a notebook... why not a Python script?
I was originally going to use a Python script, but I found it useful to have the notebook output performance charts and metrics inline. It's easier to keep them in the notebook than to output a bunch of image artifacts that have to be added to version control. This way I can pull open the notebook and scroll through to check all my visual metrics.
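For example, the notebook renders checks like an ROC curve inline; this sketch assumes the fitted `pipeline` and the holdout split from the sampling sketch above.

```python
# Example of an inline visual metric in the notebook: an ROC curve for the
# fitted pipeline on the holdout split from the earlier sketch.
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

probs = pipeline.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, probs)

plt.plot(fpr, tpr, label="AUC = %.3f" % auc(fpr, tpr))
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()   # rendered inline in the notebook, so nothing lands in version control
```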
I'm not opposed to ditching the notebook for training entirely, but for now it works just fine.
So you build a notebook and play around with it... and then run this notebook in an automated way? So you can open the notebook anytime and work with it?
I really love this quick, iterative way of working (at least in the early days). Could you talk about your production training setup? I'm just concerned about performance, etc. Is it OK to train manually each time (by opening the notebook, etc.)?
So far our models remain fairly stable on a ~weeks timescale. If we needed to retrain daily or similar, I would invest time in something other than a notebook. But right now it's easier to have the training steps documented in the notebook that does the actual training than to build a separate system and document it.
Not claiming it's a good way of doing this, but just how it is right now.