Is dropout still empirical, or is there any proof of why it works in the overall model?
I recall reading up on CNNs and playing around with them, and it was interesting to add random dropout, but it was never explained why it works. I think the general thinking is that the network is overfitting, so randomly dropping nodes is required for generalization?
Addressing your second question.
Informally, dropping nodes fights overfitting by creating subsampled architectures, which are essentially thinned-out versions of the network you've designed. Having trained on these subnetworks means you've effectively combined the learning of several different models, and in doing so generalized beyond the capabilities of your original "single" architecture.
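A minimal sketch of this idea (using PyTorch here, which is just one possible framework; the layer sizes are made up): in training mode each forward pass zeroes a random subset of units, so every mini-batch effectively trains a different thinned subnetwork over the same shared weights, while evaluation mode uses the full network deterministically.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden unit is dropped with probability 0.5 during training
    nn.Linear(64, 10),
)

x = torch.randn(1, 20)

model.train()                         # dropout active: a random "thinned" subnet per pass
out_a = model(x)
out_b = model(x)
print(torch.allclose(out_a, out_b))   # usually False: different units were dropped each time

model.eval()                          # dropout disabled: the full network is used
out_c = model(x)
out_d = model(x)
print(torch.allclose(out_c, out_d))   # True: inference is deterministic
```

Because PyTorch uses inverted dropout (activations are rescaled at training time), no extra scaling is needed at evaluation; the eval-mode network acts like an approximate average over the many thinned subnets seen during training.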
My understanding is that it avoids overfitting when data points are highly correlated.
For example, if you use image augmentation to generate additional data, the augmented images will be highly correlated with their parent image, which can lead to overfitting. Using random dropout can mitigate this somewhat.
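As a rough illustration (assuming PyTorch/torchvision and 32x32 RGB inputs, e.g. CIFAR-10-sized images; none of these specifics come from the answer above), a common pattern is to pair an augmentation pipeline, which produces many correlated variants of each parent image, with dropout layers inside the CNN:

```python
import torch.nn as nn
from torchvision import transforms

# Augmentation: each parent image yields many slightly different, highly correlated copies.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# CNN with dropout to discourage memorising features shared by those near-duplicates.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout2d(p=0.25),          # drops entire feature maps during training
    nn.Flatten(),
    nn.Dropout(p=0.5),             # drops individual units before the classifier
    nn.Linear(16 * 16 * 16, 10),   # assumes 32x32 inputs: 16 channels * 16 * 16 after pooling
)

# In a training loop one would do something like:
# logits = cnn(augment(pil_image).unsqueeze(0))
```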