Deep learning (neuralnetworksanddeeplearning.com)
361 points by joeyespo on July 26, 2015 | 35 comments



I was really impressed that the author included this caveat:

> A word on procedure: In this section, we've smoothly moved from single hidden-layer shallow networks to many-layer convolutional networks. It's all seemed so easy! We make a change and, for the most part, we get an improvement. If you start experimenting, I can guarantee things won't always be so smooth. The reason is that I've presented a cleaned-up narrative, omitting many experiments - including many failed experiments. This cleaned-up narrative will hopefully help you get clear on the basic ideas. But it also runs the risk of conveying an incomplete impression. Getting a good, working network can involve a lot of trial and error, and occasional frustration. In practice, you should expect to engage in quite a bit of experimentation.

There is a lot of "magical thinking" amongst people not actively doing research in the area (and maybe a bit within that community too), and I think it at least partly stems from mainly seeing very successful nets, and never seeing the many failed ideas before those network structures and hyperparameters were hit upon - a sampling bias type thing, where you only read about the things that work.


Yes, the difficulty of finding the right hyperparameters is often overlooked, and it is a very frustrating part of creating a model. Methods like grid search just don't work, because of the number of parameters to tune and the time it takes to train a network.


Actually, random search works a lot better than grid search for hyperparameter optimization. Usually, only a small number of hyperparameters actually matter; the trick is figuring out which ones. Grid search wastes time on irrelevant dimensions.

That said, any sort of hyperparameter optimization is extremely computationally intensive so random search is far from a panacea.
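For concreteness, here is a minimal sketch of random search, assuming a hypothetical train_and_evaluate function that trains a network with the given hyperparameters and returns a validation score (the parameter ranges are arbitrary examples, not recommendations):

    import math
    import random

    def random_search(train_and_evaluate, n_trials=20):
        # Randomly sample hyperparameter settings and keep the best one.
        # train_and_evaluate is a hypothetical function supplied by the user.
        best_score, best_params = -math.inf, None
        for _ in range(n_trials):
            params = {
                # Sample the learning rate log-uniformly: it spans orders of magnitude.
                "learning_rate": 10 ** random.uniform(-5, -1),
                "batch_size": random.choice([16, 32, 64, 128]),
                "hidden_units": random.randint(50, 500),
            }
            score = train_and_evaluate(**params)
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score

Each trial is independent, which is exactly why random search spends its budget evenly across the dimensions that matter instead of exhaustively walking the ones that don't.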


So when you search randomly and arrive at a set of optimised parameters, how do you know they can't be optimised any further, since you haven't examined every possible combination as you would in a grid?


You generally don't know whether you've reached a suitable maximum, which is why it's good to run a nondeterministic optimizer a few times (if computation power allows) and see whether any parameters come out reliably.

There are also somewhat better-than-random strategies such as Bayesian optimization and particle swarm optimization that can help you to search more efficiently.


Grid search never exhausts the search space either, at least if the dimensions are continuous.


Since this is a different chapter, it's not exactly a dupe, but it's not the first time links to parts of this book have been posted. Over the last two years, there have been a lot of HN discussions on the various chapters of this book. Here are the ones with comments:

16 days ago - https://news.ycombinator.com/item?id=9863832

8 months ago - https://news.ycombinator.com/item?id=8719371

a year ago - https://news.ycombinator.com/item?id=8258652

a year ago - https://news.ycombinator.com/item?id=8120670

a year ago - https://news.ycombinator.com/item?id=7920183

a year ago - https://news.ycombinator.com/item?id=7588158

two years ago - https://news.ycombinator.com/item?id=6794308


How did you do that? Genuinely curious.



Is there a similar search engine for reddit? I can't access my old reddit posts by search because reddit has a cutoff point at 1000 results.


How about "site:reddit.com visarga"?


It's worth reading Nielsen's essay "Will neural networks and deep learning soon lead to artificial intelligence?", which was added today:

http://neuralnetworksanddeeplearning.com/chap6.html#AI


And his answer is:

I believe that we are several decades (at least) from using deep learning to develop general AI.


I think this is a better summary of his conclusions from that same paragraph:

I conclude that, even rather optimistically, it's going to take many, many deep ideas to build an AI.

The appendix linked there doesn't seem to be ready yet though. In any case, I like how this is phrased. I'd like to see some of the hype around deep learning calm down.



I've sort of run adjacent to the field of machine learning in the last few years, but haven't really dived into the existing literature. This seems to be a pretty interesting overview.

Out of curiosity, do many implementations of convolutional neural networks take advantage of FFT, DCT, or some other fast orthonormal transform to compute the transition between layers, or are the kernel sizes small enough that there isn't a great advantage to that?


Facebook actually does something like that: https://research.facebook.com/blog/879898285375829/fair-open...

They have a patent on it but did open source the code. They claim it's up to 24x faster than the standard approach, but that is only true for an extreme use case; it's only 2x faster on average.


The bigger the convolution, the faster it gets because a convolution in real space is a multiplication in frequency space.

It breaks even at 5x5 or so and gets dramatically better shortly thereafter. However, most of the convolutional nets in use rely on 3x3 convolutions because I guess reasons:

http://arxiv.org/pdf/1409.1556.pdf (all 3x3)

http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf (3x3 and 5x5)

There's probably a new ImageNet winner in this somewhere, IMO...
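If you want to see the break-even point yourself, SciPy exposes both a direct and an FFT-based 2D convolution; a rough timing sketch (the exact crossover depends on hardware and image size):

    import time
    import numpy as np
    from scipy.signal import convolve2d, fftconvolve

    image = np.random.rand(256, 256)
    for k in (3, 5, 7, 11):
        kernel = np.random.rand(k, k)

        t0 = time.perf_counter()
        convolve2d(image, kernel, mode="same")    # direct: work grows with k^2
        direct = time.perf_counter() - t0

        t0 = time.perf_counter()
        fftconvolve(image, kernel, mode="same")   # FFT: cost is nearly independent of k
        fft = time.perf_counter() - t0

        print("%dx%d: direct %.4fs, fft %.4fs" % (k, k, direct, fft))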


One thing I wonder about is whether it is possible to somehow reflect symmetries of the input data in the structure of the neural network.

For example, the usual way to have a DNN learn rotation/scaling/translation is to do data augmentation and simply train with all the data rotated/scaled/translated.

But there must be a way to have these input space symmetries reflect somehow in the structure of the network?

I tried googling this a bit but wasn't really successful - does anyone know whether this has been done?
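For what it's worth, the data-augmentation route mentioned above usually looks something like this sketch (using scipy.ndimage; the ranges are arbitrary choices of mine, not anything from the book):

    import numpy as np
    from scipy.ndimage import rotate, shift, zoom

    def augment(image, rng=None):
        # Return a randomly rotated, scaled and translated copy of a 2D image.
        rng = rng or np.random.default_rng()
        # Random rotation, keeping the original array shape.
        image = rotate(image, angle=rng.uniform(-15, 15), reshape=False, order=1)
        # Random isotropic scaling, then crop/pad back to the original size
        # (anchored at the corner to keep the sketch short; a real pipeline would center it).
        scaled = zoom(image, rng.uniform(0.9, 1.1), order=1)
        out = np.zeros_like(image)
        h = min(image.shape[0], scaled.shape[0])
        w = min(image.shape[1], scaled.shape[1])
        out[:h, :w] = scaled[:h, :w]
        # Random translation by a few pixels in each direction.
        return shift(out, shift=rng.uniform(-3, 3, size=2), order=1)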


So, convolution is by itself an attempt to exploit translation invariance in the visual world, and typical deep convnets end up picking up a certain amount of scaling tolerance (though I would not call it invariance) by having features that are sensitive to larger and larger patches of the input as you go up the hierarchy of features. This is not real scale invariance, and many people run a Laplacian pyramid of some sort at test time to get real scale invariance when eking out the best possible numbers.

Rotation-invariance is probably not really a thing you want. The visual world is not, in fact, rotation-invariant, and the "up" direction on Earth-bound, naturally-occurring images has different statistics than the "down" direction, and you'd like to exploit these. Animal visual systems are not rotation-invariant either; an entertainingly powerful demo of this is "the Thatcher Effect" (https://en.wikipedia.org/wiki/Thatcher_effect).

Reflection across a vertical axis, on the other hand, often is exploitable, at least in image recognition contexts (as opposed to, say, handwriting recognition). If you look at the features image-recognition convnets are learning, they are often symmetric around some axis or other, or sometimes come in "pairs" of left-hand/right-hand twins. As far as I know nobody has tried to exploit this architecturally in any way other than just data augmentation, but it's a big world out there and people have been trying this stuff for a long time.


I was thinking more about a machine vision context with e.g. different parts coming in at any rotation angle.

I know that some translation invariance comes from e.g. the usual conv+maxpool layer structure, but mustn't there still be several representations in the first hidden layer of the network stack, for the different translation shifts?

Rotation especially looks like something that should produce a lot of symmetry and shared parameters, but it also looks difficult enough that I'd rather hear from someone with mad math/group-theory(?) skills who has looked at it.

But thank you for the detailed reply anyway!


The usual way to get translation invariance is through the structure of the network itself, resulting in what's called a convolutional neural network (conv+pool actually achieves this).

There have been papers about scale/rotation invariant convnets (again at the structure level) and also Networks that learn invariances without encoding them into the structure.
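A tiny toy demonstration of the translation tolerance that conv+pool buys you (my own 1D example, not from any of those papers): shifting the input by one pixel leaves the pooled feature map unchanged here.

    import numpy as np

    def conv1d_valid(x, kernel):
        # Valid-mode 1D cross-correlation.
        k = len(kernel)
        return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

    def maxpool1d(x, size=2):
        # Non-overlapping max pooling.
        n = len(x) // size
        return x[:n * size].reshape(n, size).max(axis=1)

    kernel = np.array([1.0, -1.0])             # a trivial edge detector
    signal = np.zeros(16); signal[6] = 1.0     # a single "feature"
    shifted = np.roll(signal, 1)               # the same feature, shifted by one pixel

    print(maxpool1d(conv1d_valid(signal, kernel)))
    print(maxpool1d(conv1d_valid(shifted, kernel)))   # identical pooled output in this case

In general pooling only gives tolerance to small shifts, not exact invariance; shift by a full pooling window and the activation moves to a different pooled unit.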


> There have been papers about scale/rotation invariant convnets (again at the structure level) and also Networks that learn invariances without encoding them into the structure.

The former I am very interested in! Do you have any links?


CNNs aren't rotation invariant on purpose. If they were, you would lose information about how features are oriented. Typically a CNN learns features that are very sensitive to rotation, like vertical edges and horizontal edges.
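A toy illustration of that orientation sensitivity (my own example, not from the book): a vertical-edge kernel responds strongly to a vertical edge and not at all to the same edge rotated 90 degrees.

    import numpy as np
    from scipy.signal import convolve2d

    # A Sobel-style vertical-edge detector.
    vertical_kernel = np.array([[-1, 0, 1],
                                [-2, 0, 2],
                                [-1, 0, 1]], dtype=float)

    vertical_edge = np.zeros((8, 8))
    vertical_edge[:, 4:] = 1.0                   # dark-to-light step, left to right
    horizontal_edge = vertical_edge.T            # the same edge, rotated 90 degrees

    print(np.abs(convolve2d(vertical_edge, vertical_kernel, mode="valid")).max())    # strong response
    print(np.abs(convolve2d(horizontal_edge, vertical_kernel, mode="valid")).max())  # zero response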


I am rather thinking about an on/off 'object detected' signal for objects at any rotation angle. That symmetry surely must be exploitable somehow in shared parameters of the DNN or similar?

My gut feeling is that the first convolutional layer's kernels, for example, would probably have a 'some are orthogonal' constraint due to this symmetry.


I don't really understand it, but I suspect there is some trick going on somewhere. I don't see how you can avoid doing at least one computation for each weight/pixel pair in each convolution.
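The trick is that the transform cost does not depend on the kernel size, so for large kernels you never touch each weight/pixel pair individually. A rough back-of-envelope operation count (my own numbers, purely illustrative):

    import math

    H = W = 256                        # feature-map size
    for k in (3, 5, 9, 17):
        direct = H * W * k * k         # one multiply per (weight, pixel) pair
        # FFT route: transform image and kernel, multiply pointwise, transform back.
        # Each 2D FFT costs roughly H * W * log2(H * W) operations.
        fft = 3 * H * W * math.log2(H * W) + H * W
        print("%dx%d kernel: direct ~%d ops, FFT ~%d ops" % (k, k, direct, int(fft)))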



I read it, I just don't understand it.


Most use an im2col/col2im-based implementation. I've been experimenting with CUDA-based FFT for ours as well, though.
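For anyone unfamiliar with the term, im2col unrolls every kernel-sized patch of the input into the column of a matrix so the whole convolution becomes one big matrix multiply; a minimal NumPy sketch (ignoring stride, padding and channels):

    import numpy as np

    def im2col(image, k):
        # Stack every k x k patch of a 2D image as a column.
        h, w = image.shape
        cols = [image[i:i + k, j:j + k].ravel()
                for i in range(h - k + 1)
                for j in range(w - k + 1)]
        return np.array(cols).T                   # shape: (k*k, number_of_patches)

    def conv2d_im2col(image, kernel):
        # Valid-mode cross-correlation expressed as a single matrix product.
        k = kernel.shape[0]
        out_h = image.shape[0] - k + 1
        out_w = image.shape[1] - k + 1
        patches = im2col(image, k)                # (k*k, out_h*out_w)
        return (kernel.ravel() @ patches).reshape(out_h, out_w)

    image = np.random.rand(6, 6)
    kernel = np.random.rand(3, 3)
    print(conv2d_im2col(image, kernel))

The payoff is that the matrix multiply can be handed to a highly tuned BLAS/cuBLAS routine, which is why this layout is so common in GPU implementations.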


This is the best neural network tutorial out there. I have been waiting for the missing deep learning chapter for so long, and it's finally here!

My reading for today; thanks for sharing!


I just wish the code were in Ruby... but since the author has released all this material for free, I don't feel like actually complaining :) It's more of a subtle hint than anything else... ;)


Read the material, skip the code, and write your own code in Ruby; that's what I've been doing in Clojure.


That would be a highly effective way of learning, but not everybody has the time for it... And it's not just the ready-made code that I envy, but also the added value of someone having already picked the relevant, working libraries, tools and so on...


Thank you for this. Even if it might be a repost, I had missed it, and it's very, very interesting.


Not a repost; apparently this chapter was just released.



