Deep learning (neuralnetworksanddeeplearning.com)
361 points by joeyespo on July 26, 2015 | 35 comments



I was really impressed that the author included this caveat:

> A word on procedure: In this section, we've smoothly moved from single hidden-layer shallow networks to many-layer convolutional networks. It's all seemed so easy! We make a change and, for the most part, we get an improvement. If you start experimenting, I can guarantee things won't always be so smooth. The reason is that I've presented a cleaned-up narrative, omitting many experiments - including many failed experiments. This cleaned-up narrative will hopefully help you get clear on the basic ideas. But it also runs the risk of conveying an incomplete impression. Getting a good, working network can involve a lot of trial and error, and occasional frustration. In practice, you should expect to engage in quite a bit of experimentation.

There is a lot of "magical thinking" amongst people not actively doing research in the area (and maybe a bit within that community too), and I think it at least partly stems from mainly seeing very successful nets, and never seeing the many failed ideas before those network structures and hyperparameters were hit upon - a sampling bias type thing, where you only read about the things that work.


Yes, the difficulty of finding the right hyperparameters is often overlooked, and it is a very frustrating part of creating a model. Methods like grid search just don't work, because of the number of parameters to tune and the time it takes to train a network.


Actually, random search works a lot better than grid search for hyperparameter optimization. Usually, only a small number of hyperparameters actually matter; the trick is figuring out which ones. Grid search wastes time on irrelevant dimensions.

That said, any sort of hyperparameter optimization is extremely computationally intensive so random search is far from a panacea.
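For concreteness, here is a minimal sketch of random search, assuming a hypothetical train_and_evaluate function that trains a network with the given hyperparameters and returns a validation score (the parameter ranges are arbitrary examples, not recommendations):

    import math
    import random

    def random_search(train_and_evaluate, n_trials=20):
        # Randomly sample hyperparameter settings and keep the best one.
        # train_and_evaluate is a hypothetical function supplied by the user.
        best_score, best_params = -math.inf, None
        for _ in range(n_trials):
            params = {
                # Sample the learning rate log-uniformly: it spans orders of magnitude.
                "learning_rate": 10 ** random.uniform(-5, -1),
                "batch_size": random.choice([16, 32, 64, 128]),
                "hidden_units": random.randint(50, 500),
            }
            score = train_and_evaluate(**params)
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score

Each trial is independent, which is exactly why random search spends its budget evenly across the dimensions that matter instead of exhaustively walking the ones that don't.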


So when you search randomly and arrive at a set of optimised parameters, how do you know they can't be optimised any further, since you haven't examined every possible combination as you would in a grid?


You generally don't know whether you've reached a suitable maximum, which is why it's good to run a nondeterministic optimizer a few times (if computation power allows) and see whether any parameters come out reliably.

There are also somewhat better-than-random strategies such as Bayesian optimization and particle swarm optimization that can help you to search more efficiently.


Grid search never exhausts the search space either, at least if the dimensions are continuous.


Since this is a different chapter, it's not exactly a dupe, but it's not the first time links to parts of this book have been posted. Over the last two years, there have been a lot of HN discussions on the various chapters of this book. Here are the ones with comments:

16 days ago - https://news.ycombinator.com/item?id=9863832

8 months ago - https://news.ycombinator.com/item?id=8719371

a year ago - https://news.ycombinator.com/item?id=8258652

a year ago - https://news.ycombinator.com/item?id=8120670

a year ago - https://news.ycombinator.com/item?id=7920183

a year ago - https://news.ycombinator.com/item?id=7588158

two years ago - https://news.ycombinator.com/item?id=6794308


How did you do that? Genuinely curious.



Is there a similar search engine for reddit? I can't access my old reddit posts by search because reddit has a cutoff point at 1000 results.


How about "site:reddit.com visarga"?


It's worth reading Nielsen's essay "Will neural networks and deep learning soon lead to artificial intelligence?", which was added today:

http://neuralnetworksanddeeplearning.com/chap6.html#AI


And his answer is:

I believe that we are several decades (at least) from using deep learning to develop general AI.


I think this is a better summary of his conclusions from that same paragraph:

I conclude that, even rather optimistically, it's going to take many, many deep ideas to build an AI.

The appendix linked there doesn't seem to be ready yet though. In any case, I like how this is phrased. I'd like to see some of the hype around deep learning calm down.



I've sort of run adjacent to the field of machine learning in the last few years, but haven't really dived into the existing literature. This seems to be a pretty interesting overview.

Out of curiosity, do many implementations of convolutional neural networks take advantage of FFT, DCT, or some other fast orthonormal transform to compute the transition between layers, or are the kernel sizes small enough that there isn't a great advantage to that?


Facebook actually does something like that: https://research.facebook.com/blog/879898285375829/fair-open...

They have a patent on it but did open source the code. They claim it's up to 24x faster than the standard approach, but that is only true for an extreme use case; it's only 2x faster on average.


The bigger the convolution, the faster it gets because a convolution in real space is a multiplication in frequency space.

It breaks even at 5x5 or so and gets dramatically better shortly thereafter. However, most of the convolutional nets in use rely on 3x3 convolutions because I guess reasons:

http://arxiv.org/pdf/1409.1556.pdf (all 3x3)

http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf (3x3 and 5x5)

There's probably a new ImageNet winner in this somewhere, IMO...
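If you want to see the break-even point yourself, SciPy exposes both a direct and an FFT-based 2D convolution; a rough timing sketch (the exact crossover depends on hardware and image size):

    import time
    import numpy as np
    from scipy.signal import convolve2d, fftconvolve

    image = np.random.rand(256, 256)
    for k in (3, 5, 7, 11):
        kernel = np.random.rand(k, k)

        t0 = time.perf_counter()
        convolve2d(image, kernel, mode="same")    # direct: work grows with k^2
        direct = time.perf_counter() - t0

        t0 = time.perf_counter()
        fftconvolve(image, kernel, mode="same")   # FFT: cost is nearly independent of k
        fft = time.perf_counter() - t0

        print("%dx%d: direct %.4fs, fft %.4fs" % (k, k, direct, fft))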


One thing I wonder about is whether it is possible to somehow reflect symmetries of the input data in the structure of the neural network.

For example, the usual way to have a DNN learn rotation/scaling/translation is to do data augmentation and simply train with all the data rotated/scaled/translated.

But there must be a way to have these input space symmetries reflect somehow in the structure of the network?

I tried googling this a bit but wasn't really successful - does anyone know whether this has been done?
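For what it's worth, the data-augmentation route mentioned above usually looks something like this sketch (using scipy.ndimage; the ranges are arbitrary choices of mine, not anything from the book):

    import numpy as np
    from scipy.ndimage import rotate, shift, zoom

    def augment(image, rng=None):
        # Return a randomly rotated, scaled and translated copy of a 2D image.
        rng = rng or np.random.default_rng()
        # Random rotation, keeping the original array shape.
        image = rotate(image, angle=rng.uniform(-15, 15), reshape=False, order=1)
        # Random isotropic scaling, then crop/pad back to the original size
        # (anchored at the corner to keep the sketch short; a real pipeline would center it).
        scaled = zoom(image, rng.uniform(0.9, 1.1), order=1)
        out = np.zeros_like(image)
        h = min(image.shape[0], scaled.shape[0])
        w = min(image.shape[1], scaled.shape[1])
        out[:h, :w] = scaled[:h, :w]
        # Random translation by a few pixels in each direction.
        return shift(out, shift=rng.uniform(-3, 3, size=2), order=1)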


So, convolution is by itself an attempt to exploit translation invariance in the visual world, and typical deep convnets end up picking up a certain amount of scaling tolerance (though I would not call it invariance) by having features that are sensitive to larger and larger patches of the input as you go up the hierarchy of features. This is not real scale invariance, and many people run a Laplacian pyramid of some sort at test time to get real scale invariance when eking out the best possible numbers.

Rotation-invariance is probably not really a thing you want. The visual world is not, in fact, rotation-invariant, and the "up" direction on Earth-bound, naturally-occurring images has different statistics than the "down" direction, and you'd like to exploit these. Animal visual systems are not rotation-invariant either; an entertainingly powerful demo of this is "the Thatcher Effect" (https://en.wikipedia.org/wiki/Thatcher_effect).

Reflection across a vertical axis, on the other hand, often is exploitable, at least in image recognition contexts (as opposed to, say, handwriting recognition). If you look at the features image-recognition convnets are learning, they are often symmetric around some axis or other, or sometimes come in "pairs" of left-hand/right-hand twins. As far as I know nobody has tried to exploit this architecturally in any way other than just data augmentation, but it's a big world out there and people have been trying this stuff for a long time.


I was thinking more about a machine vision context with e.g. different parts coming in at any rotation angle.

I know that some translation invariance comes from e.g. the usual conv+maxpool layer structure, but mustn't there still be several representations in the first hidden layer of the network stack, for the different translation shifts?

Rotation especially looks like something that should produce a lot of symmetry and shared parameters, but it also looks difficult enough that I'd rather hear from someone with mad math/group-theory(?) skills who has looked at it.

But thank you for the detailed reply anyway!


The usual way to get translation invariance is through the structure of the network itself, resulting in what's called a convolutional neural network (conv+pool actually achieves this).

There have been papers about scale/rotation invariant convnets (again at the structure level) and also Networks that learn invariances without encoding them into the structure.
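A tiny toy demonstration of the translation tolerance that conv+pool buys you (my own 1D example, not from any of those papers): shifting the input by one pixel leaves the pooled feature map unchanged here.

    import numpy as np

    def conv1d_valid(x, kernel):
        # Valid-mode 1D cross-correlation.
        k = len(kernel)
        return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

    def maxpool1d(x, size=2):
        # Non-overlapping max pooling.
        n = len(x) // size
        return x[:n * size].reshape(n, size).max(axis=1)

    kernel = np.array([1.0, -1.0])             # a trivial edge detector
    signal = np.zeros(16); signal[6] = 1.0     # a single "feature"
    shifted = np.roll(signal, 1)               # the same feature, shifted by one pixel

    print(maxpool1d(conv1d_valid(signal, kernel)))
    print(maxpool1d(conv1d_valid(shifted, kernel)))   # identical pooled output in this case

In general pooling only gives tolerance to small shifts, not exact invariance; shift by a full pooling window and the activation moves to a different pooled unit.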


> There have been papers about scale/rotation invariant convnets (again at the structure level) and also Networks that learn invariances without encoding them into the structure.

The former I am very interested in! Do you have any links?


CNNs aren't rotation invariant on purpose. If they were, you would lose information about how features are oriented. Typically a CNN learns features that are very sensitive to rotation, like vertical edges and horizontal edges.
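A toy illustration of that orientation sensitivity (my own example, not from the book): a vertical-edge kernel responds strongly to a vertical edge and not at all to the same edge rotated 90 degrees.

    import numpy as np
    from scipy.signal import convolve2d

    # A Sobel-style vertical-edge detector.
    vertical_kernel = np.array([[-1, 0, 1],
                                [-2, 0, 2],
                                [-1, 0, 1]], dtype=float)

    vertical_edge = np.zeros((8, 8))
    vertical_edge[:, 4:] = 1.0                   # dark-to-light step, left to right
    horizontal_edge = vertical_edge.T            # the same edge, rotated 90 degrees

    print(np.abs(convolve2d(vertical_edge, vertical_kernel, mode="valid")).max())    # strong response
    print(np.abs(convolve2d(horizontal_edge, vertical_kernel, mode="valid")).max())  # zero response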


I am rather thinking about an on/off 'object detected' signal for objects at any rotation angle. That symmetry surely must be exploitable somehow in shared parameters of the DNN or similar?

My gut feeling is that the first convolutional layer's kernels, for example, would probably have a 'some are orthogonal' constraint due to this symmetry.


I don't really understand it, but I suspect there is some trick going on somewhere. I don't see how you can avoid doing at least one computation for each weight/pixel pair in each convolution.
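The trick is that the transform cost does not depend on the kernel size, so for large kernels you never touch each weight/pixel pair individually. A rough back-of-envelope operation count (my own numbers, purely illustrative):

    import math

    H = W = 256                        # feature-map size
    for k in (3, 5, 9, 17):
        direct = H * W * k * k         # one multiply per (weight, pixel) pair
        # FFT route: transform image and kernel, multiply pointwise, transform back.
        # Each 2D FFT costs roughly H * W * log2(H * W) operations.
        fft = 3 * H * W * math.log2(H * W) + H * W
        print("%dx%d kernel: direct ~%d ops, FFT ~%d ops" % (k, k, direct, int(fft)))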



I read it, I just don't understand it.


Most use an im2col/col2im-based implementation. I've been experimenting with CUDA-based FFT for ours as well, though.
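For anyone unfamiliar with the term, im2col unrolls every kernel-sized patch of the input into the column of a matrix so the whole convolution becomes one big matrix multiply; a minimal NumPy sketch (ignoring stride, padding and channels):

    import numpy as np

    def im2col(image, k):
        # Stack every k x k patch of a 2D image as a column.
        h, w = image.shape
        cols = [image[i:i + k, j:j + k].ravel()
                for i in range(h - k + 1)
                for j in range(w - k + 1)]
        return np.array(cols).T                   # shape: (k*k, number_of_patches)

    def conv2d_im2col(image, kernel):
        # Valid-mode cross-correlation expressed as a single matrix product.
        k = kernel.shape[0]
        out_h = image.shape[0] - k + 1
        out_w = image.shape[1] - k + 1
        patches = im2col(image, k)                # (k*k, out_h*out_w)
        return (kernel.ravel() @ patches).reshape(out_h, out_w)

    image = np.random.rand(6, 6)
    kernel = np.random.rand(3, 3)
    print(conv2d_im2col(image, kernel))

The payoff is that the matrix multiply can be handed to a highly tuned BLAS/cuBLAS routine, which is why this layout is so common in GPU implementations.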


This is the best neural network tutorial out there. I have been waiting for the missing deep learning chapter for so long, and it's finally here!

My reading for today; thanks for sharing!


I just wish the code were in Ruby... but since the author has released all this material for free, I don't feel like actually complaining :) It's more of a subtle hint than anything else... ;)


Read the material, skip the code, and write your own code in Ruby; that's what I've been doing in Clojure.


That would be a highly effective way of learning, but not everybody has the time for it... And it's not just the ready-made code that I envy, but also the added value of someone having already picked the relevant, working libraries, tools and so on...


Thank you for this. Even if it might be a repost, I had missed it, and it's very, very interesting.


Not a repost; apparently this chapter was just released.



