If I understand correctly, they are looking at problems that are simple in the sense of having no local optima, but complex in the sense of having very high dimension.
I.e., getting stuck in local optima isn't a problem for the class of problems they are optimizing.
I've been out of that domain for a while... but back when I did a lot of optimizing / fitting in a high-dimensional space with a function that had lots of local optima, I found the downhill simplex method combined with simulated annealing to be the most effective and robust.
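In case it's useful to anyone, here's a minimal sketch of that idea using off-the-shelf SciPy pieces rather than the classic annealed-simplex ("amoeba") routine itself: basinhopping supplies the temperature-controlled random jumps, Nelder-Mead (downhill simplex) does the local refinement, and the Rastrigin function is just my stand-in for a bumpy, many-local-optima objective.

    import numpy as np
    from scipy.optimize import basinhopping

    def rastrigin(x):
        # Standard test function with a huge number of local minima.
        return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

    x0 = np.random.uniform(-5.12, 5.12, size=10)     # 10-dimensional start point
    result = basinhopping(
        rastrigin,
        x0,
        niter=200,                                    # number of annealing-like hops
        T=1.0,                                        # "temperature" for accepting uphill hops
        stepsize=0.5,
        minimizer_kwargs={"method": "Nelder-Mead"},   # downhill simplex as the local step
    )
    print(result.x, result.fun)

Not the exact algorithm described above, just the closest thing you can assemble from stock library calls.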
There is a problem with our intuition about gradient descent that blinds us, a little bit.
For a local minimum to exist, the function we're looking at has to curve upward, i.e. have a local minimum, when sliced along every single dimension.
In our intuition, this is relatively common, because we usually imagine 2d manifolds embedded in 3-space (i.e., "hill climbing"). And when there are only 2 dimensions, it's not hard to have a "bowl", where the function curves upward in both dimensions.
In fact, a "saddle point", where one of the dimensions curves up and the other curves down, feels "unlikely" to most peoples' intuition, because we don't often run into physical objects with that shape.
But, saddle points in higher dimensions "should be" far more common than local minima. You only need one dimension to curve downward to have a "hole" where the "water" can escape and continue flowing downhill.
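A quick toy check of that counting argument (my own illustration, treating random symmetric matrices as stand-in Hessians at a critical point): ask how often every eigenvalue comes out positive, i.e. how often the point would be a genuine bowl rather than a saddle.

    import numpy as np

    rng = np.random.default_rng(0)

    def fraction_of_minima(dim, trials=2000):
        hits = 0
        for _ in range(trials):
            a = rng.standard_normal((dim, dim))
            hessian = (a + a.T) / 2                     # random symmetric "Hessian"
            if np.all(np.linalg.eigvalsh(hessian) > 0):
                hits += 1                               # every direction curves upward: a bowl
        return hits / trials

    for dim in (1, 2, 3, 5, 8):
        print(dim, fraction_of_minima(dim))

The fraction collapses quickly as the dimension grows: almost every critical point of this toy model has at least one downhill direction, i.e. it's a saddle.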
Gradient descent easily gets stuck in 2 dimensions, and so we are often surprised by how well it performs in higher dimensions.
Yes, saddle points are far more common than local minima in high dimensions. Unfortunately, they're really good at slowing down naive gradient descent...
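To make that concrete, here's a toy run (my own example, not from any particular paper): plain gradient descent on f(x, y) = x^2 - y^2, started just off the saddle at the origin, hovers near zero loss for hundreds of steps before the tiny downhill component finally takes over.

    import numpy as np

    def grad(p):
        x, y = p
        return np.array([2 * x, -2 * y])    # gradient of f(x, y) = x^2 - y^2

    p = np.array([1.0, 1e-8])               # start almost exactly on the saddle's ridge
    lr = 0.01
    for step in range(1001):
        p = p - lr * grad(p)
        if step % 200 == 0:
            print(step, p, p[0]**2 - p[1]**2)

The x-component shrinks fast, but the escape direction only grows geometrically from a tiny perturbation, so the iterate lingers on a long plateau; momentum or the noise in SGD shortens that plateau considerably.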
Not much of a response. Care to expand here? This is a well-understood principle in the existing body of research on convergence in high-dimensional spaces.
I'm sorry that I can't find the reference right now, but I read a paper a while back showing that gradient descent (particularly stochastic gradient descent) usually finds good solutions for high-dimensional neural networks, and that these solutions are very close to each other, even though they aren't absolute extrema. What's more, you usually want to avoid the absolute extrema anyway, because those solutions often suffer from overfitting.