I first read about simulated annealing more than thirty years ago.
It's a fascinating approach, and the analogy to physics is especially appealing. I have only kept up with the literature, not seriously implemented it, so I could be wrong. But what I understand is that while simulated annealing is good for many things, it hasn't shown itself to be best for anything, and the improvements on it have tended to be changes that weakened the analogy with physics [1]. I find this disappointing, since in its raw form simulated annealing suggested a sort of "physics of information processing," mapping hard computation problems onto states of matter. But it seems that sometimes analogies can be as misleading as they are productive.
[1] For example, the travelling salesman problem (http://en.wikipedia.org/wiki/Travelling_salesman_problem), the first problem I saw simulated annealing applied to [in a reprint someone tossed out in Evans Hall at UCB circa 1981].
> But what I understand is that while simulated annealing is good for many things, it hasn't shown itself to be best for anything
That's kind of the nature of the beast, though. Approaches like Simulated Annealing and Genetic Algorithms are appropriate for situations in which you have no good heuristic for pruning a search tree. They're almost always going to be last-resort approaches, but at least in the case of SA it generally converges on a good approximation quickly enough to provide useful results.
To explain simulated annealing to lay audiences, I rely on a similar (inverted) example. You're at the top of a mountain. You want to find the lowest spot. If you just keep walking downhill, you may reach the sea eventually. But you may never reach Death Valley (86 m below sea level) unless you are willing to climb some mountains at the beginning of your trip. IOW you need to accept sub-optimal moves (with decreasing probability) in order to adequately explore your surroundings.
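To make that concrete, here is a minimal sketch of simulated annealing in Python. The objective function, starting point, step size, and cooling schedule are all made up for illustration, not taken from any particular implementation. The key line is the Metropolis-style acceptance test: a worse move is accepted with probability exp(-delta / t), which shrinks as the temperature t cools, so early in the run the walker will happily climb out of shallow valleys.

```python
import math
import random

def simulated_annealing(objective, x0, step=0.5, t0=10.0, cooling=0.995, iters=10000):
    """Minimize `objective`, accepting some uphill moves while the temperature is high."""
    x, fx = x0, objective(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        # Propose a random nearby point.
        x_new = x + random.uniform(-step, step)
        f_new = objective(x_new)
        delta = f_new - fx
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / t), which falls toward zero as t cools.
        if delta < 0 or random.random() < math.exp(-delta / t):
            x, fx = x_new, f_new
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling  # geometric cooling schedule
    return best, fbest

# Toy landscape: a shallow local minimum near x = 3.8 and a deeper one near x = -1.3.
f = lambda x: x**2 + 10 * math.sin(x)
print(simulated_annealing(f, x0=4.0))
```

Starting right next to the shallow minimum, a strictly downhill walker stays stuck there; the annealer usually wanders over the ridge to the deeper minimum before the temperature gets too low to allow uphill steps.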
Fewer pretty pictures, but much funnier, and it describes several different optimization algorithms (in the context of neural network training, though most of it doesn't depend on that): "Kangaroos and training neural networks".
Unless the global maximum is much higher, how much benefit can there be from moving away from the local max?
In some way, shape, or form, I hear this argument over and over, and from intelligent people.
We see this kind of risk-averse behavior in social situations where nobody wants to take the "hit" of moving from their current strategy.
Or there may have been a time when we identified the maximum, but the entire landscape has changed around it, so the strategy is no longer optimal. This is another real situation.
Also, "simulated annealing" seems to be a bit of a misnomer for this type of mixed strategy.