
Being a universal function approximator means that a multi-layer NN can approximate any continuous function on a compact (bounded) domain to an arbitrary degree of accuracy. But it says nothing about learnability, and the network required may be unrealistically large.
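
For reference, a rough statement of the theorem being paraphrased (the classical one-hidden-layer version; sigma is any fixed non-polynomial activation and K a compact subset of R^d):

    % Universal approximation, one hidden layer:
    % for every continuous g on compact K and every eps > 0 there exist N, c_i, w_i, b_i with
    \sup_{x \in K} \Big|\, \sum_{i=1}^{N} c_i \,\sigma(w_i^{\top} x + b_i) - g(x) \Big| < \varepsilon .
    % Nothing in the statement bounds N, which is where "unrealistically large" comes in.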

The learning algorithm typically used, backpropagation with stochastic gradient descent, is not a universal learner: it's not guaranteed to find the global minimum.
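
As a toy illustration (a made-up one-dimensional non-convex "loss", nothing like a real network), plain gradient descent just rolls into whichever basin the initialisation happens to sit in:

    import numpy as np

    # Toy non-convex "loss" in one parameter (a stand-in for a real
    # training loss, which lives in millions of dimensions).
    loss = lambda w: w**4 - 3*w**2 + w
    grad = lambda w: 4*w**3 - 6*w + 1

    w = 2.0                      # initialisation on the "wrong" side of the barrier
    for _ in range(1000):
        w -= 0.01 * grad(w)      # plain gradient descent

    print(w, loss(w))            # converges to the local minimum near w ~ 1.13
    print(-1.30, loss(-1.30))    # the global minimum near w ~ -1.30 is much deeper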



Specifically, the "universal function approximate" thing means no more and no less than the relatively trivial fact that if you draw a bunch of straight line segments you can approximate any (1D, suitably well-behaved) function as closely as you want by making the lines really short. Translating that to N dimensions and casting it into exactly the form that applies to neural networks and then making the proof solid isn't even that tough, it's mostly trivial once you write down the right definitions.


Specifically for neural networks, is there any alternative to backpropagation and gradient descent that guarantees finding the global minimum?


Unlikely, given the dimensionality and complexity of the search space. Besides, we probably don't even care about the global minimum: the loss we're optimising is only a proxy for what we really care about (performance on unseen data). Counter-example: a model that perfectly memorises the training data can be globally optimal (ignoring regularisation), but is not very useful.
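
A quick sketch of that counter-example, with a degree-14 polynomial standing in for an over-parameterised network on toy 1-D data:

    import numpy as np

    rng = np.random.default_rng(0)
    true_f = np.sin                                    # the "real" relationship
    x_train = np.linspace(0, 2 * np.pi, 15)
    y_train = true_f(x_train) + 0.3 * rng.normal(size=x_train.shape)
    x_test  = np.linspace(0.1, 2 * np.pi - 0.1, 500)
    y_test  = true_f(x_test) + 0.3 * rng.normal(size=x_test.shape)

    # Degree-14 polynomial through 15 points: it memorises the training set exactly,
    # so it is a global minimiser of the (unregularised) training loss.
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=14)

    mse = lambda x, y: np.mean((model(x) - y) ** 2)
    print("train MSE:", mse(x_train, y_train))         # ~0: globally optimal on the training loss
    print("test  MSE:", mse(x_test, y_test))           # much larger: generalisation is the real goal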



