
Being a universal function approximator means that a multi-layer NN can approximate any continuous function on a compact (bounded) domain to an arbitrary degree of accuracy. But it says nothing about learnability, and the network required may be unrealistically large.
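
For reference, a rough statement of the theorem being paraphrased (the classical one-hidden-layer version; sigma is any fixed non-polynomial activation and K a compact subset of R^d):

    % Universal approximation, one hidden layer:
    % for every continuous g on compact K and every eps > 0 there exist N, c_i, w_i, b_i with
    \sup_{x \in K} \Big|\, \sum_{i=1}^{N} c_i \,\sigma(w_i^{\top} x + b_i) - g(x) \Big| < \varepsilon .
    % Nothing in the statement bounds N, which is where "unrealistically large" comes in.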

The learning algorithm typically used, backpropagation with stochastic gradient descent, is not a universal learner: it's not guaranteed to find the global minimum.
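
As a toy illustration (a made-up one-dimensional non-convex "loss", nothing like a real network), plain gradient descent just rolls into whichever basin the initialisation happens to sit in:

    import numpy as np

    # Toy non-convex "loss" in one parameter (a stand-in for a real
    # training loss, which lives in millions of dimensions).
    loss = lambda w: w**4 - 3*w**2 + w
    grad = lambda w: 4*w**3 - 6*w + 1

    w = 2.0                      # initialisation on the "wrong" side of the barrier
    for _ in range(1000):
        w -= 0.01 * grad(w)      # plain gradient descent

    print(w, loss(w))            # converges to the local minimum near w ~ 1.13
    print(-1.30, loss(-1.30))    # the global minimum near w ~ -1.30 is much deeper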



Specifically, the "universal function approximate" thing means no more and no less than the relatively trivial fact that if you draw a bunch of straight line segments you can approximate any (1D, suitably well-behaved) function as closely as you want by making the lines really short. Translating that to N dimensions and casting it into exactly the form that applies to neural networks and then making the proof solid isn't even that tough, it's mostly trivial once you write down the right definitions.


Specifically for neural networks, is there any alternative to backpropagation and gradient descent that guarantees finding the global minimum?


Unlikely, given the dimensionality and complexity of the search space. Besides, we probably don't even care about the global minimum: the loss we're optimising is only a proxy for what we really care about (performance on unseen data). Counter-example: a model that perfectly memorises the training data can be globally optimal (ignoring regularisation), but is not very useful.
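
A quick sketch of that counter-example, with a degree-14 polynomial standing in for an over-parameterised network on toy 1-D data:

    import numpy as np

    rng = np.random.default_rng(0)
    true_f = np.sin                                    # the "real" relationship
    x_train = np.linspace(0, 2 * np.pi, 15)
    y_train = true_f(x_train) + 0.3 * rng.normal(size=x_train.shape)
    x_test  = np.linspace(0.1, 2 * np.pi - 0.1, 500)
    y_test  = true_f(x_test) + 0.3 * rng.normal(size=x_test.shape)

    # Degree-14 polynomial through 15 points: it memorises the training set exactly,
    # so it is a global minimiser of the (unregularised) training loss.
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=14)

    mse = lambda x, y: np.mean((model(x) - y) ** 2)
    print("train MSE:", mse(x_train, y_train))         # ~0: globally optimal on the training loss
    print("test  MSE:", mse(x_test, y_test))           # much larger: generalisation is the real goal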



