I don't remember the name of the theorem, but you can approximate any continuous multivariable function arbitrarily well with a multi-layer perceptron whose nonlinearity is any non-polynomial function, applied after the linear weights and bias. It has to be non-polynomial because the set of all polynomials is closed under linear combinations, adding constants, and composition, so if the nonlinearity were (say) x^3 you would only ever get polynomials out of the model.
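A toy sketch of the flavor of it, in numpy (the sin target, unit count, and weight scales are just choices I made): freeze random hidden weights, apply tanh, and solve the output layer directly by least squares. The point is only that good weights exist, not that gradient descent would find them.

    import numpy as np

    rng = np.random.default_rng(0)

    # Target: a smooth 1-D function to approximate on [-pi, pi].
    x = np.linspace(-np.pi, np.pi, 500)[:, None]
    y = np.sin(x).ravel()

    # One hidden layer: random weights/biases, tanh nonlinearity
    # (any non-polynomial activation works, per the theorem).
    H = 50  # number of hidden units, arbitrary
    W = rng.normal(scale=2.0, size=(1, H))
    b = rng.normal(scale=2.0, size=H)
    hidden = np.tanh(x @ W + b)  # shape (500, H)

    # Solve for the output layer by least squares instead of
    # gradient descent, just to exhibit that such weights exist.
    coef, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    err = np.max(np.abs(hidden @ coef - y))
    print(f"max abs error with {H} tanh units: {err:.4f}")

More units generally buys a closer fit.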
I'm not sure why that's a problem because polynomial approximations are still useful.
For one, only continuous functions can be represented.
Much more importantly, the theorem doesn't prove that it's possible to learn the necessary weights to approximate any function, just that such weights must exist.
With our current methods, only a subset of all possible NNs are actually trainable, so we can only automate the construction of approximations for certain continuous functions (generally those that are differentiable, but there may be exceptions; I'm not as sure).
If we're talking about approximation, continuous functions can converge to step functions just fine. Take a regular sigmoid and keep raising the weight to see one example. That's a good point about training, though: that theorem doesn't fully explain why NNs work, although it somewhat sounds like it does.
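Quick numpy check of that (the weight values and the small exclusion window around the jump are arbitrary picks of mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-1, 1, 2001)
    step = (x > 0).astype(float)

    # As the weight w grows, sigmoid(w*x) pinches toward a step at 0.
    # Convergence is pointwise away from the jump, so measure the error
    # outside a small window around x = 0.
    for w in (1, 10, 100, 500):
        err = np.max(np.abs(sigmoid(w * x) - step)[np.abs(x) > 0.01])
        print(f"w={w:4d}  max error outside +/-0.01 of the jump: {err:.2e}")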
There are discontinuous functions which are not steps; one of the more useful ones is tan(x) on the real line. Of course, since tan(x) is piecewise continuous and periodic, it is probably easy to work around in practice.
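E.g., here's the same kind of random-feature sketch as above, fit on a closed interval that stays a margin away from the poles, where tan is continuous and the theorem applies (the margin eps and the unit count are arbitrary choices of mine); outside that interval you'd fold inputs back by periodicity, since tan(x + pi) = tan(x):

    import numpy as np

    rng = np.random.default_rng(0)

    # tan blows up at +/- pi/2, but on a closed interval strictly
    # inside one period it's continuous, so approximation is fine there.
    eps = 0.2  # margin kept away from the poles
    x = np.linspace(-np.pi/2 + eps, np.pi/2 - eps, 500)[:, None]
    y = np.tan(x).ravel()

    H = 100
    W = rng.normal(scale=3.0, size=(1, H))
    b = rng.normal(scale=3.0, size=H)
    hidden = np.tanh(x @ W + b)
    coef, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    print("max abs error:", np.max(np.abs(hidden @ coef - y)))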
The basic approximation theorem you might be thinking of is known as Kolmogorov's Theorem (dude got around). It's an early result, from 1957, about representing a continuous function of several variables using only sums and compositions of single-variable functions.
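For reference, the representation it gives (writing this from memory, so indexing conventions may vary by source): any continuous f on [0,1]^n can be written exactly, not just approximately, as

    f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

where every \Phi_q and \phi_{q,p} is a continuous function of a single variable, and the only multivariable operation is addition.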
But all the other universality theorems refer back to it and don't have their own names; for example, "Optimal approximation of continuous functions by very deep ReLU networks" by Dmitry Yarotsky [1]. The reference for the original theorem would be "On the Structure of Continuous Functions of Several Variables" by David A. Sprecher [2].