If I'm reading this right, the core of the argument is that since any continuous function can be approximated via a Taylor series expansion, activation functions can be seen, in effect, as polynomial in nature, and since a neuron layer is a linear transformation followed by an activation function, the whole system is polynomial.
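As a rough illustration of what that equivalence looks like in code (a minimal sketch of my own, not the paper's construction; the tanh activation, the degree-7 fit, and the interval [-3, 3] are arbitrary choices):

```python
# Sketch: swap the tanh activation of a tiny 1-hidden-layer net for a
# polynomial fit of tanh and compare outputs. Everything here is arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # hidden layer: 2 -> 8
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # output layer: 8 -> 1

# Degree-7 least-squares polynomial fit of tanh on [-3, 3].
xs = np.linspace(-3, 3, 200)
poly = np.polynomial.Polynomial.fit(xs, np.tanh(xs), deg=7)

def net(x, act):                      # x has shape (n, 2)
    h = act(x @ W1.T + b1)
    return h @ W2.T + b2

x = rng.normal(size=(5, 2))
print(net(x, np.tanh).ravel())        # original network
print(net(x, poly).ravel())           # polynomial surrogate; close as long as
                                      # the pre-activations stay inside [-3, 3]
```

Composing the fitted polynomial with the affine layers collapses the whole thing into a single multivariate polynomial in the inputs, which is the sense in which the network is "polynomial".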
That's "technically" correct, but it feels like an academic cop-out. Interesting/useful transfer functions tend to be functions that take very large expansions to be approximated with any accuracy.
That practical problem is serious, and it shows up even in this paper: the authors were unable to fit a degree-3 polynomial to a subset (just 26 components) of the MNIST handwritten-digit data due to a "memory issue".
But mathematical theory need not be practical. The relation between NNs and polynomial regression might be a fruitful theoretical observation even if the equivalent polynomial regression is incalculable.
I don't think it is even fruitful. We already know that mappings without poles can be approximated in various ways: Taylor expansions, piecewise-linear functions, Fourier series, and so on. Taylor expansion corresponds to the authors' polynomial fitting; piecewise-linear approximation corresponds to a NN with ReLU activations.
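To make the piecewise-linear half concrete, here's a minimal sketch (my own construction, nothing from the paper): a one-hidden-layer ReLU net with hand-set weights is exactly a piecewise-linear interpolant.

```python
# Sketch: a 1-hidden-layer ReLU "network" with hand-set weights reproduces the
# piecewise-linear interpolant of a target function on [-3, 3].
import numpy as np

f = np.sin
knots = np.linspace(-3, 3, 13)              # breakpoints of the interpolant
y = f(knots)
slopes = np.diff(y) / np.diff(knots)
# Each hidden ReLU adds one kink; its coefficient is the slope change at that knot.
deltas = np.concatenate(([slopes[0]], np.diff(slopes)))

def relu_net(x):                            # one hidden layer of 12 ReLU units
    return y[0] + np.maximum(x[:, None] - knots[:-1], 0.0) @ deltas

xs = np.linspace(-3, 3, 100)
print(np.abs(relu_net(xs) - f(xs)).max())   # small interpolation error on [-3, 3]
```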
Not to mention that as a practical matter, the ability to train a neural network with backpropagation is important to get results that actually converge in a reasonable amount of time. It's not useful to say "but you could just use a polynomial regression" if you can't actually generate the equivalent polynomial regression in the same amount of time that you can train a neural network.
I'm not sure that's actually correct. It's certainly incorrect for a polynomial of degree 1, which is just linear regression. More generally, there's nothing special about ReLU or tanh: the same gradient-descent/backprop machinery works on polynomial regression too.
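A minimal sketch of what I mean (plain NumPy gradient descent on a polynomial feature expansion; the degree, data, and learning rate are arbitrary, and degree 1 would just be linear regression):

```python
# Sketch: gradient descent on polynomial regression, nothing relu/tanh-specific.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 + 3.0 * x - 1.5 * x**2 + rng.normal(scale=0.05, size=200)

degree = 2
X = np.vander(x, degree + 1, increasing=True)   # feature columns: 1, x, x^2
w = np.zeros(degree + 1)
lr = 0.1
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)     # gradient of mean squared error
    w -= lr * grad
print(w)                                        # roughly [2.0, 3.0, -1.5]
```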
Not to be nitpicky, but it is not the Taylor polynomial (Taylor polynomials do not converge to arbitrary continuous functions). The relevant result is the Weierstrass approximation theorem: polynomial approximation of continuous functions on a closed interval.
f(x) = exp(-1/x^2) for x != 0, with f(0) = 0, has the same Taylor expansion at x = 0 as g(x) = 0: every derivative of f vanishes at 0, so its Taylor series is identically zero even though f is not.
Wouldn't math academics have seen this in literally one second? How was this not pointed out earlier? Just genuinely curious as a completely unaware programmer.
My gut feeling is that yes, this is pretty much self-evident.
However, the interesting part of the paper is that they use that equivalence to propose that properties of polynomial regression are applicable to neural networks, and draw some conclusions from that.
> any continuous function can be approximated via a Taylor series expansion
We can get a good polynomial approximation of any continuous function, but only on a bounded set. Wouldn't that assumption (restricting the activation function's domain) be problematic?
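A quick sketch of the concern (my own example, with arbitrary choices of tanh, degree 7, and the interval [-3, 3]):

```python
# Sketch: a polynomial fit of tanh is good on the fitted interval and useless outside it.
import numpy as np

xs = np.linspace(-3, 3, 200)
poly = np.polynomial.Polynomial.fit(xs, np.tanh(xs), deg=7)

print(np.abs(poly(xs) - np.tanh(xs)).max())   # small error on [-3, 3]
print(poly(10.0), np.tanh(10.0))              # far apart: the fit says nothing off-interval
```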
I think that's a very good point. Yes, you can approximate any given classical NN with a polynomial, but how does the number of terms in the polynomial scale with the network size and the desired accuracy? There might be a very good paper there.
That's "technically" correct, but it feels like an academic cop-out. Interesting/useful transfer functions tend to be functions that take very large expansions to be approximated with any accuracy.