
> it doesn’t seem more exotic or interesting to me than asking “how does the pseudoinverse of A ‘learn’ to approximate the formula Ax=b?”

Asking about the properties of the pseudoinverse applied to a dataset drawn from some distribution (or even the properties of simple regression) is interesting and useful. If we could understand neural networks as well as we understand linear regression, it would be a massive breakthrough, not a boring "it's just minimizing a loss function" statement.
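
To make that concrete, here's a minimal sketch (my own illustration, not from the parent comment) of what "understanding linear regression" buys you: the pseudoinverse gives the least-squares solution to Ax=b in closed form, and its properties can be checked directly, which is exactly the kind of closed-form understanding we lack for neural networks.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 5))            # overdetermined system
    b = rng.normal(size=100)

    x_pinv = np.linalg.pinv(A) @ b           # closed-form least-squares solution
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

    assert np.allclose(x_pinv, x_lstsq)      # same minimizer of ||Ax - b||^2
    # Residual is orthogonal to the column space of A (normal equations):
    assert np.allclose(A.T @ (b - A @ x_pinv), 0, atol=1e-8)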

Hell, even if you just ask about minimizing things, you get a whole theory of M-estimators [0]. This kind of dismissive comment doesn't add anything.

[0] https://en.wikipedia.org/wiki/M-estimator
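
For what an M-estimator looks like in practice, here's a hedged sketch (names and setup are my own, not from the thread): swap the squared loss for a Huber-type rho and "just minimizing things" already gives you robustness to outliers that plain least squares lacks.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 3))
    x_true = np.array([1.0, -2.0, 0.5])
    b = A @ x_true + 0.1 * rng.normal(size=200)
    b[:10] += 20.0                           # gross outliers

    def huber(r, delta=1.0):
        # Huber rho: quadratic near zero, linear in the tails
        return np.where(np.abs(r) <= delta,
                        0.5 * r**2,
                        delta * (np.abs(r) - 0.5 * delta))

    def objective(x):
        return np.sum(huber(b - A @ x))

    x_ols = np.linalg.lstsq(A, b, rcond=None)[0]
    x_huber = minimize(objective, x0=np.zeros(3)).x

    # The M-estimator shrugs off the outliers; ordinary least squares does not.
    print("OLS error:  ", np.linalg.norm(x_ols - x_true))
    print("Huber error:", np.linalg.norm(x_huber - x_true))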



You raise a fair point; I do think it's important to understand how the properties of the data manifest in the least-squares solution to Ax=b. Without that, the only insights we have come from analysis, and we would be remiss to overlook the more fundamental theory, which is linear algebra. My suspicion, though, is that the answers to these same questions applied to nonlinear function approximators are probably not much different from the insights we have already gained from more basic systems (a sketch of that intuition is below). Moreover, the overly broad title of the manuscript doesn't seem to point toward those kinds of questions (specifically, things like "how do properties of the data manifold manifest in the weight tensors?"), and I'm not sure one should equate those things with "learning".
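
Here's the kind of basic-system insight I mean (my framing, assuming the usual implicit-bias result for linear models): in the linear case, "learning" by gradient descent from a zero initialization just recovers the minimum-norm pseudoinverse solution, so the open question is which of these characterizations survive for nonlinear function approximators.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(10, 50))            # underdetermined: many exact solutions
    b = rng.normal(size=10)

    x = np.zeros(50)                         # "weights" initialized at zero
    lr = 1.0 / np.linalg.norm(A, 2) ** 2     # step size below 1/L
    for _ in range(5000):
        x -= lr * A.T @ (A @ x - b)          # gradient of 0.5 * ||Ax - b||^2

    # Gradient descent picks out the minimum-norm solution, i.e. pinv(A) @ b
    assert np.allclose(x, np.linalg.pinv(A) @ b, atol=1e-6)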



