
The way to think about NNs is that they are already made of functions; groups of layered nodes compose into complex nonlinear functions. For example, a small 3-layer network can learn to model a cubic spline. The internals of the function are learned at every step of the way, every addition and multiplication. You can assume the number of effective functions in a network is some fraction of the number of weights. This makes the NN theoretically more flexible and powerful than modeling it with more complex fixed functions, because it learns and adapts each and every function during training.
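To make that concrete, here's a minimal sketch in plain numpy (the layer sizes, learning rate, and x**3 target are all illustrative choices on my part, not from anything published): a tiny 1 -> 16 -> 16 -> 1 tanh MLP learning a cubic purely from additions, multiplications, and gradient steps:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (256, 1))
    y = x ** 3  # target: a simple cubic

    # tiny 3-layer MLP: 1 -> 16 -> 16 -> 1, tanh activations
    W1 = rng.normal(0, 0.5, (1, 16));  b1 = np.zeros(16)
    W2 = rng.normal(0, 0.5, (16, 16)); b2 = np.zeros(16)
    W3 = rng.normal(0, 0.5, (16, 1));  b3 = np.zeros(1)

    lr = 0.05
    for step in range(5000):
        # forward pass: two nonlinear layers, one linear readout
        h1 = np.tanh(x @ W1 + b1)
        h2 = np.tanh(h1 @ W2 + b2)
        pred = h2 @ W3 + b3
        err = pred - y

        # backward pass for mean squared error (tanh' = 1 - tanh^2)
        g3 = 2 * err / len(x)
        gW3 = h2.T @ g3; gb3 = g3.sum(0)
        g2 = (g3 @ W3.T) * (1 - h2 ** 2)
        gW2 = h1.T @ g2; gb2 = g2.sum(0)
        g1 = (g2 @ W2.T) * (1 - h1 ** 2)
        gW1 = x.T @ g1;  gb1 = g1.sum(0)

        # gradient descent: every weight (every "sub-function") adapts
        W3 -= lr * gW3; b3 -= lr * gb3
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

    print("final MSE:", float((err ** 2).mean()))

Nothing in there "knows" about cubics; the smooth nonlinear fit falls out of training the composition of simple ops.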

I'd assume it's possible that replacing, say, a small MLP with fixed nonlinear functions could result in more efficient training, if we know the right functions to use. But you could end up losing perf too if you're not careful. I'd guess the main problems are that we don't know which functions to use, and that adding nonlinear functions might come with added difficulty wrt performance and precision, plus new modes of initialization and normalization. Linear math is easy and powerful and already capable of modeling complex functions, but fixed nonlinear math might be useful, I'd guess… needs more study! ;)
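For contrast, a hedged sketch of the fixed-function idea (again plain numpy; the sin target and polynomial basis are made up for illustration): if you happen to know a good basis up front, "training" collapses to a single linear least-squares solve, but pick the wrong basis and you're stuck with its error floor:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (256, 1))
    y = np.sin(3 * x)  # the "unknown" target

    # fixed nonlinear features: we *choose* the functions up front
    phi = np.hstack([x ** k for k in range(6)])  # polynomial basis, degree 5

    # one linear solve instead of thousands of gradient steps
    w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    pred = phi @ w
    print("basis-fit MSE:", float(((pred - y) ** 2).mean()))

The MLP earlier spends compute learning its own basis; this spends none but gambles everything on the basis you guessed.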


