As mentioned, this is difficult. AFAIK the main reason is that the power of neural nets comes from the non-linear functions applied at each node ("neuron"), and thus there's nothing like the superposition principle[1] to easily combine training results.
The lack of superposition means you can't efficiently train one layer separately from the others either.
That being said, a popular non-linear function in modern neural nets is ReLU[2] which is piece-wise linear, so perhaps there's some cleverness one can do there.
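To make the superposition point concrete, here's a minimal sketch (assuming NumPy, with made-up input values) showing that ReLU is not additive, so outputs from separately trained pieces can't simply be summed:

    import numpy as np

    def relu(x):
        # ReLU: piece-wise linear, but still non-linear overall
        return np.maximum(0.0, x)

    a = np.array([-1.0, 2.0])
    b = np.array([3.0, -4.0])

    print(relu(a + b))        # [2. 0.]
    print(relu(a) + relu(b))  # [3. 2.]  -- not equal, so superposition fails

The two results differ whenever an input changes sign under addition, which is exactly what the non-linearity is there to do.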
[1]: https://en.wikipedia.org/wiki/Superposition_principle
[2]: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)