As mentioned, this is difficult. AFAIK the main reason is that the power of neural nets comes from the non-linear functions applied at each node ("neuron"), so there's nothing like the superposition principle [1] that would let you easily combine training results.
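A toy check in Python/NumPy (the weight matrix W and inputs x1, x2 here are just made-up random values for illustration): superposition holds for the linear part of a layer, but breaks as soon as a non-linearity like ReLU is applied.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))   # made-up single-layer weights
    x1 = rng.standard_normal(3)
    x2 = rng.standard_normal(3)

    relu = lambda z: np.maximum(z, 0.0)

    # Linear map: W(x1 + x2) == W x1 + W x2, so superposition holds.
    print(np.allclose(W @ (x1 + x2), W @ x1 + W @ x2))  # True

    # Apply the non-linearity and it fails in general.
    print(np.allclose(relu(W @ (x1 + x2)),
                      relu(W @ x1) + relu(W @ x2)))      # False (almost surely)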

The lack of superposition means you can't efficiently train one layer separately from the others either.

That being said, a popular non-linear function in modern neural nets is ReLU [2], which is piece-wise linear, so perhaps there's some cleverness one can do there.
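A rough sketch of why that might help (again with made-up random weights W1, W2): within a region of input space where the ReLU activation pattern is fixed, the whole network collapses to a single linear map, so superposition does hold locally.

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.standard_normal((8, 3))   # made-up hidden-layer weights
    W2 = rng.standard_normal((2, 8))   # made-up output-layer weights

    def net(x):
        return W2 @ np.maximum(W1 @ x, 0.0)

    x = rng.standard_normal(3)
    mask = (W1 @ x > 0).astype(float)   # which hidden units are "on" at x

    # Effective linear map for this activation pattern.
    A = W2 @ (W1 * mask[:, None])
    print(np.allclose(net(x), A @ x))   # True

    # A tiny perturbation that flips no unit stays in the same linear region,
    # so the same linear map still applies there.
    eps = 1e-6 * rng.standard_normal(3)
    print(np.allclose(net(x + eps), A @ (x + eps)))   # True, assuming no sign flips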

[1]: https://en.wikipedia.org/wiki/Superposition_principle

[2]: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
