Hacker News new | past | comments | ask | show | jobs | submit login

The point is that no one could train deep nets 10 years ago. Not just because of computing power, but because of bad initializations, and bad transfer functions, and bad regularization techniques, etc.

These things might seem like "small iterative refinements", but they add up to 100x improvement. Even when you don't consider hardware. And you should consider hardware too, it's also a factor in the advancement of AI.

Also reading through old research, there is a lot of silly ideas along with the good ones. It's only in retrospect that we know this specific set of techniques work, and the rest are garbage. At the time it was far from certain what the future of NNs would look like. To say it was predictable is hindsight bias.




They could. There was a different set of tricks that didn't work as well (greedy pretraining).


Lots of people tried and failed.

Today lots of people-- ones with even less background and putting in less effort-- try and are successful.

This is not a small change, even if it is the product of small changes.


Reconnecting to my original point way up-thread, my point is these "innovations" have not substantially expanded the types of models we are capable of expressing (they have certainly expanded the size of the types of models we're able to train), not nearly to the same degree as backprop/convnets/LSTMs did way back decades ago (this is important because AGI will require several expansions in the types of models we are capable of implementing).


Right, LSTM was invented 20 years ago. 20 years from now, the great new thing will be something that has been published today. It takes time for new innovations to gain popularity and find their uses. That doesn't mean innovations are not being made!




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: