Something being invented previously doesn't mean it existed as a matter of engineering practicality; improved performance is part of that story, but not all of it. Just describing something in a paper isn't enough to give it impact; many things described in papers simply don't work as described.
A decade ago I was trying and failing to build multi-layer networks with back-propagation -- it didn't work so well. More modern, refined training techniques seem to work much better... and today tools for them are ubiquitous and are known to work (especially with extra CPU thrown at them :) ).
The point is that no one could train deep nets 10 years ago -- not just because of limited computing power, but because of bad initializations, bad transfer functions, bad regularization techniques, and so on.
These things might seem like "small iterative refinements", but they add up to a 100x improvement even before you consider hardware. And you should consider hardware too; it's also a real factor in the advancement of AI.
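To make the "small refinements add up" point concrete, here is a minimal sketch (assuming PyTorch, purely for illustration -- none of this tooling existed back then) of the gap between a circa-2005 setup and a modern one: ReLU activations, He initialization, dropout, and Adam, versus saturating sigmoids fed by tiny random weights, which is roughly where vanishing gradients used to stall deep training. The layer sizes and hyperparameters are made up.

    # Minimal sketch: "modern" vs. old-style defaults for a plain MLP.
    import torch
    import torch.nn as nn

    def make_mlp(in_dim=784, hidden=256, out_dim=10, depth=4, modern=True):
        """Build a simple MLP with either modern or circa-2005 defaults."""
        layers = []
        dim = in_dim
        for _ in range(depth):
            linear = nn.Linear(dim, hidden)
            if modern:
                # He initialization keeps activation variance stable through ReLU layers.
                nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
                layers += [linear, nn.ReLU(), nn.Dropout(p=0.2)]
            else:
                # Old-style setup: small random weights feeding saturating sigmoids,
                # which is where vanishing gradients used to kill deep training.
                nn.init.normal_(linear.weight, std=0.01)
                layers += [linear, nn.Sigmoid()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        return nn.Sequential(*layers)

    model = make_mlp(modern=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # One training step on dummy data, just to show the pieces fit together.
    x = torch.randn(32, 784)
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

None of the individual pieces looks dramatic, which is exactly the point: swap them all back to the old defaults and the same architecture becomes much harder to train.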
Also, reading through old research, there are a lot of silly ideas alongside the good ones. It's only in retrospect that we know this specific set of techniques works and the rest are garbage. At the time it was far from certain what the future of NNs would look like. To say it was predictable is hindsight bias.
Reconnecting to my original point way up-thread: these "innovations" have not substantially expanded the types of models we are capable of expressing (they have certainly expanded the size of the models we're able to train), not nearly to the same degree as backprop/convnets/LSTMs did decades ago. That matters because AGI will require several more expansions in the types of models we are capable of implementing.
Right, LSTM was invented 20 years ago. 20 years from now, the great new thing will be something that has been published today. It takes time for new innovations to gain popularity and find their uses. That doesn't mean innovations are not being made!