
So, it seems the trajectory is one of increasing generality and capability of models and increasing reliance on them.

If it's at all possible to improve our technology, then we will. If we improve it, it increases in utility. If it increases in utility, we use it more.

What other thesis is there?



The model architecture has stayed roughly the same since the original transformer from the 2017 AIAYN ("Attention Is All You Need") paper. That's 6 years of nothing fundamental happening.

Now, obviously the models have gotten hugely better in capabilities since BERT. Everything else has advanced. Tweaking, tuning, and scaling have delivered true intelligence, albeit sub-human. But it seems unlikely that transformers are what take us to human-parity AGI and beyond, because the more we optimize these word predictors, the more we find their limitations.

The lack of architecture changes over the last 6 years creates a huge amount of “potential energy”. A new model architecture might well push us over the human-parity threshold. It wouldn’t surprise me if I wake up one day to find that transformers are obsolete and Google has trained a human-parity AGI with a new arch.

This could happen tomorrow or in 20 years. Transformers had an easy discovery path: from RNNs, to RNNs with attention mechanisms, to transformers. Architecture X seems to have a much more obscure discovery path.
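For anyone unfamiliar with the mechanism this lineage converged on: the core operation the 2017 paper introduced is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Here's a minimal NumPy sketch (shapes and variable names are illustrative, not from any particular implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the core op of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted mix of values

# Toy example: 3 tokens, model dimension 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

The point of the "easy discovery path" remark is visible here: this is just matrix multiplies and a softmax, a small step from the attention mechanisms already bolted onto RNNs, whereas a genuinely new Architecture X has no such obvious predecessor.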


It's certainly going to be very interesting to see what comes out of the training runs being done on giant clusters of H100s.



