
Wouldn't it stand to reason that LLMs should be obsolete in about a year, given the singularity event and the rapid speed at which we're moving towards AGI, and then on to exponential technology growth due to ASI?

At this point, as an AI researcher, I guess you'd just have to sit back and watch it all unfold; very soon, everything you do will be obsolete almost immediately.




This suggests that LLMs are the path that will take us to AGI. There is no proof of that. It seems to be the current bet, but my thinking (and it is strictly mine) is that we are still technologically under-powered to achieve such a leap. Maybe in 10 years, or in one year, or in a hundred years. However, for that leap (and this is strictly my opinion), we need significant infrastructure leaps: something like processors that are 100x faster, or your laptop being powered by a 5 GHz, 200-core CPU...

As it stands, we can't even get Mark Zuckerberg's Metaverse right. You are trying to convince me that we have the infrastructure for AGI? Not convinced.


Three observations here. Firstly, it has been a really eye-opening experience watching the innovation around Stable Diffusion and locally run LLMs, and seeing that unoptimized research code that needed such beefy hardware could actually be optimized to run on consumer hardware, given sufficient motivation.
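To make that concrete, here's a back-of-the-envelope sketch of why lowering numeric precision alone makes such a difference. These are idealized numbers (weights only, ignoring activations and KV cache), and the 7B parameter count is just an illustrative assumption, not any particular project's figure:

```python
# Approximate weight-memory footprint of a hypothetical 7B-parameter
# model at different numeric precisions (idealized, weights only).
PARAMS = 7_000_000_000

BYTES_PER_PARAM = {
    "fp32 (typical research default)": 4.0,
    "fp16 / bf16": 2.0,
    "int8 quantized": 1.0,
    "int4 quantized": 0.5,
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{precision:>32}: {gib:5.1f} GiB")

# fp32 (~26 GiB) wants a datacenter GPU; int4 (~3.3 GiB)
# fits comfortably in a consumer card's VRAM.
```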

Secondly, it wasn't obvious that deep learning was going to work as well as it did if you simply threw enough compute at it. Now that this tech has reached critical mass, there is a tonne more money being poured into infrastructure to support it.

Lastly, compute power is increasing as always: Nvidia is shipping the H100 and has also done recent work on computational lithography, and DeepMind found new state-of-the-art algorithms for matrix multiplication with AlphaTensor. You can kind of already see the positive feedback loop in action.
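For context on the AlphaTensor point: it searches for matrix multiplication decompositions in the same spirit as Strassen's classic 1969 trick, which multiplies two 2x2 matrices with 7 scalar multiplications instead of 8 (AlphaTensor then found new decompositions of this kind for other sizes). A minimal sketch of the 2x2 step, just to show the flavor:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8.
    Applied recursively to block matrices, this gives O(n^2.807)
    instead of the naive O(n^3)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

# Sanity check against the naive product:
print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```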

I dunno... at this point I just wouldn't bet against the trajectory that we're on.


What actually is the trajectory we're on, and what will we do once we get there?


So, it seems the trajectory is one of increasing generality and capability of models and increasing reliance on them.

If it's at all possible to improve our technology, then we will. If we improve it, it increases in utility. If it increases in utility, we use it more.

What other thesis is there?


The model architecture has stayed roughly the same since the original AIAYN ("Attention Is All You Need") transformer in 2017. That's 6 years of nothing fundamental happening.

Now, obviously the models have got hugely better in capabilities since BERT. Everything else has advanced. Tweaking, tuning, and scaling have delivered true intelligence, albeit sub-human. But it seems unlikely that transformers are what will take us to human-parity AGI and beyond, because the more we optimize these word predictors, the more we find their limitations.

The lack of architecture changes over the last 6 years creates a huge amount of "potential energy". A new model architecture might well push us over the human-parity threshold. It wouldn't surprise me to wake up one day and find that transformers are obsolete and Google has trained a human-parity AGI with a new architecture.

This could happen tomorrow or in 20 years. Transformers had an easy discovery path: from RNNs, to RNNs with attention mechanisms, to transformers. Architecture X seems to have a much more obscure discovery path.
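For reference, the endpoint of that discovery path, the scaled dot-product attention from AIAYN, is only a few lines. A minimal single-head NumPy sketch (omitting the multi-head projections and masking that a real transformer adds):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention from "Attention Is All You Need":
    softmax(Q K^T / sqrt(d_k)) V. Single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_q, seq_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                        # weighted sum of values

# Toy example: 3 query positions, 4 key/value positions, d_k = 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(3, 8), (4, 8), (4, 8)])
print(attention(Q, K, V).shape)  # (3, 8)
```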


It's certainly going to be very interesting to see what comes out of the training runs that will be done on giant clusters of H100s.


I also believe we might see something unfold, but we still don't know.

It would be wrong, though, not to keep a very close eye on it, or even to embrace it: even if it doesn't happen in the next 20 years, you still need to earn money, and with expertise in ML you might be better off.



