A recent paper shows that letting a language model search a text corpus (or the internet) for additional information can improve parameter efficiency by roughly 25x [1]. In other words, you only need a small model, because it can look trivia up in the text instead of memorising it in its weights.
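To make the mechanism concrete, here is a minimal retrieve-then-prepend sketch. Everything in it is illustrative: the toy corpus and the `retrieve`/`build_prompt` helpers are made up, it uses TF-IDF similarity where real retrieval-augmented systems use learned dense embeddings, and the paper's actual architecture integrates retrieved chunks inside the model rather than in the prompt. The core idea is the same, though: fetch relevant text, then condition generation on it.

```python
# Toy retrieve-then-prepend loop: look facts up in a corpus
# instead of storing them in the model's parameters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus standing in for the retrieval database.
corpus = [
    "The Eiffel Tower was completed in 1889 and is 330 metres tall.",
    "Mount Everest, at 8,849 metres, is Earth's highest mountain.",
    "Python was created by Guido van Rossum and released in 1991.",
]

vectorizer = TfidfVectorizer()
corpus_vectors = vectorizer.fit_transform(corpus)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, corpus_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [corpus[i] for i in top]

def build_prompt(question: str) -> str:
    """Prepend retrieved passages so a small model needn't memorise trivia."""
    context = "\n".join(retrieve(question, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How tall is the Eiffel Tower?"))
```

The prompt this prints would go to the (small) language model, which now only has to read the answer out of the context rather than recall it from its weights.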
That's 25x in one go. Maybe we'll get the chance to run GPT-4-class models ourselves, without 20 GPUs and a $1M machine.