
I got the 1T GPT-4 number from here - this is the video that goes with the Microsoft "Sparks of AGI" paper, by a Microsoft researcher who had early access to GPT-4 as part of their relationship with OpenAI.

https://www.youtube.com/watch?v=qbIk7-JPB2c



Bubeck has clarified that the "1 trillion" number he was throwing around was a purely hypothetical, metaphorical figure; it was in no way, shape, or form implying that GPT-4 has 1 trillion parameters [0].

[0] https://twitter.com/SebastienBubeck/status/16441515797238251...


OK - thanks!

So we're back to guessing ...

A couple of years ago Altman claimed that GPT-4 wouldn't be much bigger than GPT-3, although it would use a lot more compute.

https://news.knowledia.com/US/en/articles/sam-altman-q-and-a...

OTOH, given the massive performance gains from scaling GPT-2 to GPT-3, it's hard to imagine them not wanting to increase the parameter count by at least a factor of 2, even if they were expecting most of the performance gain to come from elsewhere (context size, number of training tokens, data quality).

So in the 0.5-1T range, perhaps?
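
For a rough sense of the magnitudes being tossed around, here's a back-of-the-envelope sketch. Only GPT-3's published 175B parameter count is a real figure; the scale-up factors are hypothetical, picked just to bracket the 0.5-1T guess:

    # Back-of-the-envelope arithmetic for the size guesses in this thread.
    # GPT-3's 175B parameter count is the only published figure here; the
    # scale-up factors are purely hypothetical.
    GPT3_PARAMS = 175e9  # GPT-3 (published)

    for factor in (2, 3, 5.7):  # hypothetical scale-ups
        params = GPT3_PARAMS * factor
        print(f"{factor}x GPT-3 -> {params / 1e12:.2f}T parameters")

A 2x scale-up lands at 0.35T, and it would take roughly 5.7x to reach the 1T figure, which is roughly the band these guesses fall in.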


FWIW, Stephen Gou, Manager of ML at Cohere, is currently doing a Reddit AMA, and is also guessing at 1T params for GPT-4.

https://www.reddit.com/r/IAmA/comments/12rvede/im_stephen_go...



