So we're back to guessing ...
A couple of years ago Altman claimed that GPT-4 wouldn't be much bigger than GPT-3, although it would use a lot more compute.
https://news.knowledia.com/US/en/articles/sam-altman-q-and-a...
OTOH, given the massive performance gains scaling from GPT-2 to GPT-3, it's hard to imagine them not wanting to increase the parameter count at least by a factor of 2, even if they were expecting most of the performance gain to come from elsewhere (context size, number of training tokens, data quality).
So in the 0.5-1T range, perhaps?