So we're back to guessing ...
A couple of years ago Altman claimed that GPT-4 wouldn't be much bigger than GPT-3, although it would use a lot more compute.
https://news.knowledia.com/US/en/articles/sam-altman-q-and-a...
OTOH, given the massive performance gains scaling from GPT-2 to GPT-3, it's hard to imagine them not wanting to increase the parameter count at least by a factor of 2, even if they were expecting most of the performance gain to come from elsewhere (context size, number of training tokens, data quality).
So in the 0.5-1T range, perhaps?