If the OP is right that nobody is putting the largest models into production (which I think is an inaccurate statement), then GPT-3 in production would be a 10x (ok, 5x?) improvement over the small GPT-2s and BERTs currently in production? So roughly 10x in practice, if the hypothesis is correct? Which, like I said, I don’t believe to be the case.