Precisely my point. If they could put a model as large as GPT-3 into production (at a reasonable price to the consumer), wouldn’t that be a 10x improvement?
If the OP is right that nobody is putting the largest models into production (which I think is an inaccurate statement), then GPT-3 in production would be a 10x (ok, 5x?) improvement over the small GPT-2s and BERTs in production? So 10x in practice, if the hypothesis is correct? Which, like I said, I don’t believe to be the case.