I imagine they’re considering offering GPT-3, which would be cost prohibitive to fine-tune for most people. I also heard inference was too slow to be practical. Perhaps they have some FPGA magic up their Microsoft sleeves.
I don’t understand. If they run it for you and you apply transfer learning and fine-tuning for your specific use case, that would drastically reduce the costs, hence why their offer makes sense.
Precisely my point. If they could put a model as large as GPT-3 into production (at a reasonable price to the consumer), wouldn’t that be a 10x improvement?
If the OP is right that nobody is putting the largest models into production (which I think is an inaccurate statement), then GPT-3 in production would be a 10x (ok, 5x?) improvement over the small GPT-2s and BERTs in production? So 10x in practice, if the hypothesis is correct? Which, like I said, I don’t believe to be the case.