
I imagine they’re considering offering GPT-3, which would be cost-prohibitive to fine-tune for most people. I also heard inference was too slow to be practical. Perhaps they have some FPGA magic up their Microsoft sleeves.
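
For a sense of scale on the “cost prohibitive” point, here’s a back-of-the-envelope sketch. It assumes the 175B-parameter figure from the GPT-3 paper and the common ~16 bytes/parameter rule of thumb for Adam-style fine-tuning state; both are my assumptions, not figures from upthread:

    # Rough memory math for the 175B-parameter GPT-3 configuration.
    params = 175e9

    # Storing the weights alone in fp16 (2 bytes per parameter):
    weights_gb = params * 2 / 1e9    # ~350 GB

    # Rule of thumb for Adam fine-tuning: weights + gradients +
    # two optimizer moments, roughly 16 bytes per parameter:
    training_gb = params * 16 / 1e9  # ~2,800 GB

    print(f"Inference weights alone: ~{weights_gb:,.0f} GB")
    print(f"Naive full fine-tuning state: ~{training_gb:,.0f} GB")

Either number is far beyond a single GPU, which is why serving (let alone fine-tuning) this model means a whole cluster.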


Nobody is putting these huge models in production; even the smaller transformer models are still too expensive to run for most use cases.

With the way the field is moving, GPT-3 will be old news in a month, when more advances are made and open-sourced.


I don't understand. If they run it for you and you apply transfer learning and fine-tuning on your specific use case, that would drastically reduce the costs, which is why their offer makes sense.
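
For concreteness, a minimal sketch of that fine-tuning workflow, using a small GPT-2 via Hugging Face transformers as a stand-in (GPT-3's weights aren't public; under the hosted offer, OpenAI would run the equivalent of this for you). The example texts are hypothetical:

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    # Loading pretrained weights is the "transfer" part.
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # Hypothetical in-domain examples standing in for your use case:
    texts = [
        "Customer: my order is late. Agent: Sorry to hear that...",
        "Customer: how do I reset my password? Agent: Click 'Forgot password'...",
    ]

    model.train()
    for epoch in range(3):
        for text in texts:
            batch = tokenizer(text, return_tensors="pt")
            # For causal LM fine-tuning, the labels are the input ids themselves.
            outputs = model(**batch, labels=batch["input_ids"])
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

The point being: when the provider hosts the base model, all you send is a small amount of domain data, and the expensive pretraining is amortized across all their customers.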


Precisely my point. If they could put a model as large as GPT-3 into production (at a reasonable price to the consumer), wouldn’t that be a 10x improvement?


GPT-3 isn't a 10x improvement. (At least from everything we know so far.)


If the OP is right that nobody is putting the largest models into production (which I think is an inaccurate statement), then GPT-3 in production would be a 10x (ok, 5x?) improvement over the small GPT-2s and BERTs in production? So 10x in practice, if the hypothesis is correct? Which, like I said, I don't believe to be the case.



