That paper takes a pre-existing term, "Generative Pretraining" [0], and applies it to Transformers (a Google innovation [1]). As far as I can tell from a search, they don't use the term "GPT" or "Generative Pretrained Transformer" in that paper, nor in the accompanying blog post [2]. A sibling comment [3] claims that the BERT paper from October 2018 was the first to use the term GPT to describe what OpenAI built, which sounds plausible: a cursory look through the September 2018 archive of openai.com turns up nothing.
[0] See this 2012 example: http://cs224d.stanford.edu/papers/maas_paper.pdf
[1] https://proceedings.neurips.cc/paper/2017/file/3f5ee243547de...
[2] http://web.archive.org/web/20180923011305/https://blog.opena...
[3] https://news.ycombinator.com/item?id=39381802