It's a pretty dumb acronym, though, since it's doubly redundant.
The "T" is the only descriptive part. The transformer is inherently a generative architecture (a sequence predictor/generator), so "generative" adds nothing to the description. And all current neural net models are trained before use, so "pretrained" adds nothing either.
It's like calling a car an MPC - a mobile pre-assembled car.
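To make "sequence predictor/generator" concrete: a decoder transformer is just a next-token predictor, and generation is nothing more than calling that predictor in a loop and feeding its output back in. Here's a toy Python sketch of that loop, where a made-up bigram lookup table stands in for the trained model (all names here are hypothetical, purely for illustration):

```python
# Toy next-token table standing in for a trained transformer's prediction.
toy_model = {
    "<s>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "</s>",
}

def generate(start: str, max_tokens: int = 10) -> list[str]:
    """Autoregressive generation: predict the next token, append, repeat."""
    tokens = [start]
    for _ in range(max_tokens):
        nxt = toy_model.get(tokens[-1], "</s>")  # predict from the current context
        if nxt == "</s>":  # stop token ends generation
            break
        tokens.append(nxt)
    return tokens

print(generate("<s>"))  # ['<s>', 'the', 'cat', 'sat']
```

The architecture doesn't care whether the predictor is a lookup table or a billion-parameter transformer; "generative" is baked into the predict-and-append loop itself.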
The "T" is the only descriptive bit. The transformer is inherently a generative architecture - a sequence predictor/generator, so "generative" adds nothing to the description. All current neural net models are trained before use, so "pretrained" adds nothing either.
It's like calling a car an MPC - a mobile pre-assembled car.