It's a pretty dumb acronym, though, since it's doubly redundant.
The "T" is the only descriptive part. The transformer is inherently a generative architecture (a sequence predictor/generator), so "generative" adds nothing to the description. And all current neural net models are trained before use, so "pretrained" adds nothing either.
It's like calling a car an MPC - a mobile pre-assembled car.
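To make "sequence predictor/generator" concrete: a decoder transformer is just a next-token predictor, and generation is nothing more than calling that predictor in a loop and feeding its output back in. Here's a toy Python sketch of that loop, where a made-up bigram lookup table stands in for the trained model (all names here are hypothetical, purely for illustration):

```python
# Toy next-token table standing in for a trained transformer's prediction.
toy_model = {
    "<s>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "</s>",
}

def generate(start: str, max_tokens: int = 10) -> list[str]:
    """Autoregressive generation: predict the next token, append, repeat."""
    tokens = [start]
    for _ in range(max_tokens):
        nxt = toy_model.get(tokens[-1], "</s>")  # predict from the current context
        if nxt == "</s>":  # stop token ends generation
            break
        tokens.append(nxt)
    return tokens

print(generate("<s>"))  # ['<s>', 'the', 'cat', 'sat']
```

The architecture doesn't care whether the predictor is a lookup table or a billion-parameter transformer; "generative" is baked into the predict-and-append loop itself.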
The "T" is the only descriptive bit. The transformer is inherently a generative architecture - a sequence predictor/generator, so "generative" adds nothing to the description. All current neural net models are trained before use, so "pretrained" adds nothing either.
It's like calling a car an MPC - a mobile pre-assembled car.