Leading up to ChatGPT, virtually no model innovation was needed, just some minor tweaks to activation functions and to the order in which operations were performed in the feedforward layers. So they were essentially trying to trademark the innovation published openly in "Attention Is All You Need"?