I think your mental model could be making LLMs seem more confusing than they are. An LLM is a stack of transformer layers, and a generative LLM pairs that stack with a sampling step that picks the next token from the model's output distribution.
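To make the sampling part concrete, here's a minimal sketch of what I mean (the tiny vocabulary, the logits, and the temperature value are all made up for illustration, not from any real model):

```python
import numpy as np

# Toy logits the transformer stack might produce for a tiny vocabulary
# (made-up numbers, just to show the sampling step).
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([2.0, 1.0, 0.5, -1.0])

temperature = 0.8  # <1 sharpens the distribution, >1 flattens it
probs = np.exp(logits / temperature)
probs /= probs.sum()  # softmax over the vocabulary

# Draw the next token at random according to those probabilities.
next_token = np.random.choice(vocab, p=probs)
print(next_token)
```

That whole "generation" step is just sampling from a probability distribution the network spits out each step; nothing mysterious is happening there.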
Maybe there are useful abstractions for analyzing them, but LLMs are just another deep learning model.
The "attention" mechanism (a bit of a misnomer really) is what makes transformers more complex than many other neural nets - data isn't simply flowing through the model from layer to layer, but rather it is being copied and moved around by the attention heads. The "next word" it is generating doesn't even have to be a word it has ever seen before - it may be copying it from the prompt.