Why are people so intent on incorrectly asserting that these models are Markov chains? The analogy makes sense as an educational tool for exposition, but more often it seems to be used to dismiss the idea that these models could ever be useful to anyone. Is it simply meant to make it more intuitive for others that this is a sequence model? Because it seems about as helpful as 'email is just bits' when everyone and their grandma knows about the relation between transformers, graph attention networks (GAT), and circulant matrices.