Aren't transformers Markov chains, mathematically? You couldn't practically implement them as naive Markov chains, since the state space is far too big, but the math is the same, isn't it?
Yes, in the sense that they can be formalised as such. Given a context of n tokens t1..tn, a transformer computes, for every token t', the probability p(t') of it being the next token. Each of these probabilities is an arc from the state t1..tn to the state t1..tn,t', weighted by p(t').
And since the context window is bounded, the chain is guaranteed to be finite-state. Every state with an unsaturated window is transient, because each step lengthens the context and the state can never be revisited; once the window is saturated it slides, so only the saturated states can recur.
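The construction above can be sketched concretely. This is a toy illustration, not a real transformer: the vocabulary, window size, and the uniform `next_token_probs` stub are all placeholders standing in for an actual model, chosen small enough that the full state space can be enumerated.

```python
from itertools import product

# Toy vocabulary and context window; a real transformer has a vocabulary
# of ~1e5 tokens and a much longer window, so the state space |V|^n is
# astronomically large -- which is why the chain is only a formalisation.
VOCAB = ("a", "b")
WINDOW = 2  # context window size n

def next_token_probs(state):
    """Stand-in for the transformer: maps a context (tuple of tokens) to
    a distribution over the next token. Uniform here for illustration."""
    return {t: 1.0 / len(VOCAB) for t in VOCAB}

def step(state, token):
    """One Markov transition: append the token, and once the window is
    saturated, drop the oldest token (the window slides)."""
    return (state + (token,))[-WINDOW:]

# The finite state space: all contexts of length 0..WINDOW.
states = [s for k in range(WINDOW + 1) for s in product(VOCAB, repeat=k)]
print(len(states))  # 1 + 2 + 4 = 7 states

# Unsaturated states are transient: each transition strictly lengthens
# the context, so such a state can never be re-entered.
assert all(len(step(s, "a")) == len(s) + 1 for s in states if len(s) < WINDOW)
# Saturated states map to saturated states; they form the recurrent part.
assert all(len(step(s, "a")) == WINDOW for s in states if len(s) == WINDOW)
```

Swapping `next_token_probs` for a real model's softmax output would give the actual arc weights; nothing else in the chain's structure changes.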