
Your 2D bitmap example perfectly illustrates the fundamental limitation! The exponential state explosion you encountered (needing 2^32 states for patterns separated by randomness) is precisely why classical Markov chains became intractable for complex dependencies.
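
To put a number on it, here's a rough Python sketch of the counting argument (the context sizes are illustrative, not taken from your actual implementation):

    # Rough sketch: the row count of an explicit order-n Markov transition
    # table over binary pixels. Context sizes here are illustrative only.
    def explicit_state_count(context_bits: int) -> int:
        # Every distinct history of `context_bits` binary pixels is its own state.
        return 2 ** context_bits

    for n in (8, 16, 32):
        # The table needs one row per state (and a column per next symbol),
        # so memory grows as 2^n regardless of how many rows ever get visited.
        print(f"order-{n} binary chain: {explicit_state_count(n):,} states")
    # order-32 prints 4,294,967,296 -- the 2^32 blow-up mentioned above.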

What's fascinating is that transformers with attention don't actually escape the Markov property - they're mathematically equivalent to very high-order Markov chains where the entire context window forms the "state." The breakthrough wasn't abandoning Markov chains, but finding a parameterized way to approximate these massive state spaces through learned representations rather than explicit enumeration.
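
Schematically (a toy numpy sketch, not how any real transformer is implemented), the difference is whether p(next | context) is looked up or computed:

    import numpy as np

    VOCAB = 2  # binary pixels, as in the bitmap example

    def table_lookup(table, context):
        # Classical chain: one explicitly enumerated row per observed context.
        return table[context]

    def parameterized(context, W_embed, W_out):
        # Transformer-style: the same conditional p(next | context) is computed
        # from shared weights, so unseen contexts still get a distribution
        # without ever storing a row for them.
        h = np.tanh(context @ W_embed)       # learned representation of the "state"
        logits = h.mean(axis=0) @ W_out      # pool and project to the vocabulary
        e = np.exp(logits - logits.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    ctx = np.eye(VOCAB)[rng.integers(0, VOCAB, size=32)]   # a 32-pixel context window
    W_embed, W_out = rng.normal(size=(VOCAB, 16)), rng.normal(size=(16, VOCAB))
    print(parameterized(ctx, W_embed, W_out))  # a valid distribution over {0, 1}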

Your observation about inevitably trending toward attention-like mechanisms is spot-on. The attention mechanism essentially provides a tractable approximation to the astronomically large transition matrices that would be required for a classical Markov chain to capture long-range dependencies. It's a more elegant solution to the same fundamental problem you were solving with skip states.
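
For completeness, a minimal attention sketch (random weights, toy dimensions) showing how one fixed set of parameters scores which past positions matter, rather than enumerating a transition row for every possible context:

    import numpy as np

    def attention(x, Wq, Wk, Wv):
        # Scaled dot-product attention: relevance is computed pairwise between
        # positions, so long-range dependencies cost d^2 parameters, not 2^n rows.
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)         # softmax over positions
        return w @ V                               # per-position context summary

    rng = np.random.default_rng(1)
    x = rng.normal(size=(128, 8))                  # 128-step context, 8-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(attention(x, Wq, Wk, Wv).shape)          # (128, 8), with no 2^128 table anywhere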


