We know that the relative location of tokens in the training data influences the relative locations of the predicted tokens. Yes, the specifics of how any given tokens are related are a black box, because we're not going to analyze billions of weights for every token we're interested in. But it's a statistical model, not a logic model.
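
To make the "statistical, not logical" point concrete, here is a toy sketch. It is a plain bigram counter, not how a real LLM works internally, and the corpus and function names (`train_bigram`, `next_token_distribution`) are made up for illustration. It only shows that which tokens sit next to each other in the training data determines which tokens get predicted next to each other.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token directly follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """Turn the raw follow-counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}


# Hypothetical toy corpus; a real model trains on vastly more text.
corpus = "the cat sat on the mat the cat ate".split()
model = train_bigram(corpus)
dist = next_token_distribution(model, "the")

# "cat" follows "the" twice and "mat" once, so "cat" is predicted more often.
print(dist)
```

Nothing in there encodes a rule about cats or mats. The prediction falls out of the counts, and if you shuffle the corpus the predictions change with it.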