We know that the relative positions of tokens in the training data influence the relative positions of the predicted tokens. Yes, the specifics of how any given tokens are related are a black box, because we're not going to analyze billions of weights for every token we're interested in. But it's a statistical model, not a logic model.
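
To make the "statistical, not logic" point concrete, here is a toy sketch. It is not how an LLM is actually built: it is a plain bigram counter, and the names `train_bigram`, `predict_next`, and the sample corpus are all made up for illustration. It only shows that which token follows which in the training text directly determines what gets predicted, with no reasoning step involved.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is immediately followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus: "cat" follows "the" twice, "mat" only once.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))    # cat
print(predict_next(model, "slept"))  # None, since nothing ever followed it
```

The prediction for "the" is "cat" purely because that pairing showed up most often in the text. A real model has billions of weights instead of a lookup table, which is why the specifics are opaque, but the principle of predicting from observed co-occurrence is the same.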