Maybe I mixed that paper up with another one, but the one I meant to post shows that you can read something like a world model out of the activations of the model's layers.
There was a paper showing that a model trained on Othello moves builds an internal model of the board, models the skill level of its opponent, and more.