Are they biased by what they know about each other's capabilities? I'm sure "4o" would have a certain prejudice in other models. So I wonder whether the original model names were masked?
If only a bit. "Estimate other players and adjust accordingly" is a part of the game.
Putting names onto the players just gives that an early start. You could use generic names instead, but that would just shift the pressure towards estimating other players by behavior instead of expectations.