I am 100% certain that training such an AI will produce a win without ever building a single city* and 1,000 other exploits before it gets nerfbatted enough to play a 'real' game.
(That doesn't mean I don't want to see the ridiculousness it comes up with!)
I knew it, I knew it! It would be a Spiffing Brit video.
That guy is a genius at finding exploits in computer games. I don't know how he does it; I think you need to play a fair bit of each game before you find these little corners of the ruleset.
If you train the model purely based on win rate, sure. Fortunately, we can efficiently use RLHF to train a model to play in a human-like way and give entertaining matches.
Maybe, maybe not. The stochastic, black-box nature of the current wave of ML systems gives me a gut feeling that using them like this is more of a Monkey's Paw wish granter than a useful tool, at least without a lot of refinement first. Time will tell!
* https://www.youtube.com/watch?v=6CZEEvZqJC0