
The 'game playing' you lead with is clearly dominated by deep learning in this decade.


When I wrote the comment you replied to, I was thinking specifically, and admittedly narrowly, of adversarial search rather than general game playing, but even so it's not that simple.

Deep learning is certainly dominant in computer games like Atari. However, in classic board games the dominant systems combine deep learning with classical search-based approaches (namely Monte Carlo Tree Search, MCTS, a stochastic relative of minimax). Deep learning has led to improved performance, but on its own, without a tree search, it is nowhere near the performance of the two combined [1].
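
To make the division of labour concrete, here is a minimal sketch of plain UCT, the most common MCTS variant, in Python. The Game interface (legal_moves, play, is_terminal, result) is a hypothetical stand-in of my own, not any particular library. AlphaGo essentially replaces the random rollout and the uniform expansion below with neural-network value and policy estimates; that substitution is the whole marriage of the two approaches:

    import math
    import random

    class Node:
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = {}                      # move -> Node
            self.untried = [] if state.is_terminal() else list(state.legal_moves())
            self.visits = 0
            self.value = 0.0                        # sum of backed-up results

    def uct_child(node, c=1.4):
        # UCB1: average result plus an exploration bonus for rarely-tried children.
        return max(node.children.values(),
                   key=lambda n: n.value / n.visits
                                 + c * math.sqrt(math.log(node.visits) / n.visits))

    def rollout(state):
        # Play uniformly at random to the end. This (and the uniform
        # expansion below) is what AlphaGo replaces with neural nets.
        while not state.is_terminal():
            state = state.play(random.choice(state.legal_moves()))
        return state.result()       # +1/0/-1 for the player to move at the end

    def mcts(root_state, iterations=10_000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend through fully expanded nodes.
            while not node.untried and node.children:
                node = uct_child(node)
            # 2. Expansion: try one untried move, if any remain.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state.play(move), parent=node)
                node.children[move] = child
                node = child
            # 3. Simulation.
            result = rollout(node.state)
            # 4. Backpropagation: players alternate, so flip the sign
            #    at each level on the way back up.
            while node is not None:
                result = -result
                node.visits += 1
                node.value += result
                node = node.parent
        # The most-visited root move is the usual final choice.
        return max(root.children, key=lambda m: root.children[m].visits)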

Also, the dominant approach in poker is not deep learning but Counterfactual Regret Minimization (CFR), a classical game-theoretic approach that works by traversing the game tree. For example, see Pluribus, a poker-playing agent that can outplay humans in six-player no-limit Texas hold'em. As far as I can tell, Pluribus does not use deep learning at all (and is much cheaper to train by self-play as a result). Deep learning poker bots exist, but are well behind Pluribus in skill.
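
The building block of CFR is regret matching: at each decision point, play every action in proportion to how much you regret not having played it so far. Here is a minimal, self-contained sketch on rock-paper-scissors, the textbook toy case; Pluribus applies a far more sophisticated Monte Carlo variant of this update across the whole game tree:

    import random

    ACTIONS = 3                                   # rock, paper, scissors
    PAYOFF = [[0, -1, 1],                         # PAYOFF[a][b]: payoff of a vs b
              [1, 0, -1],
              [-1, 1, 0]]

    def current_strategy(regrets):
        # Regret matching: play in proportion to positive cumulative regret.
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

    def train(iterations=100_000):
        my_regrets = [0.0] * ACTIONS
        opp_regrets = [0.0] * ACTIONS
        strategy_sum = [0.0] * ACTIONS
        for _ in range(iterations):
            strat = current_strategy(my_regrets)
            opp = current_strategy(opp_regrets)
            a = random.choices(range(ACTIONS), weights=strat)[0]
            b = random.choices(range(ACTIONS), weights=opp)[0]
            # How much better would each alternative action have done?
            for alt in range(ACTIONS):
                my_regrets[alt] += PAYOFF[alt][b] - PAYOFF[a][b]
                opp_regrets[alt] += PAYOFF[alt][a] - PAYOFF[b][a]
            for i in range(ACTIONS):
                strategy_sum[i] += strat[i]
        total = sum(strategy_sum)
        # The *average* strategy is what converges to equilibrium:
        # roughly [1/3, 1/3, 1/3] here, the Nash equilibrium of RPS.
        return [s / total for s in strategy_sum]

    print(train())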

So I admit, not "completely useless" for game playing, but even here deep learning is not as dominant as is often assumed.

_____________

[1] The contribution of each approach, deep learning and classical adversarial search of a game tree, may not be entirely clear from reading, for example, the DeepMind papers on AlphaGo and its successors (in the MuZero paper, MCTS is all but hidden away behind a barrage of unnecessary abstraction). It seems that DeepMind was trying to make it look like their neural nets were doing all the work, probably because that's the approach they are selling, rather than MCTS, which isn't their invention anyway (neither are reinforcement learning, deep learning, and many other approaches they failed to attribute in their papers). It should be obvious, however, that AlphaGo and friends would not include an MCTS component unless they really, really needed it. And they do.

IBM tried a similar trick back in the '90s when Deep Blue beat Garry Kasparov: the whole point of having a wardrobe-sized supercomputer play chess against a grandmaster was an obvious marketing ploy by a company that was (still, at the time) in the business of selling hardware. In truth, the major contributors to the win against Kasparov were alpha-beta minimax and an unprecedented database of opening moves. But minimax and knowledge engineering were just not what IBM sold.
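
For contrast with the MCTS sketch above, here is minimal alpha-beta pruning in its negamax formulation, the classical search at the heart of Deep Blue. The real thing was massively parallel, partly in custom hardware, and full of refinements; legal_moves, play, is_terminal and evaluate are hypothetical stand-ins:

    def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf")):
        # Negamax formulation: scores are always from the perspective of
        # the player to move, so we negate at each ply and flip the
        # (alpha, beta) window.
        if depth == 0 or is_terminal(state):
            return evaluate(state)
        best = float("-inf")
        for move in legal_moves(state):
            score = -alphabeta(play(state, move), depth - 1, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break               # cutoff: the opponent won't allow this line
        return best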


I'm very familiar with how MCTS is used in AlphaGo and MuZero.

I'm not sure how you can say it's hidden in the details: the name of the paper is "Mastering the game of Go with deep neural networks and tree search."

It's also not an oversell of the deep learning component. Per the ablations in the AlphaGo paper, the no-MCTS Elo is over 2000, while the MCTS-only Elo is a bit under 1500. Combining the two gives an Elo of nearly 3000. So the deep learning system outperforms the MCTS-only system, and gets a significant boost from using MCTS.
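
For a sense of scale, the standard Elo model turns those rating gaps into rather lopsided expected scores (a quick check using the figures quoted above):

    def expected_score(r_a, r_b):
        # Standard Elo model: expected score for player A against player B.
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    print(expected_score(2000, 1500))   # ~0.95:  network-only vs search-only
    print(expected_score(3000, 2000))   # ~0.997: combined vs network-only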

The MuZero paper also does not hide the tree search; it is prominent in the figures and mentioned in the captions, for example. It is not the main focus of the paper, though, so perhaps it isn't discussed as much as in the AlphaGo paper.

(Weirdly axe-grindy comment...)


Well, I haven't read those papers since they came out, so I will defer to your evidently better recollection. It seems I formed an impression from what was going around on HN and the media at the time and misremembered the content of the papers.

>> (Weirdly axe-grindy comment...)

https://youtu.be/m9KbmRTgigQ



