After all these iterations of Alpha-[blank], [blank]-Zero, now MuZero, etc., I'm wondering:
If I'm interested in building a toy version following the DeepMind spec, which can be trained to reach superhuman capabilities on a particular board game (Reversi, chess, checkers, possibly even Go given enough compute), which of these "versions" of the project would be the easiest for me to understand and implement? (Assume I have a basic understanding of the high-level concepts and lots of enthusiasm, but I'm not an expert.)
My understanding is, AlphaZero is not just stronger than AlphaGo, but architecturally simpler and more efficient. That's what I'm looking for -- the implementation with the highest result/difficulty ratio.
AlphaGo Master, unsurprisingly, was significantly stronger than AlphaGo Zero. AlphaZero, although it can play multiple games, was weaker still. In both cases, they compared the 40-block version of one with the 20-block version of the other (they had to double the network size to approach the level of the predecessor).
Recently, KataGo has reached similar levels of strength using a small fraction of the resources: https://arxiv.org/abs/1902.10565
It depends on what you mean by "more efficient." The significance of AlphaZero was that you can reach good results in a variety of domains even without human expert knowledge to provide supervised learning data or engineer features. It's efficient in terms of engineering resources.
A precisely tailored approach can always get better results.