After all these iterations of Alpha-[blank], [blank]-Zero, and now MuZero, I'm wondering:

If I'm interested in building a toy version following the DeepMind spec, one that can be trained to reach superhuman strength at a particular board game (Reversi, chess, checkers, possibly even Go given enough compute), which of these "versions" of the project would be the easiest for me to understand and implement? (Assume I have a basic understanding of the high-level concepts and lots of enthusiasm, but I'm not an expert.)

My understanding is that AlphaZero is not just stronger than AlphaGo but also architecturally simpler and more efficient. That's what I'm looking for: the implementation with the highest result-to-difficulty ratio.



AlphaGo Master, unsurprisingly, was significantly stronger than AlphaGo Zero, and AlphaZero, although it can play multiple games, was weaker still. In both cases, the comparison was between the 40 block version of the newer program and the 20 block version of its predecessor (they had to double the network size to approach the predecessor's level).
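For anyone who hasn't read the papers: a "block" is one residual block of the convolutional tower, and network depth is counted in these. A rough PyTorch sketch of the structure described in the AlphaGo Zero paper (the 256-filter width is from the paper; everything else here is illustrative):

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # One "block": conv-BN-ReLU-conv-BN, plus a skip connection.
        def __init__(self, channels=256):  # 256 filters, per the paper
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)  # the residual (skip) connection

    # A "40 block" network just stacks 40 of these between the input
    # convolution and the policy/value heads.
    tower = nn.Sequential(*[ResidualBlock() for _ in range(40)])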

Recently, KataGo has reached similar levels of strength using a small fraction of the resources: https://arxiv.org/abs/1902.10565

It depends on what you mean by "more efficient." The significance of AlphaZero was that you can reach good results in a variety of domains even without human expert knowledge to provide supervised training data or hand-engineered features. It's efficient in terms of engineering effort.

A precisely tailored approach can always get better results.


Has it been improved? AlphaZero previously overtook AlphaGo Master: https://en.wikipedia.org/wiki/AlphaGo_Zero#Comparison_with_p...


The 40 block version of AlphaGo Zero is stronger than the 20 block version of AlphaGo Master.


This is a bit outside my comfort zone, so I'm not sure I quite get what these blocks are. Has any version of AlphaGo Master bested AlphaGo Zero?


> which of these "versions" of the project would be the easiest for me to understand/implement?

I have the same question. Not sure I have an answer yet, but this paper includes some pseudocode that implements the algorithm: https://arxiv.org/src/1911.08265v1/anc/pseudocode.py

I'm planning to try training something simple like tic-tac-toe, both to see if it works and to understand how it works.
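To make that concrete, here's the skeleton I have in mind: plain UCT-style MCTS with random rollouts on tic-tac-toe. In AlphaZero/MuZero the random rollout is replaced by the value network and move selection is biased by the policy priors, but the search structure is the same. All names and numbers below are my own placeholders, not DeepMind's:

    import math, random

    # Board: tuple of 9 ints, +1 / -1 for the players, 0 for empty.
    WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def legal_moves(board):
        return [i for i, v in enumerate(board) if v == 0]

    def play(board, move, player):
        b = list(board); b[move] = player
        return tuple(b)

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] != 0 and board[a] == board[b] == board[c]:
                return board[a]
        return 0  # no winner: game ongoing, or a draw

    class Node:
        def __init__(self, board, player):
            self.board, self.player = board, player  # player = side to move
            self.children = {}                       # move -> Node
            self.visits, self.total = 0, 0.0         # total: results for `player`

    def rollout(board, player):
        # Random playout to the end; this is what a trained value net replaces.
        while winner(board) == 0 and legal_moves(board):
            board = play(board, random.choice(legal_moves(board)), player)
            player = -player
        return winner(board)  # +1 / -1 / 0, from player +1's point of view

    def ucb(parent, child, c=1.4):
        if child.visits == 0:
            return float("inf")
        # child.total is from the child's side-to-move view; negate for parent.
        return (-child.total / child.visits
                + c * math.sqrt(math.log(parent.visits) / child.visits))

    def search(root, simulations=500):
        for _ in range(simulations):
            node, path = root, [root]
            while winner(node.board) == 0 and legal_moves(node.board):
                untried = [m for m in legal_moves(node.board)
                           if m not in node.children]
                if untried:  # expand one untried move, then stop descending
                    m = random.choice(untried)
                    node.children[m] = Node(play(node.board, m, node.player),
                                            -node.player)
                    node = node.children[m]
                    path.append(node)
                    break
                node = max(node.children.values(), key=lambda ch: ucb(node, ch))
                path.append(node)
            result = rollout(node.board, node.player)
            for n in path:  # back up, flipping to each node's point of view
                n.visits += 1
                n.total += result * n.player

    # Self-play one game; a real trainer would record (state, visit counts,
    # outcome) from many such games as training data for the network.
    board, player = (0,) * 9, 1
    while winner(board) == 0 and legal_moves(board):
        root = Node(board, player)
        search(root)
        move = max(root.children, key=lambda m: root.children[m].visits)
        board, player = play(board, move, player), -player
    print(board, "winner:", winner(board))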


Pick a simple game so your search space is smaller and you won't need 10,000 GPUs to get anything done. (Tic-tac-toe has only a few thousand reachable positions; Go has on the order of 10^170 legal ones.)



