1202 CPUs and 176 GPUs are the figures mentioned in the Nature paper. But it's important to understand that this is the cluster used to train the networks the algorithm relies on. Training took 30+ days of wall-clock time, which works out to roughly 110 megawatt-hours (MWh) of energy!
During play, the computational requirements are vastly lower (though I don't know the figures). It's still probably more than is feasible to put in a smartphone in the near future. Even assuming we get a 3x improvement in performance per watt going from ~20nm chips to ~7nm chips (near the theoretical minimum for silicon), I don't think this will work on a battery-powered device. And CPUs have poor performance per watt on neural networks; some kind of GPU or ASIC setup will be required to make it work.
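For a rough sense of where that energy ballpark comes from, here's my back-of-envelope arithmetic; the per-device power draws below are my own guesses, not numbers from the paper:

```python
# Back-of-envelope check of the ~110 MWh figure; the per-device
# power draws are assumptions, not numbers from the paper.
cpus, gpus = 1202, 176
cpu_watts = 80                     # assumed average draw per CPU
gpu_watts = 300                    # assumed average draw per GPU
hours = 30 * 24                    # "30+ days" of wall-clock time

total_kw = (cpus * cpu_watts + gpus * gpu_watts) / 1000
energy_mwh = total_kw * hours / 1000
print(f"~{total_kw:.0f} kW sustained, ~{energy_mwh:.0f} MWh over 30 days")
# -> ~149 kW sustained, ~107 MWh over 30 days
```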
That's not correct; those numbers refer to the system requirements while actually playing. To quote from the paper:
> Evaluating policy and value networks requires several orders of magnitude more computation than traditional search heuristics. AlphaGo uses an asynchronous multi-threaded search that executes simulations on CPUs, and computes policy and value networks in parallel on GPUs. The final version of AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs. We also implemented a distributed version of AlphaGo that exploited multiple machines, 40 search threads, 1202 CPUs and 176 GPUs.
In fact, according to the paper, only 50 GPUs were used for training the network.
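To make that CPU/GPU split a bit more concrete, here's a minimal sketch of one way search threads on CPUs can hand leaf positions to a single batching evaluator thread standing in for the GPU. `evaluate_batch` is a hypothetical placeholder for a policy/value network call; none of this is AlphaGo's actual code:

```python
import queue
import threading

# Sketch: many CPU search threads enqueue leaf positions; one worker thread
# batches them up for a single "GPU" evaluation. evaluate_batch() is a
# hypothetical stand-in for a batched policy/value network forward pass.
eval_queue = queue.Queue()

def evaluate_batch(positions):
    return [(0.5, {}) for _ in positions]              # (value, move priors) per position

def gpu_worker(max_batch=8):
    while True:
        batch = [eval_queue.get()]                     # wait for at least one request
        while len(batch) < max_batch:
            try:
                batch.append(eval_queue.get_nowait())  # opportunistically fill the batch
            except queue.Empty:
                break
        results = evaluate_batch([pos for pos, _ in batch])
        for (_, reply), result in zip(batch, results):
            reply.put(result)                          # hand the result back to the search thread

threading.Thread(target=gpu_worker, daemon=True).start()

def evaluate(position):
    # called from a CPU search thread; blocks until its batch has been evaluated
    reply = queue.Queue(maxsize=1)
    eval_queue.put((position, reply))
    return reply.get()
```

The point is just that the expensive network evaluations get batched onto the accelerators while the tree search itself stays on the CPUs.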
The cumulative amount of person-hours that went into training Lee Sedol (all the hours spent training his instructors and sparring partners, developing Go theory, and playing out and drawing inferences from the games of long-dead experts) probably adds up to more than 500 years. AlphaGo, on the other hand, had to start from scratch.
Given the rules, a big book containing every professional Go game ever played, and no other instruction, it's not entirely clear to me that Lee Sedol would be able to reach his current skill level in 500 years.
And that's why we're not destined to compete with AI: that 110 MWh worth of training can be made instantly available to every other Go bot. If only I could have access to a grandmaster's brain when I needed it!
Are they vastly lower, though? The core of the algorithm is still a deep Monte Carlo Tree Search, which AlphaGo gets a significant computational boost from by running it in parallel. It's obviously incorrect to take the training system and assume it's identical to the live system, but I think it's disingenuous to say the live system didn't have some serious horsepower.
Yes; for neural networks, training usually takes many orders of magnitude more resources than just using them.
For this particular example, training the system involves (1) analyzing every game of professional Go that has been digitally recorded, and (2) playing probably millions of games against itself, both of which require far more computing power than playing a single game.
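Schematically it's something like the toy loop below; every function and count here is a placeholder for illustration, not DeepMind's pipeline:

```python
import random

def update(weights, position, target_move):
    return weights                                  # stand-in for one gradient step

def self_play(weights, moves=200):
    # one full game played against itself; roughly the cost of one live game
    return [((), random.choice(["a", "b"])) for _ in range(moves)]

weights = {}
pro_games = [self_play(weights) for _ in range(100)]  # stand-in for the recorded pro corpus

# (1) learn from every digitally recorded professional game
for game in pro_games:
    for position, expert_move in game:
        weights = update(weights, position, expert_move)

# (2) then play a huge number of games against itself
for _ in range(1_000):                              # millions in reality; kept small so this runs
    for position, move in self_play(weights):
        weights = update(weights, position, move)
```

Both phases repeat roughly a single game's worth of work an enormous number of times, which is where the gap with playing one live game comes from.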
I'm very aware of that. What I'm saying is that AlphaGo is not merely a neural net reporting best moves directly off a forward pass. There are two networks that essentially act as proposal distributions for an exploration/exploitation trade-off in the search space of game trees, by which AlphaGo reads positions essentially out to the end of the game and ranks them by win rate (this is Monte Carlo Tree Search). The raw network moves are "nice" (I think they run at something like an 80% win rate against some other Go AIs? Maybe I'm misremembering), but the real heart of what makes AlphaGo play well is the MCTS, which requires vast resources to execute, and those are live resources.
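For anyone who hasn't seen it spelled out, here's a bare-bones sketch of MCTS guided by policy priors and a value estimate, which is the flavor of search I'm describing. The "networks" are random stand-ins, the game is a trivial placeholder, and this is nowhere near AlphaGo's actual implementation:

```python
import math
import random

# Bare-bones MCTS guided by a policy prior and a value estimate.
# policy_net/value_net/legal_moves/play_move are all toy placeholders.

def legal_moves(state):
    return [0, 1, 2]

def play_move(state, move):
    return state + (move,)

def policy_net(state):
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}     # prior over moves (the "proposal")

def value_net(state):
    return random.random()                          # stand-in for an estimated win rate

class Node:
    def __init__(self, state, prior=1.0):
        self.state, self.prior = state, prior
        self.children, self.visits, self.value_sum = {}, 0, 0.0

    def score(self, parent_visits, c=1.5):
        q = self.value_sum / self.visits if self.visits else 0.0
        u = c * self.prior * math.sqrt(parent_visits) / (1 + self.visits)
        return q + u                                # exploitation term + prior-guided exploration

def simulate(root, depth=10):
    path, node = [root], root
    for _ in range(depth):
        if not node.children:                       # expansion: children weighted by policy priors
            for move, p in policy_net(node.state).items():
                node.children[move] = Node(play_move(node.state, move), prior=p)
        node = max(node.children.values(),          # selection: follow the best-scoring child
                   key=lambda ch: ch.score(node.visits + 1))
        path.append(node)
    value = value_net(node.state)                   # evaluation: value net instead of a full rollout
    for n in path:                                  # backup: propagate the win-rate estimate
        n.visits += 1                               # (per-player sign flipping omitted for brevity)
        n.value_sum += value

root = Node(state=())
for _ in range(1000):
    simulate(root)
best_move = max(root.children, key=lambda m: root.children[m].visits)
print("most-visited move:", best_move)
```

The live cost comes from running thousands of these simulations per move, each needing its own network evaluations, which is why the search wants serious parallel hardware.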
I was actually thinking primarily of distributed training time for the networks and playing time for the system, rather than the number of GPUs running this particular match. Also, I thought the number of GPUs in October was more on the order of 1,000? Happy to be told I'm mistaken though.