The Doppler Quarterly Spring 2018 | Page 40

The pure compute power applied to the problem made it appear to the human opponent, reigning world champion Garry Kasparov, that at times it was exhibiting deep intelligence and creativity. In reality, it was simply able to evaluate more data more quickly to mathematically calculate and present optimum moves. Its knowledge was built by analyzing hundreds of thousands of master and grandmaster games, and the results were then fine-tuned by actual grandmasters themselves. The system would generally evaluate searches to a depth of 6-8 moves, with the ability to go 20 moves or more in certain cases.

In today's computer chess research, the focus has shifted from computer hardware to more optimized software. By comparison, in a November 2006 match, a modern chess program named Deep Fritz beat world chess champion Vladimir Kramnik, with the program running on a common desktop computer with a dual-core Intel Xeon 5160 CPU. It was capable of evaluating only 8 million positions per second, but searched to an average depth of 17 to 18 moves using heuristic techniques.

Advance Number Two Emerges

The next major advance came in March 2016, when an AI system developed by DeepMind called AlphaGo beat one of the highest-ranked world Go champions, Lee Sedol, in 4 of 5 games. Go is a more complex game than chess, with a 19x19 board and the added complexity that stones are captured when surrounded by an opponent's stones. AlphaGo applied DL techniques rather than the brute-force approach of Deep Blue, leveraging a large amount of data from prior human matches to train the AI.

The version that beat Lee, AlphaGo Lee, was replaced in late 2016/early 2017 by a version called AlphaGo Master, which reduced the compute requirement from 48 distributed TPUs to 4 TPUs running on a single machine. A TPU, Google's Tensor Processing Unit, delivers 15-30x higher performance than contemporary CPUs and GPUs. AlphaGo Master won 60 of 60 online matches against a group that included most of the world's best players and world champion Ke Jie, whom it went on to beat in 3 of 3 matches.

In an article published in October 2017, the AlphaGo team announced AlphaGo Zero, a version that learned without human data by playing only against itself. This technique is known as reinforcement learning. AlphaGo Zero used a single neural network to do this rather than the two used by earlier versions of AlphaGo - a "policy network" to select the next move to play and a "value network" to predict the winner of the game from each position. This new algorithm surpassed AlphaGo Lee in 3 days and AlphaGo Master in 21. By the 40th day it had surpassed all previous versions!

In another paper, released in December 2017, DeepMind claimed that it had generalized AlphaGo Zero's approach into an algorithm named AlphaZero, which within 24 hours achieved a superhuman level of play across the games of chess, shogi (also known as Japanese chess) and Go by defeating the respective world-champion programs - Stockfish, Elmo and the 3-day version of AlphaGo Zero. Note that both AlphaGo Zero and AlphaZero ran on a single machine with 4 TPUs.
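The "one network, two heads" idea behind AlphaGo Zero can be sketched in a few lines: a shared trunk computes features of the board position, a policy head turns those features into a probability distribution over moves, and a value head turns them into a predicted game outcome. The sketch below is a minimal illustration only, with made-up layer sizes, random untrained weights and a toy 3x3 board - not DeepMind's actual architecture, which uses deep convolutional residual networks over 19x19 Go positions combined with Monte Carlo tree search.

```python
import numpy as np

rng = np.random.default_rng(0)

BOARD_CELLS = 9   # toy 3x3 board instead of Go's 19x19 (assumed size)
HIDDEN = 32       # arbitrary hidden width for illustration
N_MOVES = 9       # one candidate move per cell

# Shared trunk: one set of weights feeds BOTH heads,
# in contrast to earlier AlphaGo versions' two separate networks.
W_trunk = rng.normal(scale=0.1, size=(BOARD_CELLS, HIDDEN))
W_policy = rng.normal(scale=0.1, size=(HIDDEN, N_MOVES))  # policy head
W_value = rng.normal(scale=0.1, size=(HIDDEN, 1))         # value head

def forward(board_vec):
    """One forward pass: shared features, then policy and value heads."""
    h = np.tanh(board_vec @ W_trunk)        # shared representation
    logits = h @ W_policy
    policy = np.exp(logits - logits.max())  # numerically stable softmax
    policy /= policy.sum()                  # probability over moves
    value = np.tanh(h @ W_value).item()     # predicted outcome in (-1, 1)
    return policy, value

board = np.zeros(BOARD_CELLS)
board[4] = 1.0                  # a single stone in the center cell
policy, value = forward(board)  # policy sums to 1; value is a scalar
```

During self-play training, the policy head would be pushed toward the move frequencies found by search, and the value head toward the eventual game result, so the single network improves both outputs from its own games alone.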