The pure compute power applied to the problem made it appear to the human opponent, reigning world champion Garry Kasparov, that at times it was exhibiting deep intelligence and creativity. In reality, it was simply able to evaluate more data more quickly and mathematically calculate and present optimum moves. It had also been trained by analyzing hundreds of thousands of master and grandmaster games, and the results were then fine-tuned by actual grandmasters themselves. The system would generally evaluate searches to a depth of 6-8 moves, with the ability to go 20 moves or more in certain cases.
In today's computer chess research, the focus has shifted from computer hardware to more optimized software. By comparison, in a November 2006 match, a modern chess program named Deep Fritz beat world chess champion Vladimir Kramnik, with the program running on a common desktop computer with a dual-core Intel Xeon 5160 CPU. That machine was capable of evaluating only 8 million positions per second, yet it searched to an average depth of 17 to 18 moves using heuristic techniques.
Advance Number Two Emerges
The next major advance came in March 2016, when an AI system developed by DeepMind called AlphaGo beat one of the highest-ranked world Go champions, Lee Sedol, in 4 of 5 games. Go is a more complex game than chess, played on a 19x19 board, with the added complexity that stones are captured and removed from the board when surrounded by an opponent's stones. AlphaGo applied DL techniques rather than the brute-force approach of Deep Blue, leveraging a large amount of data from prior human matches to train the AI.
The version that beat Lee, AlphaGo Lee, was replaced in late 2016/early 2017 by a version called AlphaGo Master, which reduced the compute required from 48 distributed TPUs to 4 TPUs running on a single machine. A TPU, Google's Tensor Processing Unit, delivers 15-30x higher performance than contemporary CPUs and GPUs. AlphaGo Master won 60 of 60 online matches against a group that included most of the world's best players, among them world champion Ke Jie, whom it beat in 3 of 3 matches.
In an article published in October 2017, the AlphaGo team announced AlphaGo Zero, a version that learned without human data by playing only against itself, a technique known as reinforcement learning. AlphaGo Zero used one neural network to do this rather than the two used by earlier versions of AlphaGo: a "policy network" to select the next move to play and a "value network" to predict the winner of the game from each position. This new algorithm surpassed AlphaGo Lee in 3 days and AlphaGo Master in 21. By the 40th day it had surpassed all previous versions.
In another paper, released in December 2017, DeepMind claimed to have generalized AlphaGo Zero's approach into an algorithm named AlphaZero, which within 24 hours achieved a superhuman level of play across the games of chess, shogi (also known as Japanese chess) and Go, defeating the respective world-champion programs: Stockfish, Elmo and the 3-day version of AlphaGo Zero. Note that both AlphaGo Zero and AlphaZero ran on a single machine with 4 TPUs.
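The self-play idea behind AlphaGo Zero can be illustrated in miniature. The sketch below is not DeepMind's algorithm (which pairs a deep neural network with Monte Carlo tree search); it is a minimal tabular Q-learning agent that learns the simple game of Nim purely by playing against itself, with rewards coming only from game outcomes. The game, hyperparameters and function names here are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Toy self-play reinforcement learning (illustrative only; AlphaGo Zero
# itself combines a deep neural network with Monte Carlo tree search).
# Game: Nim with 21 stones, remove 1-3 per turn; taking the last stone wins.

N_STONES = 21          # starting pile size (assumed toy setting)
ACTIONS = (1, 2, 3)    # legal removal amounts

Q = defaultdict(float)  # Q[(stones, action)] -> value for the player to move
N = defaultdict(int)    # visit counts, used for running-average step sizes

def choose(stones, eps):
    """Epsilon-greedy choice among legal moves for the player to act."""
    legal = [a for a in ACTIONS if a <= stones]
    if random.random() < eps:
        return random.choice(legal)
    return max(legal, key=lambda a: Q[(stones, a)])

def train(episodes=20000, eps=0.2):
    """One agent plays both sides; the only signal is the final result."""
    for _ in range(episodes):
        stones, history = N_STONES, []
        while stones > 0:
            a = choose(stones, eps)
            history.append((stones, a))
            stones -= a
        # The player who took the last stone wins. Walk the game backwards,
        # flipping the reward sign so each move is scored from the
        # perspective of the player who made it.
        reward = 1.0
        for (s, a) in reversed(history):
            N[(s, a)] += 1
            Q[(s, a)] += (reward - Q[(s, a)]) / N[(s, a)]  # running average
            reward = -reward

random.seed(0)
train()
print("learned first move from 21 stones:", choose(N_STONES, eps=0.0))
```

With no hand-coded strategy, the greedy policy converges toward the known optimal play for this game: always leave the opponent a multiple of four stones.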
38 | THE DOPPLER | SPRING 2018