Need a primer on Go? Check out the first in our series of blog posts this week during the big Go showdown between Google’s AlphaGo computer and Go world champion Lee Se-dol.


Playing games against computer opponents isn’t such a new thing. Computers that can regularly defeat us, however, are a little newer. Computers that can regularly defeat us at so-called advanced “mind” games like checkers, chess, and Scrabble are newer still. But a computer system that can actually beat the very best human player at one of the most complex games ever devised? That hasn’t happened yet.

But it might. Very soon. With the help of deep learning.

A Game Of “Profound Complexity”

But first, some basics. To play a game, a computer generally plays out the possible continuations in simulation. Take chess, for example. A computer that plays chess plays out the possible moves in its “head,” determines which lines lead to a win, and then follows that path. That’s the basic idea of computers playing games.
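To make that concrete, here’s a minimal sketch of exhaustive game-tree search (often called minimax) in Python, applied to tic-tac-toe, a game small enough to play out completely. The function names and structure are purely illustrative, not anything from a real chess or Go engine:

```python
def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Play out every continuation; return (score, move) for `player`.
    Score is +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w:
        return (1 if w == 'X' else -1), None
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0, None  # board full: draw
    best = None
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        score, _ = minimax(child, 'O' if player == 'X' else 'X')
        if best is None or \
           (player == 'X' and score > best[0]) or \
           (player == 'O' and score < best[0]):
            best = (score, m)
    return best

# From the empty board, exhaustive search proves tic-tac-toe is a draw.
print(minimax(' ' * 9, 'X'))  # -> (0, 0)
```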

However, bigger games mean more moves. In chess, the first player has 20 possible opening moves. Compare that to Go: played on a 19×19 board, it offers 361 possible opening moves. That effectively rules out the strategy of playing out every series of moves in advance. There are simply too many possibilities.
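A quick back-of-the-envelope calculation makes the gap vivid. Treating 20 and 361 as rough per-move branching factors (the real numbers change as the board fills, but the orders of magnitude hold), even a modest lookahead explodes in Go:

```python
# Rough branching-factor arithmetic: lines to examine after `depth` moves,
# assuming ~20 choices per move in chess and ~361 in Go (illustrative only).
for depth in (2, 4, 6):
    print(f"depth {depth}: chess ~{20 ** depth:,}  go ~{361 ** depth:,}")
# depth 6: chess ~64,000,000 lines vs. go ~2,213,314,919,066,161 lines
```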

As a recent blog post by Google notes:

“Go is a game of profound complexity. There are 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 possible positions—that’s more than the number of atoms in the universe, and more than a googol times larger than chess.”

How Computers Learn

So what makes a great Go player? Some would say intuition, and that’s not something you can exactly replicate with a computer. What you can do, though, is “train” a computer to master the relationship between positions on the board and the end result of the game. Here is how Google describes the process:

AlphaGo looks ahead by playing out the remainder of the game in its imagination, many times over—a technique known as Monte-Carlo tree search. But unlike previous Monte-Carlo programs, AlphaGo uses deep neural networks to guide its search. During each simulated game, the policy network suggests intelligent moves to play, while the value network astutely evaluates the position that is reached. Finally, AlphaGo chooses the move that is most successful in simulation.

In the paragraph above you’ll note references to two deep neural networks: the “policy network” and the “value network.” Deep neural networks allow computers to “learn” from repeated exposure to observational data, and in this particular case, the two networks work together to produce a final decision about a move. In its simplest form, the policy network suggests the available options, and the value network evaluates the potential outcomes. AlphaGo then “chooses the move that is most successful in simulation.”
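Here’s a schematic sketch of that division of labor in Python. The “networks” below are stand-in stubs (a uniform policy and a random value), and the game itself is a placeholder; the real AlphaGo wraps this idea in full Monte-Carlo tree search with deep networks trained on actual play:

```python
import random

def legal_moves(position):
    """Placeholder game logic: three legal moves from any position."""
    return ['a', 'b', 'c']

def play(position, move):
    """Placeholder transition: append the move to the position string."""
    return position + move

def policy_network(position):
    """Stub policy: suggest moves with prior probabilities (uniform here)."""
    moves = legal_moves(position)
    return {m: 1.0 / len(moves) for m in moves}

def value_network(position):
    """Stub value: an estimated win probability for the reached position."""
    return random.random()

def choose_move(position, simulations=100):
    """Average value-network evaluations, weighted by the policy's priors,
    and pick the move that does best in simulation."""
    priors = policy_network(position)
    totals = {m: 0.0 for m in priors}
    for _ in range(simulations):
        for move, prior in priors.items():
            totals[move] += prior * value_network(play(position, move))
    return max(totals, key=totals.get)

print(choose_move(""))  # with random stubs, the chosen move is arbitrary
```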

“Training” For The Big Event

AlphaGo does this by examining records of 30 million moves from real Go games, and analyzing how a human responds in each position. It then plays the game over and over against itself using the data it has amassed, and ultimately, through reinforcement learning, the system actually “learns” how to play the game better. This process is called “training” a network. It’s something we actually teach at Udacity. In our Deep Learning course (which was built with Google as part of our Machine Learning Nanodegree program) you learn how to train a number of different types of networks.
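As a toy illustration of the supervised half of that training (the reinforcement-learning half follows the same pattern, with game outcomes as the signal), here’s a bare softmax policy nudged toward an invented “expert” move on made-up position encodings. None of this is AlphaGo’s actual architecture or data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_moves, lr = 8, 4, 0.1
W = np.zeros((n_features, n_moves))  # weights of a bare softmax "policy"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    position = rng.normal(size=n_features)          # stand-in board encoding
    expert_move = int(position[:n_moves].argmax())  # stand-in human expert move
    probs = softmax(position @ W)
    grad = np.outer(position, probs)                # gradient of cross-entropy
    grad[:, expert_move] -= position
    W -= lr * grad                                  # make the expert move likelier

# After training, the policy should usually agree with the toy "expert".
test = rng.normal(size=(500, n_features))
agree = np.mean([softmax(x @ W).argmax() == x[:n_moves].argmax() for x in test])
print(f"agreement with the toy expert: {agree:.0%}")
```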

If this all sounds a bit heady, have no fear. Chances are, you’re already reaping the benefits of deep learning in your daily life. If you’ve ever searched for images online, or asked Siri a question and gotten the correct answer, you’ve benefited from deep learning.

However, it’s one thing to tell you where the closest fair-trade coffee shop is located; it’s quite another thing altogether to win the most complex game in the world against its best player. But AlphaGo has been practicing! And practicing, and practicing, and practicing…

Christopher Watkins
Christopher Watkins is Senior Writer and Chief Words Officer at Udacity. He types on a MacBook or iPad by day, and either an Underwood, Remington, or Royal by night. He carries a Moleskine everywhere.