Computer learns to win at Go

Discussion in 'Science and Technology' started by Gingerbread Demon, Feb 6, 2016.

  1. Gingerbread Demon

    Gingerbread Demon I love Star Trek Discovery Premium Member

    Joined:
    May 31, 2015
    Location:
    The Other Realms
    http://www.9news.com.au/technology/2016/01/29/12/25/google-program-defeats-top-human-player-in-go

    Pretty damn awesome. It learned how to win at Go.

    Go is a hard game for humans, so an AI learning to win is pretty damn impressive.

    Yay singularity here we come
    Yay technocratic post scarcity economy here we come
    Yay merging with AI will become necessary once AI progress outstrips human evolution at a rapid pace resulting in planned obsolescence of the unenhanced human race here we come

    :)

    Those funny comments aside, I think this is really a great achievement in AI. They didn't program it to know Go; it learnt how to win by playing. That's impressive.
     
  2. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    It's actually not that impressive to me. Chess has a lot of rules that heavily restrict how each piece is allowed to move. In Computer Science parlance, the space of chess moves doesn't grow unmanageably huge, so it is possible to devise an algorithm that searches ahead for good (or even the best) moves to make.
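
    To make that concrete, here's a rough Python sketch of the kind of game-tree search a chess engine uses. It's plain minimax; real engines add alpha-beta pruning and carefully tuned evaluation functions, and the helpers `legal_moves`, `apply` and `evaluate` here are hypothetical stand-ins.

    [code]
    # Minimal sketch of chess-style game-tree search (plain minimax).
    # `legal_moves`, `apply` and `evaluate` are hypothetical helpers.
    def minimax(position, depth, maximizing):
        if depth == 0 or not legal_moves(position):
            return evaluate(position)      # heuristic score of the position
        scores = [minimax(apply(position, move), depth - 1, not maximizing)
                  for move in legal_moves(position)]
        # Chess's modest branching factor (~35 moves) keeps this tractable;
        # with Go's ~250 moves per turn, the tree blows up far too fast.
        return max(scores) if maximizing else min(scores)
    [/code]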

    Go, on the other hand, has essentially one placement rule: you can put your stone on any board position that isn't currently occupied by another stone. The number of legal moves explodes exponentially, which makes chess-like search algorithms impractical; there are way too many possibilities for the algorithm to consider. What's needed is a technique that can recognize patterns.

    And guess what? If you've read my Machine Learning 101 thread, you'll know that modern AI is excellent at recognizing patterns. Deep Learning is especially good at recognizing component patterns in images, such as eyes, noses, wheels and wings. The 19x19 Go board can be treated as if it were an image, which means it's very possible to get Deep Learning to identify all the common patterns that appear in Go. All we need to do is show the Deep Learning algorithm lots of "images" of Go boards being played by humans. And this is exactly what the folks at Google have done.
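
    If you want a feel for what "board as image" means in code, here's a minimal sketch of a convolutional policy network over a 19x19 board. I'm using PyTorch, and the layer sizes are purely illustrative; this is not AlphaGo's actual architecture.

    [code]
    import torch
    import torch.nn as nn

    # Treat the 19x19 Go board as a 1-channel "image" and let a small
    # convolutional network score every intersection as a possible move.
    # Layer sizes are illustrative only -- not AlphaGo's architecture.
    class TinyGoPolicy(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),   # local stone patterns
                nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1),  # larger shapes
                nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=1),              # one score per point
            )

        def forward(self, board):                   # board: (batch, 1, 19, 19)
            scores = self.net(board)                # (batch, 1, 19, 19)
            return scores.flatten(1).softmax(-1)    # distribution over 361 moves

    # Encode a board as +1 (own stones), -1 (opponent's), 0 (empty):
    board = torch.zeros(1, 1, 19, 19)
    move_probs = TinyGoPolicy()(board)              # shape (1, 361), sums to 1
    [/code]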

    You can do this with an arcade game too; for example, you could train a Deep Learner to recognize Mario, mushrooms, bricks and turtles.

    However, identifying patterns on a Go board is just the first step. The computer still needs to make a move, and for that, the Googlers fell back on a much older technique called Reinforcement Learning.
     
  3. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    Reinforcement Learning can best be described as learning from experience, and it has traditionally been used to train robots to move in their environments. In Reinforcement Learning, the "agent" is first presented with the State of the world. The agent takes an Action from a set of possible actions and is given a Reward. The Reward, typically a numeric value, provides feedback to the agent on whether the Action it took was a success or a failure. Initially, the agent is completely naive and has no idea which moves are good and which are bad. However, as the agent takes more and more actions, the rewards it receives allow it to slowly build up a picture of which actions are good and which are bad.
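
    In code, the classic (and genuinely old) version of this is tabular Q-learning. Here's a minimal Python sketch of the State -> Action -> Reward loop just described; the `env` object with `reset()` and `step()` is an assumed stand-in for whatever world the agent lives in.

    [code]
    import random
    from collections import defaultdict

    # Tabular Q-learning: Q[(state, action)] holds the learned value of
    # taking `action` in `state`; it starts at 0, i.e. a naive agent.
    def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Mostly exploit the best-known action, sometimes explore.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # The reward nudges this state-action value up or down.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q
    [/code]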

    One of the earliest examples of Reinforcement Learning in pop culture is in the 1983 movie, WarGames. In the movie, the protagonist tries to teach a military computer that has control of all the nukes in the States that the best possible action in Global Thermonuclear War against Russia is to not launch a nuke at all. He does so by getting the computer to play Tic-Tac-Toe repeatedly until the computer gains enough experience to realize there are no winning moves in Tic-Tac-Toe.

    In Tic-Tac-Toe, the "State" of the world is the positions of all the Xs and Os on the board. One possible "State" of Tic-Tac-Toe would be (dots marking empty squares):

    XOX
    OXX
    O..

    There are two possible actions: place an O in the bottom-center square, or in the bottom-right square. If the agent places an O in the bottom-center, it is given a negative reward, because it has failed to block X from completing the diagonal and winning the game. If the agent places an O in the bottom-right, it blocks X's diagonal and the game ends in a draw, so it is given a positive reward (a draw beats a loss). The agent keeps a record of this State-Action-Reward tuple in its memory bank. In the future, when the agent encounters this exact game state, it will know that taking the bottom-right square is the move to make, because that action paid off in the past.
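
    Here's that "memory bank" idea as a small Python sketch; the coordinates and reward values are just illustrative:

    [code]
    # Remember the reward each action earned in each state, then replay
    # the best-known action when the same state shows up again.
    memory = {}  # (state, action) -> reward observed in a past game

    state = ("XOX",
             "OXX",
             "O..")

    # Experience from past games ((row, col), 0-indexed; rewards illustrative):
    memory[(state, (2, 1))] = -1.0   # bottom-center: X completes the diagonal
    memory[(state, (2, 2))] = +1.0   # bottom-right: blocks X, game is drawn

    def choose(state, legal_actions):
        # Pick the action with the highest remembered reward,
        # treating never-tried actions as neutral (0.0).
        return max(legal_actions, key=lambda a: memory.get((state, a), 0.0))

    print(choose(state, [(2, 1), (2, 2)]))   # -> (2, 2), the bottom-right square
    [/code]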

    And this is exactly what the Googlers did. The Deep Learner recognizes patterns on the Go board, and those patterns are given to the Reinforcement Learner as the State. The Reinforcement Learner then places a piece and learns whether it was a good or bad move when it eventually wins or loses the game. By getting the computer to play Go against itself many, many, many times, the Go Reinforcement Learning Agent gradually builds up experience about which moves are good in a huge range of the situations that can arise in Go.
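
    The overall self-play loop looks something like the sketch below. Everything here (`new_game`, `legal_moves`, `winner`, the `agent` interface) is a hypothetical stand-in; it shows the shape of the training process, not DeepMind's actual pipeline.

    [code]
    # The same agent plays both sides; every finished game becomes experience.
    # `new_game`, `legal_moves` and `winner` are hypothetical helpers.
    def self_play(agent, games=100_000):
        for _ in range(games):
            game = new_game()
            history = []                        # (state, action) for every move
            while not game.over:
                action = agent.choose(game.state, legal_moves(game))
                history.append((game.state, action))
                game = game.play(action)
            result = winner(game)               # 0 or 1 for the winning side
            for move_idx, (state, action) in enumerate(history):
                player = move_idx % 2           # players alternate moves
                reward = 1.0 if result == player else -1.0
                agent.update(state, action, reward)
    [/code]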
     
  4. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    So... the ultimate question is this: Will the Go AI beat the world's best player?

    My hunch is the AI will lose.

    The Go AI's biggest problem is that it is using Reinforcement Learning, which learns from past experience, and its experience comes primarily from playing against itself. Because of Go's exponential number of moves, I am quite sure there are still huge swaths of Go states that the AI has not experienced sufficiently. The world's best Go player could gain an advantage by making very unusual opening moves, forcing the AI into states it has rarely seen in its self-play games. And because it isn't a chess-like AI that searches for the best possible moves, the Go AI will perform terribly in these low-experience states.
     
    Gingerbread Demon likes this.
  5. Gingerbread Demon

    Gingerbread Demon I love Star Trek Discovery Premium Member

    Joined:
    May 31, 2015
    Location:
    The Other Realms
    OMG, thanks Intrinsical. Lots of interesting stuff to read about deep learning. Do you work in computer science?

    BTW, WarGames is an excellent movie.
     
  6. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    I started out as a Computer Science major and later specialized in Artificial Intelligence.

    If you are interested in reading more about how Artificial Intelligence/Machine Learning actually works, here's my Machine Learning 101 thread that I wrote over a year ago.
     
  7. Mr. Adventure

    Mr. Adventure Fleet Admiral Admiral

    Joined:
    Jun 9, 2001
    Location:
    Mr. Adventure
    With AI, I used to focus on the "I", the Intelligence, and it's hard to say whether a computer will ever achieve that. What I didn't realize until more recently is just how important the "A" is; if the artificial version is strong enough, it doesn't matter. If a computer can analyze data and patterns to determine the answers without actually understanding them, in the end it's kind of the same.
     
  8. Gingerbread Demon

    Gingerbread Demon I love Star Trek Discovery Premium Member

    Joined:
    May 31, 2015
    Location:
    The Other Realms
    The thing is, people get freaked out about AIs gaining emotions and stuff. I actually think emotions are a defect of logical thinking. A machine would be logical, even a self-thinking AI. Why would it need to develop emotions, which would only distract it?
     
  9. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    Actually, I tend to think of emotions as the human analog of Rewards in the Reinforcement Learning scheme. That is, emotions are not some arbitrary, useless thing; they serve an actual purpose. Happiness, joy, pleasure, laughter and excitement are clearly positive rewards that humans chase after. Sadness, pain and suffering are instinctively avoided because they're negative rewards.
     
    Gingerbread Demon likes this.
  10. Gingerbread Demon

    Gingerbread Demon I love Star Trek Discovery Premium Member

    Joined:
    May 31, 2015
    Location:
    The Other Realms
    OK I can roll with this ........

    But would an AI need emotions?
     
  11. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    It's like I said: there needs to be a mechanism for valuing the situation or environment. In Reinforcement Learning, it's the Reward. In supervised Machine Learning, it's the degree of discrepancy between the AI's predictions and a set of known examples. In humans, emotions determine how we view a situation and thus ultimately influence our future actions. So from this perspective, an AI needs an emotion-like mechanism that helps it make sense of the world.
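
    In code terms, the two "valuing mechanisms" I mentioned are just different feedback signals. A toy Python comparison:

    [code]
    # Two flavors of the same "how good was that?" signal:
    # supervised learning scores a prediction against a known example,
    # reinforcement learning scores an action's outcome with a reward.
    def supervised_feedback(prediction, target):
        # Discrepancy with a known example (negated squared error,
        # so that larger is better, like a reward).
        return -(prediction - target) ** 2

    def reinforcement_feedback(won_game):
        # Direct reward for an outcome: +1 for good, -1 for bad.
        return 1.0 if won_game else -1.0
    [/code]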
     
    Gingerbread Demon likes this.
  12. Gingerbread Demon

    Gingerbread Demon I love Star Trek Discovery Premium Member

    Joined:
    May 31, 2015
    Location:
    The Other Realms
    OK
     
  13. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    Well, I have been proven wrong. Machine Learning + Reinforcement Learning + Predictive Ensemble Learning can indeed beat a human Go grandmaster.
     
  14. Gingerbread Demon

    Gingerbread Demon I love Star Trek Discovery Premium Member

    Joined:
    May 31, 2015
    Location:
    The Other Realms
    I'm glad someone reposted here. I'm thinking this isn't as great a leap in AI as one might think. All the computer did was win at Go. It didn't venture beyond its programming to play Go. So really, is it that big of a deal?

    Now, if it played Go and then wanted to take up Chess, I'd be impressed.
     
  15. intrinsical

    intrinsical Commodore Commodore

    Joined:
    Mar 3, 2005
    Location:
    Singapore
    After reading some of the articles on wired.com, I realized I wasn't completely off. As I had stated earlier, the number of possibilities to consider is so enormous that, despite the game being played for centuries, there are still Go moves that no one has ever made. In order to cut down the number of possibilities it has to consider, AlphaGo only considers moves that humans are likely to make. This dramatically cuts down the number of moves AlphaGo has to evaluate.
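
    In pseudocode terms, that pruning step amounts to something like this sketch: rank the moves with a policy network (like the toy one I sketched earlier) and keep only the top handful. The function and parameter names here are mine, not DeepMind's.

    [code]
    import torch

    # Keep only the k most "human-likely" moves, as ranked by a policy
    # network (`policy_net` is assumed, e.g. the TinyGoPolicy sketch above).
    # Moves outside this shortlist are never searched at all -- which is
    # exactly the blind spot discussed below.
    def shortlist_moves(policy_net, board, k=10):
        probs = policy_net(board)              # (1, 361) move probabilities
        top = torch.topk(probs, k, dim=-1)     # k highest-probability moves
        return list(zip(top.indices[0].tolist(), top.values[0].tolist()))
    [/code]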

    And during the fourth match, Lee Sedol made a move that a human normally would not have made, and it threw AlphaGo's algorithm off. Wired's article on the fourth match covers the relevant details.

    AlphaGo went on to lose that match precisely because it was trained to ignore moves that humans were unlikely to make, and so it had never learned a counter for Lee's move. If Lee had played more unconventional moves (just like the example of Data vs. the Strategema grandmaster I cited weeks ago), he would have had a much higher chance of winning against AlphaGo.
     
    Last edited: Mar 21, 2016