
Read the book: «The Creativity Code: How AI is learning to write, paint and think», page 3

First blood

Previous computer programs built to play Go had not come close to playing competitively against even a pretty good amateur, so most pundits were highly sceptical of DeepMind’s dream to create code that could get anywhere near an international champion of the game. Most people still agreed with the view expressed in The New York Times by the astrophysicist Piet Hut after Deep Blue’s success at chess in 1997: ‘It may be a hundred years before a computer beats humans at Go – maybe even longer. If a reasonably intelligent person learned to play Go, in a few months he could beat all existing computer programs. You don’t have to be a Kasparov.’

Just two decades into that hundred years, the DeepMind team believed they might have cracked the code. Their strategy of getting algorithms to learn and adapt appeared to be working, but they were unsure quite how powerful the emerging algorithm really was. So in October 2015 they decided to test-run their program in a secret competition against the current European champion, the Chinese-born Fan Hui.

AlphaGo destroyed Fan Hui five games to nil. But the gulf between European players of the game and those in the Far East is huge. The top European players, when put in a global league, rank in the 600s. So, although it was still an impressive achievement, it was like building a driverless car that could beat a human driving a Ford Fiesta round Silverstone then trying to challenge Lewis Hamilton in a Grand Prix.

Certainly when the press in the Far East heard about Fan Hui’s defeat they were merciless in their dismissal of how meaningless the win was for AlphaGo. Indeed, when Fan Hui’s wife contacted him in London after the news got out, she begged her husband not to go online. Needless to say he couldn’t resist. It was not a pleasant experience to read how dismissive the commentators in his home country were of his credentials to challenge AlphaGo.

Fan Hui credits his matches with AlphaGo with teaching him new insights into how to play the game. In the following months his ranking went from 633 to the 300s. But it wasn’t only Fan Hui who was learning. Every game AlphaGo plays affects its code and changes it to improve its play next time around.

It was at this point that the DeepMind team felt confident enough to offer their challenge to Lee Sedol, South Korea’s eighteen-time world champion and a formidable player of the game.

The match was to be played over five games scheduled between 9 and 15 March 2016 at the Four Seasons hotel in Seoul, and would be broadcast live across the internet. The winner would receive a prize of a million dollars. Although the venue was public, the precise location within the hotel was kept secret and was isolated from noise – not that AlphaGo was going to be disturbed by the chitchat of the press and the whispers of curious bystanders. It would assume a perfect Zen-like state of concentration wherever it was placed.

Sedol wasn’t fazed by the news that he was up against a machine that had beaten Fan Hui. Following Fan Hui’s loss he had declared: ‘Based on its level seen … I think I will win the game by a near landslide.’

Although he was aware that the machine he would be playing was learning and evolving, this did not concern him. But as the match approached, you could hear doubts beginning to creep into his view of whether AI would ultimately prove too powerful for humans to defeat, even in the game of Go. In February he stated: ‘I have heard that DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win … at least this time.’

Most people still felt that despite great inroads into programming, an AI Go champion was still a distant goal. Rémi Coulom, the creator of Crazy Stone, the only program to get close to playing Go at any high standard, was still predicting another decade before computers would beat the best humans at the game.

As the date for the match approached, the team at DeepMind felt they needed someone to really stretch AlphaGo and to test it for any weaknesses. So they invited Fan Hui back to play the machine going into the last few weeks. Despite having suffered a 5–0 defeat and being humiliated by the press back in China, he was keen to help out. Perhaps a bit of him felt that if he could help make AlphaGo good enough to beat Sedol, it would make his defeat less humiliating.

As Fan Hui played he could see that AlphaGo was extremely strong in some areas but he managed to reveal a weakness that the team was not aware of. There were certain configurations in which it seemed to completely fail to assess who had control of the game, often becoming totally delusional that it was winning when the opposite was true. If Sedol tapped into this weakness, AlphaGo wouldn’t just lose, it would appear extremely stupid.

The DeepMind team worked around the clock trying to fix this blind spot. Eventually they just had to lock down the code as it was. It was time to ship the laptop they were using to Seoul.

The stage was set for a fascinating duel as the players, or at least one player, sat down on 9 March to play the first of the five games.

‘Beautiful. Beautiful. Beautiful’

It was with a sense of existential anxiety that I fired up the YouTube channel broadcasting the matches that Sedol would play against AlphaGo and joined 280 million other viewers to see humanity take on the machines. Having for years compared creating mathematics to playing the game of Go, I had a lot on the line.

Lee Sedol picked up a black stone and placed it on the board and then waited for the response. Aja Huang, a member of the DeepMind team, would play the physical moves for AlphaGo. This, after all, was not a test of robotics but of artificial intelligence. Huang stared at AlphaGo’s screen, waiting for its response to Sedol’s first stone. But nothing came.

We all stared at our screens wondering if the program had crashed! The DeepMind team was also beginning to wonder what was up. The opening moves are generally something of a formality. No human would think so long over move 2. After all, there was nothing really to go on yet. What was happening? And then a white stone appeared on the computer screen. It had made its move. The DeepMind team breathed a huge sigh of relief. We were off! Over the next couple of hours the stones began to build up across the board.

One of the problems I had as I watched the game was assessing who was winning at any given point in the game. It turns out that this isn’t just because I’m not a very experienced Go player. It is a characteristic of the game. Indeed, this is one of the main reasons why programming a computer to play Go is so hard. There isn’t an easy way to turn the current state of the game into a robust scoring system of who leads by how much.

Chess, by contrast, is much easier to score as you play. Each piece has a different numerical value which gives you a simple first approximation of who is winning. Chess is destructive. One by one pieces are removed so the state of the board simplifies as the game proceeds. But Go increases in complexity as you play. It is constructive. The commentators kept up a steady stream of observations but struggled to say if anyone was in the lead right up until the final moments of the game.
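The ‘simple first approximation’ for chess can be made concrete in a few lines of Python. The piece values are the conventional ones (pawn 1, knight and bishop 3, rook 5, queen 9) rather than anything from the book, and the position is invented for illustration. The point is that Go has no analogue of this function.

```python
# Conventional piece values give chess the simple running score the text
# describes; these numbers are the standard textbook values.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_score(board):
    """Positive = White ahead in material, negative = Black ahead.

    `board` is a flat list of piece codes: uppercase for White ('Q'),
    lowercase for Black ('q'); kings are omitted, as they never leave.
    """
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

# An illustrative mid-game snapshot of the remaining material.
board = ["Q", "R", "R", "B", "B", "N", "P", "P",   # White: 30 points
         "q", "r", "b", "n", "n", "p", "p", "p"]   # Black: 26 points
print(material_score(board))  # 4: White leads by roughly a minor piece
```

Because chess is destructive, this score only gets easier to read as pieces leave the board; in Go, where stones accumulate, no comparably cheap evaluation exists.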

What they were able to pick up quite quickly was Sedol’s opening strategy. If AlphaGo had learned to play on games that had been played in the past, then Sedol was working on the principle that it would put him at an advantage if he disrupted the expectations it had built up by playing moves that were not in the conventional repertoire. The trouble was that this required Sedol to play an unconventional game – one that was not his own.

It was a good idea but it didn’t work. Any conventional machine programmed on a database of accepted openings wouldn’t have known how to respond and would most likely have made a move that would have serious consequences in the grand arc of the game. But AlphaGo was not a conventional machine. It could assess the new moves and determine a good response based on what it had learned over the course of its many games. As David Silver, the lead programmer on AlphaGo, explained in the lead-up to the match: ‘AlphaGo learned to discover new strategies for itself, by playing millions of games between its neural networks, against themselves, and gradually improving.’ If anything, Sedol had put himself at a disadvantage by playing a game that was not his own.
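Silver’s description of self-play can be illustrated with a deliberately tiny sketch: a tabular policy that learns the toy game of Nim (take 1–3 stones; whoever takes the last stone wins) purely by playing against itself and reinforcing whichever moves ended up on the winning side. Everything here, the game, the update rule, the constants, is invented for illustration and bears no resemblance to AlphaGo’s neural networks in scale or sophistication.

```python
import random

def train(episodes=50_000, max_pile=10, seed=1):
    """Self-play on Nim: take 1-3 stones; taking the last stone wins.

    weights[n][m] is the shared policy's preference for taking m stones
    from a pile of n. After each game the winner's moves are reinforced
    and the loser's dampened, then renormalised per state.
    """
    rng = random.Random(seed)
    weights = {n: {m: 1.0 for m in range(1, min(3, n) + 1)}
               for n in range(1, max_pile + 1)}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        history, player = [], 0
        while pile > 0:
            moves = list(weights[pile])
            move = rng.choices(moves,
                               weights=[weights[pile][m] for m in moves])[0]
            history.append((pile, move, player))
            pile -= move
            player ^= 1
        winner = history[-1][2]           # whoever took the last stone
        for n, m, p in history:
            weights[n][m] *= 1.1 if p == winner else 0.9
            total = sum(weights[n].values())
            for k in weights[n]:          # keep weights normalised
                weights[n][k] /= total
    return weights

w = train()
best = {n: max(w[n], key=w[n].get) for n in w}
# Optimal Nim play leaves the opponent a multiple of 4 stones,
# e.g. from a pile of 5 the learned policy should take 1.
```

No game between humans ever enters the loop: the policy discovers the multiple-of-4 strategy solely from games it plays against itself, which is the essence of what Silver was describing.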

As I watched I couldn’t help feeling for Sedol. You could see his confidence draining out of him as it gradually dawned on him that he was losing. He kept looking over at Huang, the DeepMind representative who was playing AlphaGo’s moves, but there was nothing he could glean from Huang’s face. By move 186 Sedol had to recognise that there was no way to overturn the advantage AlphaGo had built up on the board. He placed a stone on the side of the board to indicate his resignation.

By the end of day one it was: AlphaGo 1 Humans 0. Sedol admitted at the press conference that day: ‘I was very surprised because I didn’t think I would lose.’

But it was game 2 that was going to truly shock not just Sedol but every human player of the game of Go. The first game was one that experts could follow and appreciate why AlphaGo was playing the moves it was. They were moves a human champion would play. But as I watched game 2 on my laptop at home, something rather strange happened. Sedol played move 36 and then retired to the roof of the hotel for a cigarette break. While he was away, AlphaGo on move 37 instructed Huang, its human representative, to place a black stone on the line five steps in from the edge of the board. Everyone was shocked.

The conventional wisdom is that during the early part of the game you play stones on the outer four lines. The third line builds up short-term territory strength on the edge of the board while playing on the fourth line contributes to your strength later in the game as you move into the centre of the board. Players have always found that there is a fine balance between playing on the third and fourth lines. Playing on the fifth line has always been regarded as suboptimal, giving your opponent the chance to build up territory that has both short- and long-term influence.

AlphaGo had broken this orthodoxy built up over centuries of competing. Some commentators declared it a clear mistake. Others were more cautious. Everyone was intrigued to see what Sedol would make of the move when he returned from his cigarette break. As he sat down, you could see him physically flinch as he took in the new stone on the board. He was certainly as shocked as all of the rest of us by the move. He sat there thinking for over twelve minutes. Like chess, the game was being played under time constraints. Using twelve minutes of your time was very costly. It is a mark of how surprising this move was that it took Sedol so long to respond. He could not understand what AlphaGo was doing. Why had the program abandoned the region of stones they were competing over?

Was this a mistake by AlphaGo? Or did it see something deep inside the game that humans were missing? Fan Hui, who had been given the role of one of the referees, looked down on the board. His initial reaction matched everyone else’s: shock. And then he began to realise: ‘It’s not a human move. I’ve never seen a human play this move,’ he said. ‘So beautiful. Beautiful. Beautiful. Beautiful.’

Beautiful and deadly it turned out to be. Not a mistake but an extraordinarily insightful move. Some fifty moves later, as the black and white stones fought over territory from the lower left-hand corner of the board, they found themselves creeping towards the black stone of move 37. It was joining up with this stone that gave AlphaGo the edge, allowing it to clock up its second win. AlphaGo 2 Humans 0.

Sedol’s mood in the press conference that followed was notably different. ‘Yesterday I was surprised. But today I am speechless … I am in shock. I can admit that … the third game is not going to be easy for me.’ The match was being played over five games. This was the game that Sedol needed to win to be able to stop AlphaGo claiming the match.

The human fight-back

Sedol had a day off to recover. The third game would be played on Saturday, 12 March. He needed the rest, unlike the machine. The first game had been over three hours of intense concentration. The second lasted over four hours. You could see the emotional toll that losing two games in a row was having on him.

Rather than resting, though, Sedol stayed up till 6 a.m. the next morning analysing the games he’d lost so far with a group of fellow professional Go players. Did AlphaGo have a weakness they could exploit? The machine wasn’t the only one that could learn and evolve. Sedol felt he might learn something from his losses.

Sedol played a very strong opening to game 3, forcing AlphaGo to manage a weak group of stones within his sphere of influence on the board. Commentators began to get excited. Some said Sedol had found AlphaGo’s weakness. But then, as one commentator posted: ‘Things began to get scary. As I watched the game unfold and the realisation of what was happening dawned on me, I felt physically unwell.’

Sedol pushed AlphaGo to its limits but in so doing he revealed the hidden powers that the program seemed to possess. As the game proceeded, it started to make what commentators called lazy moves. It had analysed its position and was so confident in its win that it chose safe moves. It didn’t care if it won by half a point. All that mattered was that it won. To play such lazy moves was almost an affront to Sedol, but AlphaGo was not programmed with any vindictive qualities. Its sole goal was to win the game. Sedol pushed this way and that, determined not to give in too quickly. Perhaps one of these lazy moves was a mistake that he could exploit.

By move 176 Sedol eventually caved in and resigned. AlphaGo 3 Humans 0. AlphaGo had won the match. Backstage, the DeepMind team was going through a strange range of emotions. They’d won the match, but seeing the devastating effect it was having on Sedol made it hard for them to rejoice. The million-dollar prize was theirs. They’d already decided to donate the prize, if they won, to a range of charities dedicated to promoting Go and science subjects as well as to Unicef. Yet their human code was causing them to empathise with Sedol’s pain.

AlphaGo did not demonstrate any emotional response to its win. No little surge of electrical current. No code spat out with a resounding ‘YES!’ It is this lack of response that gives humanity hope and is also scary at the same time. Hope because it is this emotional response that is the drive to be creative and venture into the unknown: it was humans, after all, who’d programmed AlphaGo with the goal of winning. Scary because the machine won’t care if the goal turns out to be not quite what its programmers had intended.

Sedol was devastated. He came out in the press conference and apologised:

I don’t know how to start or what to say today, but I think I would have to express my apologies first. I should have shown a better result, a better outcome, and better content in terms of the game played, and I do apologize for not being able to satisfy a lot of people’s expectations. I kind of felt powerless.

But he urged people to keep watching the final two games. His goal now was to try to at least get one back for humanity.

Having lost the match, Sedol started game 4 playing far more freely. It was as if the heavy burden of expectation had been lifted, allowing him to enjoy his game. In sharp contrast to the careful, almost cautious play of game 3, he launched into a much more extreme strategy called ‘amashi’. One commentator compared it to a city investor who, rather than squirrelling away small gains that accumulate over time, bet the whole bank.

Sedol and his team had stayed up all of Saturday night trying to reverse-engineer from AlphaGo’s games how it played. It seemed to work on a principle of playing moves that incrementally increase its probability of winning rather than betting on the potential outcome of a complicated single move. Sedol had witnessed this when AlphaGo preferred lazy moves to win game 3. The strategy they’d come up with was to disrupt this sensible play by playing the risky single moves. An all-or-nothing strategy might make it harder for AlphaGo to score so easily.
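The principle Sedol’s team reverse-engineered, maximising the probability of winning rather than the margin of victory, can be shown with a toy move-selection rule. The candidate moves and their numbers below are invented for illustration, not taken from any real evaluation.

```python
# Two invented candidate moves, scored by a toy evaluator: one is safe
# but wins by half a point, the other is risky but wins big when it works.
candidates = {
    "safe half-point move": {"win_probability": 0.92, "expected_margin": 0.5},
    "aggressive invasion":  {"win_probability": 0.70, "expected_margin": 15.0},
}

def win_probability_choice(moves):
    # AlphaGo-style utility: a win counts 1, a loss 0, so only P(win) matters.
    return max(moves, key=lambda m: moves[m]["win_probability"])

def margin_choice(moves):
    # A human-flavoured scorer that also values how much you win by.
    return max(moves, key=lambda m: moves[m]["win_probability"]
                                    * moves[m]["expected_margin"])

print(win_probability_choice(candidates))  # safe half-point move
print(margin_choice(candidates))           # aggressive invasion
```

Under the first rule the ‘lazy’ half-point move dominates, which is why an all-or-nothing strategy seemed the only way to knock the program off its preferred path.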

AlphaGo seemed unfazed by this line of attack. Seventy moves into the game, commentators were already beginning to see that AlphaGo had once again gained the upper hand. This was confirmed by a set of conservative moves that were AlphaGo’s signal that it had the lead. Sedol had to come up with something special if he was going to regain the momentum.

If move 37 of game 2 was AlphaGo’s moment of creative genius, move 78 of game 4 was Sedol’s retort. He’d sat there for thirty minutes staring at the board, staring at defeat, when he suddenly placed a white stone in an unusual position, between two of AlphaGo’s black stones. Michael Redmond, who was commentating on the YouTube channel, spoke for everyone: ‘It took me by surprise. I’m sure that it would take most opponents by surprise. I think it took AlphaGo by surprise.’

It certainly seemed to. AlphaGo appeared to completely ignore the play, responding with a strange move. Within several more moves AlphaGo could see that it was losing. The DeepMind team stared at their screens behind the scenes and watched their creation imploding. It was as if move 78 short-circuited the program. It seemed to cause AlphaGo to go into meltdown as it made a whole sequence of destructive moves. This apparently is another characteristic of the way Go algorithms are programmed. Once they see that they are losing they go rather crazy.

Silver, the chief programmer, winced as he saw the next move AlphaGo was suggesting: ‘I think they’re going to laugh.’ Sure enough, the Korean commentators collapsed into fits of giggles at the moves AlphaGo was now making. Its moves were failing the Turing Test. No human with a shred of strategic sense would make them. The game dragged on for a total of 180 moves, at which point AlphaGo put up a message on the screen that it had resigned. The press room erupted with spontaneous applause.

The human race had got one back. AlphaGo 3 Humans 1. The smile on Lee Sedol’s face at the press conference that evening said it all. ‘This win is so valuable that I wouldn’t exchange it for anything in the world.’ The press cheered wildly. ‘It’s because of the cheers and the encouragement that you all have shown me.’

Gu Li, who was commentating on the game in China, declared Sedol’s move 78 the ‘hand of god’. It was a move that broke with the conventional way to play the game, and that was ultimately the key to its shocking impact. Yet this is characteristic of true human creativity. It is a good example of Boden’s transformational creativity, whereby, by breaking out of the system, you can find new insights.

At the press conference, Hassabis and Silver could not explain why AlphaGo had lost. They would need to go back and analyse why it had made such a lousy move in response to Sedol’s move 78. It turned out that AlphaGo’s experience in playing humans had led it to totally dismiss such a move as something not worth thinking about. It had assessed that this was a move that had only a one in 10,000 chance of being played. It seems as if it just had not bothered to learn a response to such a move because it had prioritised other moves as more likely and therefore more worthy of response.
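One way to picture why a 1-in-10,000 move gets starved of attention: search effort in AlphaGo-style programs is steered by the policy network’s prior over moves, so a move judged wildly improbable receives almost no simulations. The sketch below is a drastic simplification of that idea (real Monte Carlo tree search uses a more subtle selection rule, not straight proportional allocation), and the move names and prior values are invented.

```python
# Invented priors over four candidate replies; "Sedol's wedge" stands in
# for the move AlphaGo rated as a 1-in-10,000 play.
priors = {
    "joseki continuation": 0.45,
    "solid connection":    0.30,
    "territorial block":   0.2499,
    "Sedol's wedge":       0.0001,
}

def allocate_search(priors, total_simulations=100_000):
    """Spend the simulation budget in proportion to each move's prior -
    a crude stand-in for prior-guided tree search."""
    return {move: int(p * total_simulations) for move, p in priors.items()}

budget = allocate_search(priors)
# "Sedol's wedge" gets only about 10 of the 100,000 simulations -
# far too few for the search to discover why the move is dangerous.
```

The prior is what makes the search tractable on a board with hundreds of legal moves, but it is also exactly what let Sedol’s wedge slip past unexamined.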

Perhaps Sedol just needed to get to know his opponent. Perhaps over a longer match he would have turned the tables on AlphaGo. Could he maintain the momentum into the fifth and final game? Losing 3–2 would be very different from 4–1. The last game was still worth competing in. If he could win a second game, then it would sow seeds of doubt about whether AlphaGo could sustain its superiority.

But AlphaGo had learned something valuable from its loss. You play Sedol’s one in 10,000 move now against the algorithm and you won’t get away with it. That’s the power of this sort of algorithm. It learns from its mistakes.

That’s not to say it can’t make new mistakes. As game 5 proceeded, there was a moment quite early on when AlphaGo seemed to completely miss a standard set of moves in response to a particular configuration that was building. As Hassabis tweeted from backstage: ‘#AlphaGo made a bad mistake early in the game (it didn’t know a known tesuji) but now it is trying hard to claw it back … nail-biting.’

Sedol was in the lead at this stage. It was game on. Gradually AlphaGo did claw back. But right up to the end the DeepMind team was not exactly sure whether it was winning. Finally, on move 281 – after five hours of play – Sedol resigned. This time there were cheers backstage. Hassabis punched the air. Hugs and high fives were shared across the team. The win that Sedol had pulled off in game 4 had suddenly re-engaged their competitive spirit. It was important for them not to lose this last game.

Looking back at the match, many recognise what an extraordinary moment this was. Some immediately commented on its being an inflexion point for AI. Sure, all this machine could do was play a board game, and yet, for those looking on, its capability to learn and adapt was something quite new. Hassabis’s tweet after winning the first game summed up the achievement: ‘#AlphaGo WINS!!!! We landed it on the moon.’ It was a good comparison. Landing on the moon did not yield extraordinary new insights about the universe, but the technology that we developed to achieve such a feat has. Following the last game, AlphaGo was awarded an honorary professional 9 dan rank by the South Korean Go Association, the highest accolade for a Go player.