Congratulations are in order for the folks at Google DeepMind who have mastered Go.
However, some of the discussion around this seems like giddy overstatement. Wired says “Machines have conquered the last games” and Slashdot says “We know now that we don’t need any big new breakthroughs to get to true AI.” The truth is nowhere close.
For Go itself, it’s been well-known for a decade that Monte Carlo tree search (i.e. valuation by assuming randomized playout) is unusually effective in Go. Given this, it’s unclear that the AlphaGo algorithm extends to other board games where MCTS does not work so well. Maybe? It will be interesting to see.
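To make “valuation by assuming randomized playout” concrete, here is a minimal sketch on a toy game. The game (take 1 to 3 stones, whoever takes the last stone wins) and every function name are assumptions chosen for illustration. This is flat Monte Carlo evaluation, the building block of MCTS, not the tree search itself and not AlphaGo’s algorithm.

```python
import random

def legal_moves(stones):
    # Toy game (hypothetical): remove 1, 2 or 3 stones; taking the last stone wins.
    return [k for k in (1, 2, 3) if k <= stones]

def random_playout(stones, rng):
    """Play uniformly random moves to the end of the game.

    Returns True if the player to move at the start of the playout wins.
    """
    player = 0
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player == 0
        player = 1 - player

def playout_value(stones, n_playouts, rng):
    """Estimate the value of a position as the win rate of random playouts."""
    wins = sum(random_playout(stones, rng) for _ in range(n_playouts))
    return wins / n_playouts

def choose_move(stones, n_playouts=2000, seed=0):
    """Pick the move whose resulting position looks worst for the opponent."""
    rng = random.Random(seed)
    best_move, best_value = None, -1.0
    for k in legal_moves(stones):
        remaining = stones - k
        if remaining == 0:
            value = 1.0  # taking the last stone wins outright
        else:
            # The opponent moves next, so our value is one minus theirs.
            value = 1.0 - playout_value(remaining, n_playouts, rng)
        if value > best_value:
            best_move, best_value = k, value
    return best_move, best_value

if __name__ == "__main__":
    print(choose_move(3))
```

Whether random playouts give a useful signal depends heavily on the game, which is the reason for doubt about games where MCTS does not work so well.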
Delving into existing computer games, the Atari results (see figure 3) are very fun but obviously unimpressive on about ¼ of the games. My hypothesis for why is that their solution does only local (epsilon-greedy style) exploration rather than global exploration, so they can only learn policies addressing either very short credit assignment problems or with greedily accessible policies. Global exploration strategies are known to result in exponentially more efficient strategies in general for deterministic decision processes (1993), Markov Decision Processes (1998), and for MDPs without modeling (2006).
These strategies are not used because they are based on tabular learning rather than function fitting. That’s why I shifted to Contextual Bandit research after the 2006 paper. We’ve learned quite a bit there, enough to start tackling a Contextual Deterministic Decision Process, but that solution is still far from practical. Addressing global exploration effectively is only one of the significant challenges between what is well known now and what needs to be addressed for what I would consider a real AI.
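A small sketch of the local versus global exploration gap, under assumptions that are mine rather than taken from the cited papers: a deterministic chain of n states where action 1 advances and action 0 resets to the start, with the only reward at the far end. Uniformly random action choice stands in for what epsilon-greedy does before any reward has been seen. The directed explorer is a tabular, model-based sketch that plans toward the nearest untried (state, action) pair. All names are hypothetical.

```python
import random
from collections import deque

def step(state, action):
    # Hypothetical chain: action 1 advances one state, action 0 resets to the start.
    return state + 1 if action == 1 else 0

def steps_random(n, rng, max_steps=10**7):
    """Undirected exploration: uniformly random actions until state n is reached."""
    state, steps = 0, 0
    while state < n and steps < max_steps:
        state = step(state, rng.randrange(2))
        steps += 1
    return steps

def plan_to_unknown(state, model, actions=(0, 1)):
    """Breadth-first search through the learned model.

    Returns the first action on a shortest known path to an untried
    (state, action) pair, or None if nothing untried is reachable.
    """
    queue = deque([(state, None)])
    seen = {state}
    while queue:
        s, first = queue.popleft()
        for a in actions:
            if (s, a) not in model:
                return a if first is None else first
        for a in actions:
            nxt = model[(s, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, a if first is None else first))
    return None

def steps_directed(n):
    """Global exploration: always head for the nearest untried (state, action)."""
    model = {}
    state, steps = 0, 0
    while state < n:
        action = plan_to_unknown(state, model)
        nxt = step(state, action)
        model[(state, action)] = nxt  # tabular model of a deterministic world
        state, steps = nxt, steps + 1
    return steps

if __name__ == "__main__":
    n = 20
    print("directed:", steps_directed(n))  # n(n-1)/2 + 2n = 230 steps
    print("random:  ", steps_random(n, random.Random(0)))
```

On this chain the random walker needs 2^(n+1) − 2 steps in expectation (the wait for n consecutive “advance” choices), while the directed explorer needs a number of steps quadratic in n. The directed explorer relies on a table keyed by exact states, so it does not carry over directly to function fitting.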
This is generally understood by people working on these techniques but seems to be getting lost in translation to public news reports. That’s dangerous because it leads to disappointment. The field will be better off without an overpromise/bust cycle so I would encourage people to keep and inform a balanced view of successes and their extent. Mastering Go is a great accomplishment, but it is quite far from everything.
OpenCog has an interesting synergetic architecture into which you could fit all these sorts of algorithms so that they work together.
Translated: John is throwing a hissy fit that his research doesn’t get as much media attention as solving Go did. Please give John more attention.
I’m not interested in trolling. Feel free to post again with a real argument.
Not even an argument, just random name-calling, i.e., “John is a petulant, bitter attention whore.” Put that way, it’s obvious that if that describes anyone, it’s the troll. I vote for silent deletion of trollery (and these replies along with it; scorched earth!).
So you write in your blog (which is obviously targeted towards people “in the know”) and complain about perceptions held by people who don’t read your blog. In other words, you’re complaining to people who are already “in the know”. What do you think some of these readers will think? Obviously they’ll feel it’s sour grapes, because your actual points are ignored (since the readers already understand and know them), and they instead focus on your negative sentiment as plain jealousy. What DID you think your blog post would accomplish?
I think speaking up against overhyping is really valuable. John’s last paragraph explains why. Maybe John sounded too dismissive of this milestone? I’m certain his praise (“congratulations”, “very fun”, “great accomplishment”) is quite sincere. But it’s not close to AGI and it’s important for machine learning researchers to be clear about that when even nerd-focused media like Wired and Slashdot lead people to believe otherwise.
I think that there are people like me (undergrads who read/skim CS blogs to get whatever they can out of it) for whom posts like this are actually quite valuable.
Also, note that Go has not been “mastered” yet. Lee Sedol won game 4 of the match against AlphaGo and after getting in trouble, AlphaGo played some laughable moves that no human beginner would, indicating a kind of fragility that still needs to be addressed in the face of adverse situations.
Thanks John! Well-put, the hysteria has been confusing me.
Curious if women find this as “meaningful” as men do, my own data points suggest not.
I personally don’t see the point in getting so excited while we’re still using heavy supervision.
But you know (I hope) what I prioritize in AI…=)
Hi John,
I agree that the major problem in RL is dealing with global function approximation and sensor aliasing (i.e. partial observability). If you add multiple agents to this, there is almost always some kind of breakdown. Most of the hacks employed by practitioners (e.g. experience replay) provide a rather unprincipled fix to these problems (and quite often don’t work at all). I am not convinced however that the solution will come through contextual bandits – and not some form of NN research (e.g. methods to help with catastrophic forgetting). If the research in Go and MCTS is anything to go by, the “very principled” approach (i.e. UCB) almost always lost out to more ad-hoc approaches – the general idea of exploring and exploiting is what was really kept.
The problem that needs addressing in my view is that one needs to “forget” Q-values as you keep on learning, while at the same time trying to hold on to Q-values that you consider as learned. I am not sure how this can be framed in a more formal setting and attacked.
Most of the theory that I’ve worked on is agnostic to representation, hence it is not “theory or neural networks”, but rather neural networks are one way to instantiate the theory. With that said, I tend to agree. Theory typically can’t handle all the peculiarities of a real application so to be relevant it must be flexibly applied.
It is worth noting that the MCTS approach AlphaGo is using is a distributed variant of UCT which in turn was an extension of UCB to game-tree search. And I don’t think the only thing that was kept from UCB was the explore-exploit aspect – the scheme of updates is close to the UCB version (although functionally they don’t seem to use the log term in the bonus term but a simple constant if I correctly read their Nature paper). Hence a principled approach (UCB) did give rise to a path of improvement towards a practical algorithm.
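For readers who have not seen the bonus term being discussed, here is a minimal sketch. `ucb1_score` is the standard UCB1 index (empirical mean plus a bonus containing the log of the total pull count). `constant_bonus_score` is a hypothetical variant with the log term replaced by a constant, written only to illustrate the distinction drawn above. It is not the formula from the Nature paper.

```python
import math

def ucb1_score(mean, pulls, total_pulls, c=math.sqrt(2)):
    """Standard UCB1 index: mean + c * sqrt(ln(t) / n)."""
    if pulls == 0:
        return math.inf  # untried arms are always tried first
    return mean + c * math.sqrt(math.log(total_pulls) / pulls)

def constant_bonus_score(mean, pulls, c=1.0):
    """Hypothetical variant: the log term is replaced by a constant."""
    if pulls == 0:
        return math.inf
    return mean + c / math.sqrt(pulls)

def select_arm(means, pulls, score):
    """Return the index of the arm with the highest score."""
    scores = [score(m, n) for m, n in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    means, pulls = [0.6, 0.5], [100, 1]
    total = sum(pulls)
    print(select_arm(means, pulls, lambda m, n: ucb1_score(m, n, total)))
    print(select_arm(means, pulls, constant_bonus_score))
```

In both versions the bonus shrinks as an arm is pulled more often. With the log term, the bonus of a neglected arm keeps growing slowly as the total count rises, so every arm is eventually revisited.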
Those modern brute-force approaches masked as AI are misleading: marketing campaigns at full tilt. What a shame. All of those magazines (and Wikipedia) talking about artificial general intelligence... but that brick just plays Go. Google has so much money…
There is lots to be “giddy” about though. Consider the massive depth of the neural net. Also consider the way in which most of the training took place, i.e. “self play”.
There are issues to be resolved – but it seems pretty clear that the system is unbeatable, even in its current state, for most mortals. That’s something to celebrate.
It’s fair to say that over-hype is bad. But some hype is in order.
One thing I see of interest is the advancement of the concept of machine-based intuition. When you ask an experienced Go player, why here and not here? The analysis is so abstract that the best answer is that it just “feels” correct. What is monumental is that this system demonstrates the ability to come up with intuitive assessments of abstract spaces much like a human. The obvious extension is to other areas that require intuition (like art, or judgement). The algorithms used in the DeepMind system are generic neural nets with reinforcement. These tools have already been applied in many fields and we now have a really good idea about how to extend them.
It’s not the end of AI research. But it’s an important step in the progression of machine intelligence. And the public should know about it.
Not to mention the case where the appropriate reward function is not even well-defined (such as real life).
Hi John,
Overall a great, level-headed assessment of both the promise of deep reinforcement learning and the need to contextualize this achievement with some degree of sobriety.
To be fair regarding the claims about the media perception of this achievement, the Wired article by Cade Metz was actually quite reasonable, especially by standards of the mainstream press. It focused on the game of Go and the beyond-board-game claims are quite tame:
“Considering that many of the machine learning technologies at the heart of AlphaGo are already running services inside some of the world’s largest Internet companies, the victory shows how quickly AI will progress in the years to come.”
The most cringeworthy moment (for me) is always the obligatory overstatement of a connection to biology:
“—vast networks of hardware and software that mimic the web of neurons in the human brain—”
Still, I think it’s far from an archetypal example of sensationalized AI press.
Similarly, I’d also like to temper the statement “Slashdot says We know now that we don’t need any big new breakthroughs to get to true AI.” If I read the link correctly, this is just one comment by some random guy on Slashdot, hardly indicative of a general belief that this is true.
Cheers,
Zack
Zack, I know of no large internet company using either MCTS or deep Q learning. So that paragraph is not “tame”, it is false.
Maybe you could try ‘deep learning’ instead; that is used extensively inside Google. I have heard Translate will also be using deep learning in the future. So the statement could be partially true.
We can score AlphaGo on multiple dimensions of Artificial Intelligence-ness.
1) Accuracy & Task Generalization. The best we have at the moment. AlphaGo was not modelled to beat Lee Sedol, but to beat any world-class Go player.
2) Flexibility. AlphaGo’s framework is very flexible to other tasks: from Atari games to Go, from self-driving cars to autonomous robots. I feel the MC tree search is a bit of a red herring here, merely a tool used for computation/faster search.
3) Adaptiveness. So-so. I did not get the feel that AlphaGo continuously adapts to the opponent’s play. But it was part of the rules that AlphaGo would not be updated between/(during(?)) matches. It would also be a very natural (and attainable) extension.
4) Hardware Robustness & Graceful degradation. Very robust, since entirely distributed and fault-tolerant. Loss of one GPU does not cause incapability to function well.
5) Software Robustness. Mediocre, since AlphaGo was not able to respond in time to certain opponent moves. It also could not recover from certain suboptimal moves it made itself.
6) Introspection & Communication of Reasoning. It is able to show its confidence for moves, and how it influences the value of longer term positions. It is not (yet) able to communicate “canonical” board patterns to summarize a position, or motivate its moves in a way humans would (though sometimes humans can not either).
7) Human-like intelligence. Perhaps only in resulting behaviour. AlphaGo plays like a child who was locked into a room with 100,000 amateur Go games, then played millions of games against itself. With perfect memory-loss for every turn.
8) Resource economy. Just how complex a machine is AlphaGo, compared to the human brain (or the LHC)? How much energy was needed to train AlphaGo and how much energy is needed to keep it running? What is the compression ratio of the information contained in AlphaGo? All interesting questions that I can not answer (or compare to humans).
9) Charisma, Willpower & Persuasion. Is a machine intelligent if there is no one around willing to listen to it? AlphaGo has phenomenal PR. This is all man-made and tacked on, but still, AlphaGo’s decisions have a lot of power over a lot of smart people. She is a rock star.
10) Speed of learning. I’d say poorly, but I don’t know the progressive performance increase (perhaps the last 5% of Go is the hardest to master, and learning to play Go at amateur level is relatively fast).
11) Prior knowledge. How many Go games did AlphaGo need to start playing Go at world-class level? I believe there was no opening book, and AlphaGo was seeded with a mere 100k amateur-level games. I think AlphaGo scores very high on this dimension.
12) World model/reward system. Still early stage, by their own admission. What actions are worth spending computational resources on? I am not knowledgeable enough to vote for or against your global exploration hypothesis.
Fairly elaborate, so we may use a hashing function which maps to a smaller dimensionality.
As for the science vs. the press, we have to realize (and accept) that Google is a company. For science, this means that, while beating state-of-the-academic-art, reproducibility often takes a hit. For the press, this means that Google (DeepMind) may be doing very cool cutting-edge stuff with ads, but what sticks (and gets promoted) are the YouTube Cat NN’s, winning Jeopardy, Minecraft agents, or beating Go.
I was hoping to see a few cites in the Nature paper from people I know worked at similar problems for decades, and who I greatly admire, but was a bit disappointed. But imagine being a relatively unknown researcher and having your paper or master thesis appear as a cite in that paper… That’s very nice too.
If there is blame for the hype, I don’t blame the engineers and researchers who worked on AlphaGo. You can not blame them for the enthusiasm and passion with which they tackled Go. In my case, blame is rather futile, since I can not change media anyway, but someone with Langford’s or LeCun’s stature certainly can sway discussion or put on the brakes.
Are we risking another AI winter? I do see homogeneity (not enough diversity and variance in approach, saturation of deep learning), centralization (hierarchical top-down bureaucratic academia), division (http://hunch.net/?p=224), imitation (researchers following along and copying the hype-du-jour), emotionality (“AlphaGo has solved AI forever”).
Nonetheless, what AlphaGo accomplished was a very impressive feat, and I think all-in-all, a good and exciting evolution for the field of reinforcement learning and MCMC decision theory.
As you may already know, Deep Learning is good for recognition, e.g. image recognition.
On the other hand, for “real AI” I think consciousness is necessary:
http://mambo-bab.hatenadiary.jp/entry/2014/12/09/005711