Watson convincingly beat the best champion Jeopardy! players. The apparent significance of this varies hugely, depending on your background knowledge of the related machine learning, NLP, and search technology. For a random person, this might seem like evidence of serious machine intelligence, while for people working on the system itself, it probably seems like a reasonably good assemblage of existing technologies with several twists to make the entire system work.
Above all, I think we should congratulate the people who managed to put together and execute this project—many years of effort by a diverse set of highly skilled people were needed to make this happen. In academia, it’s pretty difficult for one professor to assemble that quantity of talent, and in industry it’s rarely the case that such a capable group has both a worthwhile project and the support needed to pursue something like this for several years before success.
Alina invited me to the Jeopardy watching party at IBM, which was pretty fun, and it gave me a chance to talk to several people, principally Gerry Tesauro (2nd from the right). It’s cool to see people asking for autographs 🙂
I wasn’t surprised to see Watson win. Partly, this is simply because when a big company does a publicity stunt like this, it’s with a pretty solid expectation of victory. Partly, this is because I already knew that computers could answer trivia questions moderately well(*), so the question was just how far this could be improved. Gerry tells me that although Watson’s error rate is still significant, one key element is its ability to estimate with high accuracy when it can answer with high accuracy. Gerry also tells me the Watson papers will be coming out later this summer, with many more details.
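For illustration only, here is a minimal sketch of the idea Gerry describes (answering only when estimated confidence is high). This is a hypothetical toy, not IBM’s method:

```python
# Hypothetical sketch of confidence-gated answering; not Watson's implementation.
def answer_or_pass(candidates, threshold=0.85):
    """candidates: a list of (answer_text, estimated_confidence) pairs."""
    best_answer, confidence = max(candidates, key=lambda c: c[1])
    if confidence >= threshold:
        return best_answer  # confident enough to buzz in and answer
    return None             # stay quiet rather than risk a wrong answer

print(answer_or_pass([("Toronto", 0.31), ("Chicago", 0.55)]))  # None (too uncertain)
print(answer_or_pass([("Jupiter", 0.93), ("Saturn", 0.04)]))   # Jupiter
```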
What happens next? I don’t expect the project to be shelved like Deep Blue was, for two reasons. The first is that there is clearly very substantial room for improvement, and the second is that a natural language question answering device that can quickly search and respond from large sets of text is obviously valuable. The first means that researchers are interested, and the second means that the money to support them can probably be found. The series of textual entailment challenges is another, less centralized, effort in about the same direction.
In the immediate future (the next few years), applications in semi-open domains may become viable, particularly once a question answering device knows when to answer “I don’t know”. Fully conversational speech recognition working in an open domain should take somewhat longer, because speech recognition software adds further points of error, conversational systems aren’t so easy to come by, and in a fully open domain the error rates will be higher. Getting the error rate on questions down to a level that a human with access to the internet has difficulty beating is the tricky challenge that has not yet been addressed. It’s a worthy goal to work towards.
Many people believe in human exceptionalism, so when they see a computer beat humans at Jeopardy!, they are surprised that humans aren’t exceptional there. We should understand that this has happened many times before, with chess and mathematical calculation being two areas where computers now dominate but which were once thought by some to be the essence of intelligence. Similarly, it is not difficult to imagine automated driving (after all, animals can do it), gross object recognition, etc…
To avert surprise in the future, human exceptionalists should understand which things are really hard for an AI to do. It’s important to understand that there are various levels of I in AI. A few I think about are:
1. Animal Intelligence. The ability to understand your place in the world, navigate the world, and accomplish something. Some of these tasks are solved, but many others are not yet. This level implies that routine tasks can be automated: automated driving, farming, factories, etc…
2. Turing Test Intelligence. The ability to mimic a typical human well enough to fool a typical human in open conversation. Watson doesn’t achieve this, but the thrust of the research is in this direction, as open domain question answering is probably necessary for it. Nonroutine, noncreative tasks might be accomplished by the computer. Think of an automated secretary.
3. Pandora’s box Intelligence. The ability to efficiently self-program in an open domain so as to continuously improve. At this level human exceptionalism fails, and it is difficult to predict what happens next.
So, serious evidence of (2) or (3) is what I watch for.
(*) About 10 years ago, a friend of a friend was on WWTBAM and called the friend for help on a question. The friend typed the question and multiple choice answers into CMU’s Zephyr system, where a bot I made queried (question, answer) pairs on Google to discover which answer had the most web pages. It worked.
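A rough sketch of that hit-counting trick, for illustration only (count_search_results is a hypothetical stand-in for querying a search engine; the real bot’s details on Zephyr differed):

```python
# Illustrative reconstruction of the hit-counting idea, not the original bot.
def count_search_results(query: str) -> int:
    # Hypothetical stand-in: query a search engine and return the reported
    # number of matching web pages for the phrase.
    raise NotImplementedError

def pick_answer(question: str, choices: list[str]) -> str:
    # Score each multiple-choice option by how many pages mention it together
    # with the question text, then return the best-supported option.
    scores = {c: count_search_results(f"{question} {c}") for c in choices}
    return max(scores, key=scores.get)
```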
Watson is machine learning + a big database + a souped-up chatterbot: great technology, not intelligence.
It is of course hotly debated.
I may be mistaken, but I believe the most difficult part of Watson is relating the cryptic questions (or answers) in Jeopardy to something that it can look up in its databases.
The “intelligence” lies there… everything else is technology and fairly simple, in my opinion.
Overall pretty impressive I think.
Did you see the strategy and algorithms used by the program’s designers?
Via http://sciencehouse.wordpress.com/; I haven’t read it yet.
Thanks for addressing the real issues instead of whining about the buzzer like most of us have been. 🙂 (My whining buzzer post is at http://messymatters.com/watson )
But speaking of whining, and apropos of hunch.net, I thought it was a bit lame that Watson knows the historical probabilities of where the Daily Doubles are, even updated based on the state of the board. In other words, it’s doing fancy machine learning on something completely unrelated to question answering. That really calls for a (trivial) change in the game: just use fair randomization for Daily Double placement!
To clarify my “it’s a bit lame that Watson knows the historical probabilities”: I realize that the humans know those probabilities too. My point is that Watson has the edge in that aspect of the game. And I think even ML nerds will have to agree it’s an uninteresting aspect of the game. Anyway, it’s minor compared to the buzzer aspect but it detracts slightly from the coolness of Watson’s victory.
Anyway, the real conclusion here is that in terms of immediate recall of trivia, Watson is somewhere in between normal humans (it would’ve kicked my ass, buzzer or not) and grandmasters. Which is awesome.
In 2003, we wrote a program to play Who Wants to be a Millionaire that sounds not unlike your Zephyr bot and published a paper about it in UAI. I discussed it with some updated perspective here:
http://blog.oddhead.com/2010/03/07/countdown-to-web-sentience/
Watson’s pretty darn cool even though I’m in the know about the assemblage of technologies that make it possible. Just being able to parse the pronoun out of the “answer” in order to formulate a syntactically well-formed “question” is pretty impressive. I’m guessing the Jeopardy-specific setting, while it seems like subtle wordplay, is so stylized that it makes question answering (answer questioning?) easier rather than harder.
It’s pretty clear where Watson’s falling down — multiple inferential steps. Given the uncertainty in each step, piecing bunches of them together is very dicey. At least that’s why I’m guessing it flubbed the Chicago airport question.
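(As a back-of-the-envelope illustration, assuming roughly independent steps: if each inferential step were right 90% of the time, a chain of three would come out right only about 0.9^3 ≈ 73% of the time, and a chain of five only about 59%.)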
I like the focus on quantifying uncertainty. I’ve always felt this was critical when dealing with such a noisy technology in real applications.
I hope your (convincing) analysis of IBM carrying on with the project is right.
I had missed this follow-up news item where a Democratic congressman from NJ beat Watson: http://www.cbsnews.com/8301-503544_162-20037706-503544.html