What does Watson mean?

Watson convincingly beat the best champion Jeopardy! players. The apparent significance of this varies hugely, depending on your background knowledge about the related machine learning, NLP, and search technology. For a random person, this might seem evidence of serious machine intelligence, while for people working on the system itself, it probably seems like a reasonably good assemblage of existing technologies with several twists to make the entire system work.

Above all, I think we should congratulate the people who managed to put together and execute this project—many years of effort by a diverse set of highly skilled people were needed to make this happen. In academia, it’s pretty difficult for one professor to assemble that quantity of talent, and in industry it’s rarely the case that such a capable group has both a worthwhile project and the support needed to pursue something like this for several years before success.

Alina invited me to the Jeopardy watching party at IBM, which was pretty fun, and it gave me a chance to talk to several people, principally Gerry Tesauro (2nd from the right). It’s cool to see people asking for autographs 🙂

I wasn’t surprised to see Watson win. Partly, this is simply because when a big company does a publicity stunt like this, it’s with a pretty solid expectation of victory. Partly, this is because I already knew that computers could answer trivia questions moderately well(*), so the question was just how far this could be improved. Gerry tells me that although Watson’s error rate is still significant, one key element is the ability to estimate with high accuracy when they can answer with high accuracy. Gerry also tells me the Watson papers will be coming out later this summer, with many more details.

What happens next? I don’t expect the project to be shelved like deep blue was, for two reasons. The first is that there is clearly very substantial room for improvement, and the second is that having a natural language question/answering device that can quickly search and respond from large sets of text is obviously valuable. The first means that researchers are interested, and the second that the money to support them can probably be found. The history of textual entailment challenges is another less centralized effort in about the same direction.

In the immediate future (next few years), applications in semi-open domains may become viable, particularly when a question/answer device knows when to answer “I don’t know”. Fully conversational speech recognition working in an open domain should take somewhat longer, because speech recognition software has additional error points, conversational systems aren’t so easy to come by, and in a fully open domain the error rates will be higher. Getting the error rate on questions down to the level that a human with access to the internet has difficulty beating is the tricky challenge which has not yet been addressed. It’s a worthy goal to work towards.

Many people believe in human exceptionalism, so when seeing a computer beat Jeopardy, they are surprised that humans aren’t exceptional there. We should understand that this has happened many times before, with chess and mathematical calculation being two areas where computers now dominate, but which were once thought to be the essence of intelligence by some. Similarly, it is not difficult to imagine automated driving (after all, animals can do it), gross object recognition, etc…

To avert surprise in the future, human exceptionalists should understand what the really hard things for an AI to do are. It’s important to understand that there are various levels of I in AI. A few I think about are:

  1. Animal Intelligence. The ability to understand your place in the world, navigate the world, and accomplish something. Some of these tasks are solved, but many others are not yet. This level implies that routine tasks can be automated. Automated driving, farming, factories, etc…
  2. Turing Test Intelligence. The ability to mimic a typical human well-enough to fool a typical human in open conversation. Watson doesn’t achieve this, but the thrust of the research is in this direction as open domain question answering is probably necessary for this. Nonroutine noncreative tasks might be accomplished by the computer. Think of an automated secretary.
  3. Pandora’s box Intelligence. The ability to efficiently self-program in an open domain so as to continuously improve. At this level human exceptionalism fails, and it is difficult to predict what happens next.

So, serious evidence of (2) or (3) is what I watch for.

(*) About 10 years ago, I had a friend2 on WWTBAM who called the friend for help on a question, who typed the question and multiple choice answers into CMU‘s Zephyr system, where a bot I made queried (question,answer) pairs on Google to discover which had the most web pages. It worked.

ICML & COLT 2010

The papers which interested me most at ICML and COLT 2010 were:

  1. Thomas Walsh, Kaushik Subramanian, Michael Littman and Carlos Diuk Generalizing Apprenticeship Learning across Hypothesis Classes. This paper formalizes and provides algorithms with guarantees for mixed-mode apprenticeship and traditional reinforcement learning algorithms, allowing RL algorithms that perform better than for either setting alone.
  2. István Szita and Csaba Szepesvári Model-based reinforcement learning with nearly tight exploration complexity bounds. This paper and anotherrepresent the frontier of best-known algorithm for Reinforcement Learning in a Markov Decision Process.
  3. James Martens Deep learning via Hessian-free optimization. About a new not-quite-online second order gradient algorithm for learning deep functional structures. Potentially this is very powerful because while people have often talked about end-to-end learning, it has rarely worked in practice.
  4. Chrisoph Sawade, Niels Landwehr, Steffen Bickel. and Tobias Scheffer Active Risk Estimation. When a test set is not known in advance, the model can be used to safely aid test set evaluation using importance weighting techniques. Relative to the paper, placing a lower bound on p(y|x) is probably important in practice.
  5. H. Brendan McMahan and Matthew Streeter Adaptive Bound Optimization for Online Convex Optimization and the almost-same paper John Duchi, Elad Hazan, and Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. These papers provide tractable online algorithms with regret guarantees over a family of metrics rather than just euclidean metrics. They look pretty useful in practice.
  6. Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella, Active Learning on Trees and Graphs Various subsets of these authors have other papers about actively learning graph-obeying functions which in total provide a good basis for understanding what’s possible and how to learn.

The program chairs for ICML did a wide-ranging survey over participants. The results seem to suggest that participants generally agree with the current ICML process. I expect there is some amount of anchoring effect going on where participants have an apparent preference for the known status quo, although it’s difficult to judge the degree of that. Some survey results which aren’t of that sort are:

  1. 7.7% of reviewers say author feedback changed their mind. It would be interesting to know for which fraction of accepted papers reviewers had their mind changed, but that isn’t there.
  2. 85.4% of authors don’t know if the reviewers read their response, believe they read and ignored it, or believe they didn’t read it. Authors clearly don’t feel like they are communicating with reviewers.
  3. 58.6% support growing the conference with the largest fraction suggesting poster-only papers.
  4. Other conferences attended by the ICML community in order are NIPS, ECML/PKDD, AAAI, IJCAI, AIStats, UAI, KDD, ICDM, COLT, SIGIR, ECAI, EMNLP, CoNLL. This is pretty different from the standard colocation list for ICML. Many possibilities are precluded by scheduling, but AAAI, IJCAI, UAI, KDD, COLT, SIGIR are all serious possibilities some of which haven’t been used much in the past.

My experience with Mark‘s new paper discussion site is generally positive—having comments emailed to interested parties really helps the discussion. There are a few comments that authors haven’t responded to, so if you are an author you might want to sign up to receive comments.

In addition, I was the workshop chair for ICML&COLT this year. My overall impression was that things went reasonably well, with the exception of internet connectivity at Dan Panorama which was a minidisaster courtesy of a broken per-machine authentication system. One of the things I’m particularly happy about was the Learning to Rank Challenge workshop. I think it would be great if ICML can continue to attract new challenge workshops in the future. If anyone else has comments about the workshops, I’d love to hear them.

Netflix Challenge 2 Canceled

The second Netflix prize is canceled due to privacy problems. I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release.

I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.

Yahoo! ML events

Yahoo! is sponsoring two machine learning events that might interest people.

  1. The Key Scientific Challenges program (due March 5) for Machine Learning and Statistics offers $5K (plus bonuses) for graduate students working on a core problem of interest to Y! If you are already working on one of these problems, there is no reason not to submit, and if you aren’t you might want to think about it for next year, as I am confident they all press the boundary of the possible in Machine Learning. There are 7 days left.
  2. The Learning to Rank challenge (due May 31) offers an $8K first prize for the best ranking algorithm on a real (and really used) dataset for search ranking, with presentations at an ICML workshop. Unlike the Netflix competition, there are prizes for 2nd, 3rd, and 4th place, perhaps avoiding the heartbreak the ensemble encountered. If you think you know how to rank, you should give it a try, and we might all learn something. There are 3 months left.