We are planning to have a workshop on atomic learning Jan 7 & 8 at TTI-Chicago.
The earlier request for interest is here.
The primary deadline is abstracts due Nov. 20 to jl@tti-c.org.
Machine learning and learning theory research
We are planning to have a workshop on atomic learning Jan 7 & 8 at TTI-Chicago.
The earlier request for interest is here.
The primary deadline is abstracts due Nov. 20 to jl@tti-c.org.
Several people have had difficulty with comments which seem to have an allowed language significantly poorer than posts. The set of allowed html tags has been increased and the markdown filter has been put in place to try to make commenting easier. I’ll put some examples into the comments of this post.
I attended the IBM research 60th anniversary. IBM research is, by any reasonable account, the industrial research lab which has managed to bring the most value to it’s parent company over the long term. This can be seen by simply counting the survivors: IBM research is the only older research lab which has not gone through a period of massive firing. (Note that there are also new research labs.)
Despite this impressive record, IBM research has failed, by far, to achieve it’s potential. Examples which came up in this meeting include:
And remember … IBM research is a stark success story compared to it’s competitors.
Why is there such a pervasive failure to recognize the really big things when they come along in a research lab? This is a huge by capitolization standards: several ideas created in IBM labs have resulted in companies with a stock market valuation greater than IBM. This is also of fundamental concern to researchers everywhere, because that failure is much of the reason why research is chronically underfunded.
A reasonable argument is “it’s much harder to predict the big ones in advance than in hindsight”. This is certainly true, but I don’t believe that is a full explanation. Too many ideas have succeeded because someone else outside of the orignal company recognized the value of the idea and exploited it. There is no fundamental reason why VCs should have an inherent advantage over the company running a research lab at recognizing good ideas within it’s own research lab.
Muthu Muthukrishnan says “it’s the incentives”. In particular, people who invent something within a research lab have little personal incentive in seeing it’s potential realized so they fail to pursue it as vigorously as they might in a startup setting. This is a very reasonable argument: incentives for success at a research lab are typically a low percentage of base salary while startup founders have been known to become billionaires.
A third argument is that researchers at a research lab are likely to find new and better ways of doing things that the company already does. This creates a problem because a large fraction of the company is specifically invested in the current-to-become-old way of doing things. When a debate happens, it is always easy to find the faults and drawbacks of the new method while ignoring the accepted faults of the old (this is common to new directions in research as well).
None of these three reasons are fundamentally unremovable, and it seems plausible that the first industrial research lab to remove these obstacles will reap huge benefits. My current candidates for ‘most likely to remove barriers’ are google and NICTA.
“Search” is the other branch of AI research which has been succesful. Concrete examples include Deep Blue which beat the world chess champion and Chinook the champion checkers program. A set of core search techniques exist including A*, alpha-beta pruning, and others that can be applied to any of many different search problems.
Given this, it may be surprising to learn that there has been relatively little succesful work on combining prediction and search. Given also that humans typically solve search problems using a number of predictive heuristics to narrow in on a solution, we might be surprised again. However, the big successful search-based systems have typically not used “smart” search algorithms. Insteady they have optimized for very fast search. This is not for lack of trying… many people have tried to synthesize search and prediction to various degrees of success. For example, Knightcap achieves good-but-not-stellar chess playing performance, and TD-gammon has achieved near-optimal Backgammon play.
The basic claim of this post is that we will see more and stronger successes of the sort exemplified by Knightcap and TD-gammon and less of those exemplified by the “pure” search techniques. Here are two reasons:
These two points of excellence are precisely what search algorithms typically require for good performance which partially explains why the human-preferred method of search is so different from the computer-preferred method. The advantages of humans have traditionally been:
Putting these observations together, we see that computers are becoming more ’rounded’ in their strengths which suggests highly parallelizable algorithms using large quantities of memory may become more viable solutions for search problems. These qualities of ‘highly parallelizable’ and ‘incorporating long term memory’ characterize many of our prediction algorithms.
In the beginning, on the applied side, learning was simply understood at a ‘type’ level with a relatively sparse set of types such as ‘binary classification’ and ‘multiclass classification’. Evidence of this can be seen in the UCI machine learning repository where most learning problems are ‘projected’ (via throwing away extra information) into binary or multiclass prediction even when the original problem was significantly different.
What we have learned since then is that it is critical to consider:
At the theoretical level, it is difficult to even think about learning problems without considering some natural distribution over the instances we need to predict. At the practical level, there are many cases where people have enjoyed grealy improved success simply by using the natural distribution. This occurs repeatedly in the learning reductions analysis (an exemplar is here), but this core observation is common elsewhere as well.
The basic claim here is that understanding these two concepts is key to mixing search and prediction. Considering the ‘natural distribution’ is important because search systems create their own distribution over instances. This property of ‘self-creating distribution’ implies that it is easy to misdesign a combined learning/search system. For example, it is easy to train the learning algorithm on samples drawn from the wrong process.
The value of searching in some direction vs. another direction is not easily described as a simple classification problem. Consequently, projecting away this information, as might have been done a decade ago, can result in a large performance loss.
The DARPA grandchallenge is a big contest for autonomous robot vehicle driving. It was run once in 2004 for the first time and all teams did badly. This year was notably different with the Stanford and CMU teams succesfully completing the course. A number of details are here and wikipedia has continuing coverage.
A formal winner hasn’t been declared yet although Stanford completed the course quickest.
The Stanford and CMU teams deserve a large round of applause as they have strongly demonstrated the feasibility of autonomous vehicles.
The good news for machine learning is that the Stanford team (at least) is using some machine learning techniques.