Mark Reid did a post on ICML trends that I found interesting.
Idempotent-capable Predictors
One way to distinguish different learning algorithms is by their ability or inability to easily use an input variable as the predicted output. This is desirable for at least two reasons:
- Modularity If we want to build complex learning systems via reuse of a subsystem, it’s important to have compatible I/O.
- “Prior” knowledge Machine learning is often applied in situations where we do have some knowledge of what the right solution is, often in the form of an existing system. In such situations, it’s good to start with a learning algorithm that can be at least as good as any existing system.
When doing classification, most learning algorithms can do this. For example, a decision tree can split on a feature, and then classify. The real differences come up when we attempt regression. Many of the algorithms we know and commonly use are not idempotent predictors.
- Logistic regressors can not be idempotent, because all input features are mapped through a nonlinearity.
- Linear regressors can be idempotent—they just set the weight on one input feature to 1 and other features to 0.
- Regression trees are not idempotent, or (at least) not easily idempotent. In order to predict the same as an input feature, that input feature must be split many times.
- Bayesian approaches may or may not be easily idempotent, depending on the structure of the Bayesian Prior.
It isn’t clear how important the idempotent-capable property is. Successive approximation approaches such as boosting can approximate it in a fairly automatic maner. It may be of substantial importance for large modular systems where efficiency is important.
Watchword: Online Learning
It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of.
- Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired.
- Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithim has regret at most log ( number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts.
- Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory.
- Online Computational Constraint Online learning refers to an algorithmic constraint that the amount of computation per example is constant as the number of examples increases. Again, this doesn’t imply anything in particular about the Information setting in which it is applied.
- Lifelong Learning Online learning refers to learning in a setting where different tasks come at you over time, and you need to rapidly adapt to past mastered tasks.
Interesting Papers at ICML 2007
Here are a few of the papers I enjoyed at ICML.
- Steffen Bickel, Michael Brüeckner, Tobias Scheffer, Discriminative Learning for Differing Training and Test Distributions There is a nice trick in this paper: they predict the probability that an unlabeled sample is in the training set vs. the test set, and then use this prediction to importance weight labeled samples in the training set. This paper uses a specific parametric model, but the approach is easily generalized.
- Steve Hanneke A Bound on the Label Complexity of Agnostic Active Learning This paper bounds the number of labels required by the A2 algorithm for active learning in the agnostic case. Last year we figured out agnostic active learning was possible. This year, it’s quantified. Hopefull soon, it will be practical.
- Sylvian Gelly, David Silver Combining Online and Offline Knowledge in UCT. This paper is about techniques for improving MoGo with various sorts of learning. MoGo has a fair claim at being the world’s best Go algorithm.
There were also a large number of online learning papers this year, especially if you count papers which use online learning techniques for optimization on batch datasets (as I do). This is expected, because larger datasets are becoming more common, and online learning makes more sense the larger the dataset. Many of these papers are of interest if your goal is learning fast while others are about extending online learning into new domains.
(Feel free to add any other papers of interest in the comments.)
Machine Learning Jobs are Growing on Trees
The consensus of several discussions at ICML is that the number of jobs for people knowing machine learning well substantially exceeds supply. This is my experience as well. Demand comes from many places, but I’ve seen particularly strong demand from trading companies and internet startups.
Like all interest bursts, this one will probably pass because of economic recession or other distractions. Nevertheless, the general outlook for machine learning in business seems to be good. Machine learning is all about optimization when there is uncertainty and lots of data. The quantity of data available is growing quickly as computer-run processes and sensors become more common, and the quality of the data is dropping since there is little editorial control in it’s collection. Machine Learning is a difficult subject to master (*), so those who do should remain in demand over the long term.
(*) In fact, it would be reasonable to claim that no one has mastered it—there are just some people who know a bit more than others.