One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share so little overlap that many people can and do make their careers within one without touching, or even necessarily understanding, the others.
There are two fundamental reasons why this is possible.
- For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples.
- There is nothing like a unifying problem defining the field. In many other areas there are unifying problems for which finer distinctions in approaches matter.
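The finite-RAM claim in the first bullet can be made precise with a standard Occam's razor bound (my formulation, not stated in the original). With probability at least $1-\delta$ over $m$ i.i.d. examples, every predictor $h$ in a finite set $H$ satisfies

```latex
\operatorname{err}(h) \;\le\; \widehat{\operatorname{err}}(h) \;+\; \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}
```

where $\operatorname{err}$ is true error and $\widehat{\operatorname{err}}$ is training error. A predictor representable in $b$ bits gives $|H| \le 2^b$, so minimizing training error over such a set does something nontrivial once $m$ comfortably exceeds $b$.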
The implications of this observation agree with an inspection of the field.
- Particular problems are often “solved” by the first technique applied to them. This is particularly exacerbated when some other field first uses machine learning, as people there may not be aware of other approaches. A popular example of this is Naive Bayes for spam. In other fields, the baseline method is a neural network, an SVM, a graphical model, etc…
- The analysis of new learning algorithms is often subject to assumptions designed for the learning algorithm. Examples include large margins for support vector machines, a ‘correct’ Bayesian prior, correct conditional independence assumptions for graphical models, etc… Given such assumptions, it’s unsurprising to learn that the algorithm is the right method, and justifying a new algorithm becomes an exercise in finding a natural-sounding assumption under which the algorithm performs well. This assumption set selection problem is the theoretician’s version of the data set selection problem.
A basic problem is: How do you make progress in a field with this (lack of) structure? And what does progress even mean? Some possibilities are:
- Pick an approach and push it. [Insert your favorite technique] everywhere.
- Find new real problems and apply ML to them. The fact that doing something useful with ML is relatively easy means there is real potential for doing great things this way.
- Find a hard problem and work on it. Although almost anyone can do something nontrivial on most problems, achieving best-possible performance on some problems is not at all easy.
- Make the problem harder. Create algorithms that work fast online, in real time, with very few or no labeled examples, but very many examples, very many features, and very many things to predict.
I am least fond of approach (1), although many people successfully follow it for an entire career. What’s frustrating about approach (1) is that there does not seem to be any single simple philosophy capable of solving all the problems we might recognize as machine learning problems. Consequently, people following approach (1) are at risk of being outpersuaded by someone sometime in the future.
Approach (2) is perhaps the easiest way to accomplish great things, and in some sense much of the field’s advancement comes from new applications.
Approach (3) seems solid, promoting a different kind of progress than approach (2).
Approach (4) seems particularly cool to me at the moment. It is not as specialized as (2) or (3), and many of the constraints seem complementary. For example, large scale learning essentially forces online learning: when the data cannot fit in memory, examples must be processed as a stream.
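As a toy illustration of the online, many-features regime (my sketch, not a method from the post): online logistic regression with the hashing trick processes one example at a time in constant memory, no matter how many distinct raw features appear in the stream. The spam/ham tokens below are hypothetical.

```python
import math

B = 18                      # hash-space size: 2**B weights, fixed memory
w = [0.0] * (2 ** B)        # weight vector
lr = 0.1                    # learning rate

def hash_features(tokens):
    # Map arbitrary string features into a fixed set of weight indices.
    return [hash(t) % (2 ** B) for t in tokens]

def predict(idxs):
    # Logistic prediction from the hashed indices of one example.
    z = sum(w[i] for i in idxs)
    return 1.0 / (1.0 + math.exp(-z))

def learn(tokens, label):
    # One stochastic gradient step on the logistic loss; label is 0 or 1.
    idxs = hash_features(tokens)
    g = predict(idxs) - label       # gradient of the loss w.r.t. z
    for i in idxs:
        w[i] -= lr * g

# A stream of (features, label) pairs; each example is touched once,
# in order, so the whole stream never needs to be held in memory.
stream = [(["free", "money", "now"], 1),
          (["meeting", "tomorrow"], 0)] * 50
for tokens, label in stream:
    learn(tokens, label)

print(predict(hash_features(["free", "money"])))   # should be > 0.5
print(predict(hash_features(["meeting"])))         # should be < 0.5
```

The hashing trick is what makes "very many features" compatible with "fast, in real time": memory is bounded by the hash space rather than by the vocabulary, at the cost of occasional collisions.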