How should we, as researchers in machine learning, organize ourselves?
The most immediate measurable objective of computer science research is publishing a paper. The most difficult aspect of publishing a paper is getting reviewers to accept and recommend it for publication. The simplest mechanism for doing this is to show theoretical progress on some standard, well-known, easily understood problem.
In doing this, we often fall into a local minimum of the research process. The basic problem in machine learning is that it is very unclear whether the mathematical model is the right one for the (or some) real problem. A good mathematical model in machine learning should have one fundamental trait: it should aid the design of effective learning algorithms. To date, our ability to solve interesting learning problems (speech recognition, machine translation, object recognition, etc.) remains limited (although improving), so the “rightness” of our models is in doubt.
If our mathematical models are bad, the simple mechanism of research above cannot yield the end goal. (This should be agreed on even by people who disagree about what the end goal of machine learning is!) Instead, research which proposes and investigates new mathematical models for machine learning might yield the end goal. Doing this is hard.
- Coming up with a new mathematical model is just plain not easy. Some sources of inspiration include:
- Watching carefully: what happens successfully in practice can often be abstracted into a mathematical model.
- Swapping fields: In other fields (for example, crypto), other methods of analysis have been developed. Sometimes these methods can be transferred.
- Model repair: Existing mathematical models often have easily comprehensible failure modes. By thinking about how to avoid such failure modes, we can sometimes produce a new mathematical model.
- Speaking about a new model is hard. The difficulty starts with you, in explaining it: often, when trying to converge on a new model, we think of it in terms of its difference from an older model, leading to a tangled explanation. The difficulty continues with other people (in particular, reviewers) reading it. For a reviewer with limited time, it is very tempting to assume that any particular paper is operating in some familiar model and to dismiss it when it does not fit. The best approach here seems to be super-explicitness: you can’t be too blunt about saying “this isn’t the model you are thinking about”.
- Succeeding with new models is also hard. When people don’t have a frame of reference for understanding the new model, they are unlikely to follow up on it, as is necessary for success in academia.
The good news here is that a successful new model can be a big win. I wish it were an easier win: the barriers to success are formidably high, and it seems we should do everything possible to lower them for the sake of improving research.
What I am wondering, however, is whether the new models of the past decade or two have really helped solve any new problems. The real advances in the field arose with the availability of new types of data (text, web, large databases, networks), and with increased computational power. While the actual work in modelling does make progress towards better organization and understanding, it’s really our ability to convert the world into data that makes most of the difference.
Insightful post.
To paraphrase somewhat, I’d say that research may be stuck in a local minimum because the conceptual building blocks (algorithms, ideas, assumptions, etc.) that are used and selected, but never questioned, are wrong (this assumes that there is a “truth”). If true, it means that you can assemble them in any configuration you want and you will still forever stay in the poor suburbs of the solution space.
For instance, I think that the widespread, even unconscious scheme of designing FEATURES and running SUPERVISED learning on them is not the right one. To me the idea of feature design, whether in computer vision or NLP, is fundamentally wrong. I, however, have a hard time trying to show that in papers because, yes, when you start from scratch, without features, you won’t immediately get results that are as “good” as those from the beaten path. It is indeed easier to say: this paper presents a new kernel and extends the work of blahblah. When you try to do things really differently, people have the feeling that you are going backwards whereas, actually, you are just going somewhere else.
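As a concrete illustration of the “design FEATURES and run SUPERVISED learning on them” scheme described above, here is a minimal sketch; the choice of scikit-learn, the bag-of-words featurization, and the toy sentiment data are assumptions made purely for illustration, not anything taken from the discussion.

```python
# Minimal sketch (illustrative assumptions only) of the common
# "hand-design features, then run supervised learning" pattern.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical sentiment dataset.
texts = ["good movie", "bad movie", "great film", "terrible film"]
labels = [1, 0, 1, 0]

# Step 1: a human-chosen featurization (bag-of-words counts) fixes the
# representation before any learning happens.
# Step 2: a standard supervised learner is trained on top of those features.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Everything the learner can express is constrained by the designed features.
print(model.predict(["good film"]))
```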
A reasonable analogy, perhaps, is compiler optimizations for speeding up programs. The primary mechanism for speeding up the execution of programs has been faster processors [akin to more access to data], but compiler optimization has made a steady difference on top of the baseline set by faster processors [akin to how better models of learning might aid us in designing better algorithms].
An interesting question is whether, at some point in the future, better models will be necessary for real progress. I believe the answer is “yes”.
That’s an excellent analogy!
I also agree about the necessity of better models. But what is the criterion? Predictive accuracy hasn’t been yielding breakthroughs recently. Universality is almost what started machine learning (in contrast to custom task-specific models), as is data compression. Perhaps it’s time to optimize some other criterion. Embeddability is my favorite alternative – an idea arising from the observation that if machine learning were really easy to include in a piece of software, people would make use of it much more often. In terms of your analogy, this would be like focusing on programming languages rather than on compiler optimizations.
Hello! I’d like to ask your permission to translate some of your articles and publish them on my homepage.
Best wishes!
Orestes
That sounds fine, as long as you refer to hunch.net as a source.