At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?”
The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied.
In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.
At the active learning tutorial, I stated a set of somewhat more precise research questions that I don’t yet have answers to, and which I believe are worth answering. Here is a bit of an expansion on those questions for those interested.
- Is active learning possible in a fully adversarial setting? By fully adversarial, I mean a setting where an adversary controls all of the algorithm’s observations. Some work by Claudio and Nicolo has moved in this direction, but there is not yet a solid answer.
- Is there an efficient and effective reduction of active learning to supervised learning? The bootstrap IWAL approach is efficient but not effective in some situations where other approaches can succeed. The algorithm here is a reduction to a special kind of supervised learning where you can specify both examples and constraints. For many supervised learning algorithms, adding constraints seems problematic.
- Can active learning succeed with alternate labeling oracles? The ones I see people trying to use in practice often differ because they can provide answers of varying specificity and cost, or because some oracles are good for some questions, but not good for others.
- At this point, there have been several successful applications of active learning, but that’s not the same thing as succeeding with more robust algorithms. Can we succeed empirically with more robust algorithms? And is the empirical cost of additional robustness worth the empirical peace-of-mind that your learning algorithm won’t go astray where other more aggressive approaches may do so?
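
For readers who have not seen importance weighting, which the IWAL family mentioned in the reduction question builds on, here is a minimal sketch of the general idea: each unlabeled example is queried with some probability p, and a queried label is kept with weight 1/p so that weighted error estimates stay unbiased. This is not the bootstrap IWAL algorithm itself. The names `iwal_stream`, `weighted_error`, `query_prob`, and `label_oracle` are hypothetical, and how to choose the query probability (the hard part) is left as a placeholder.

```python
import random


def iwal_stream(examples, query_prob, label_oracle, rng):
    """Collect importance-weighted labels from a stream of unlabeled examples.

    query_prob(x) is a hypothetical placeholder returning a probability in (0, 1];
    label_oracle(x) is a hypothetical placeholder for the (costly) labeler.
    """
    weighted = []
    for x in examples:
        p = query_prob(x)
        if p <= 0.0:
            # A zero query probability would break unbiasedness.
            raise ValueError("query probability must be positive")
        if rng.random() < p:
            y = label_oracle(x)              # pay for a label only when queried
            weighted.append((x, y, 1.0 / p)) # importance weight is 1/p
    return weighted


def weighted_error(h, weighted, n):
    """Importance-weighted error estimate of classifier h over n streamed examples."""
    return sum(w * (h(x) != y) for x, y, w in weighted) / n
```

Any supervised learner that accepts per-example weights can then be trained on the `(x, y, weight)` triples, which is what makes this style of approach a reduction to supervised learning.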