In Active Learning, the question changes – Machine Learning (Theory)

A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”.

At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?”

The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied.

In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

At the active learning tutorial, I stated a set of somewhat more precise research questions that I don’t yet have answers to, and which I believe are worth answering. Here is a bit of an expansion on those questions for those interested.

1. Is active learning possible in a fully adversarial setting? By fully adversarial, I mean when an adversary controls all of the algorithm’s observations. Some work by Claudio and Nicolo has moved in this direction, but there is not yet a solid answer.
2. Is there an efficient and effective reduction of active learning to supervised learning? The bootstrap IWAL approach is efficient but not effective in some situations where other approaches can succeed. The algorithm here is a reduction to a special kind of supervised learning where you can specify both examples and constraints. For many supervised learning algorithms, adding constraints seems problematic.
3. Can active learning succeed with alternate labeling oracles? The ones I see people trying to use in practice often differ because they can provide answers of varying specificity and cost, or because some oracles are good for some questions, but not good for others.
4. At this point, there have been several successful applications of active learning, but that’s not the same thing as succeeding with more robust algorithms. Can we succeed empirically with more robust algorithms? And is the empirical cost of additional robustness worth the empirical peace-of-mind that your learning algorithm won’t go astray where other more aggressive approaches may do so?
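
For readers who have not seen it, the question above about reducing active learning to supervised learning can be made concrete with a small sketch in the spirit of bootstrap importance-weighted active learning. This is not the published algorithm or anyone's reference implementation. It assumes 1-D threshold classifiers, a noise-free labeling oracle, and 0-1 loss, and every name in it (`fit_threshold`, `query_probability`, `iwal_sketch`, `p_min`) is made up for illustration. The idea it shows: train a small committee on bootstrap resamples of a few initial labels, query a new point with probability 1 when the committee disagrees on it and with probability `p_min` otherwise, and weight each queried label by the inverse of its query probability so that an ordinary importance-weighted supervised learner can consume the result.

```python
import random


def fit_threshold(samples):
    """Importance-weighted ERM over classifiers h_t(x) = 1 if x >= t else 0.

    samples is a list of (x, y, weight) triples with y in {0, 1}.
    """
    candidates = [0.0] + sorted(x for x, _, _ in samples) + [1.0]
    best_t, best_err = 0.0, float("inf")
    for t in candidates:
        # Sum the weights of the examples that h_t misclassifies.
        err = sum(w for x, y, w in samples if (x >= t) != (y == 1))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def query_probability(x, committee, p_min):
    """p_min plus (1 - p_min) times a 0/1 committee-disagreement indicator."""
    predictions = {x >= t for t in committee}
    disagree = 1.0 if len(predictions) > 1 else 0.0
    return p_min + (1.0 - p_min) * disagree


def iwal_sketch(stream, oracle, n_init=10, n_committee=5, p_min=0.1, seed=0):
    rng = random.Random(seed)
    # Label the first n_init points unconditionally (weight 1).
    labeled = [(x, oracle(x), 1.0) for x in stream[:n_init]]
    # Committee: one threshold per bootstrap resample of the initial labels.
    committee = [
        fit_threshold(rng.choices(labeled, k=len(labeled)))
        for _ in range(n_committee)
    ]
    n_queries = n_init
    for x in stream[n_init:]:
        p = query_probability(x, committee, p_min)
        if rng.random() < p:
            # Queried labels carry weight 1/p to keep loss estimates unbiased.
            labeled.append((x, oracle(x), 1.0 / p))
            n_queries += 1
    return fit_threshold(labeled), n_queries, labeled


# Hypothetical usage: uniform points on [0, 1], true threshold at 0.5.
data_rng = random.Random(1)
stream = [data_rng.random() for _ in range(500)]
oracle = lambda x: 1 if x >= 0.5 else 0
threshold, n_queries, labeled = iwal_sketch(stream, oracle)
```

The final call to `fit_threshold` is plain importance-weighted supervised learning, which is what makes this kind of scheme a reduction. A committee that happens to agree in the wrong place only slows learning down, because `p_min` keeps every point queryable; that is the efficiency-versus-effectiveness tension the question refers to.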

5 Replies to “In Active Learning, the question changes”

It was fun to watch the recent conquest of the agnostic setting in terms of the magnitude of sample complexity improvements and the general conditions in which they were realized. We still lack understanding of the precise conditions amenable to active learning. The (generalized) disagreement coefficient and splitting index are great, important contributions, but work in this area is just beginning.

The existing cost model – uniform, stationary, coming out of a fixed budget – was OK when active learning was being scrutinized relative to passive learning, but now requires more sophistication.

Practically, there is an awful lot of inertia behind the manipulation of fixed data sets and the assumption of iid data. The transition requires both theoretical and engineering preparation.

I hope to see some more cross-pollination with related fields as well. I think some of the most recent improvements could be folded back into experiment design and change-point estimation, and perhaps there could be a nice “passive vs. active” interchange with the compressed sensing folks.

Pingback: Dasgupta and Hsu (2008) Hierarchical Sampling for Active Learning « LingPipe Blog

Hi John,

With respect to point 3, I just wanted to point to some recent papers that attempt to elicit alternative forms of supervision/model constraints using active learning schemes:

Learning from Measurements in Exponential Families
Percy Liang, Michael I. Jordan, and Dan Klein, ICML 2009
http://www.cs.mcgill.ca/~icml2009/papers/393.pdf

Active Learning by Labeling Features
Gregory Druck, Burr Settles, Andrew McCallum.
To appear in Proceedings of EMNLP.
http://www.cs.umass.edu/~gdruck/pubs/druck09active.pdf

Uncertainty Sampling and Transductive Experimental Design for Active Dual Supervision
V. Sindhwani, P. Melville, R. Lawrence, ICML 2009
http://people.cs.uchicago.edu/~vikass/GRADS.pdf

Vikas

Pingback: Eleksius is an active learning problem « Memming

Pingback: The End of the Beginning of Active Learning | ??
