Active Learning Tutorial, ICML 2009
Active learning is defined by contrast to the passive model of
supervised learning, in which all labels are obtained
without reference to the learning algorithm; in active learning,
the learner interactively chooses which data points to label. The
hope of active learning is that interaction can substantially reduce
the number of labels required, making it more practical to solve
problems via machine learning. This hope is known to be valid in certain
special cases, both empirically and theoretically.
Variants of active learning have been investigated for several decades
across several fields. The focus of this tutorial is on general techniques which
are applicable to many problems. At a mathematical level, this
corresponds to approaches with provable guarantees under the
weakest possible assumptions, since real problems are more likely to
fit algorithms which work under weak assumptions.
We believe this tutorial should be of broad interest. People working
on or using supervised learning are often confronted with the need for
more labels, a situation where active learning can help. Similarly, in
reinforcement learning, generalizing while interacting in more complex
ways is an active research topic. Please join us.
Outline
- We'll start with some nice example scenarios where active
learning is sorely needed (e.g., label = biological experiment).
- We'll discuss the most common heuristics and modes of reasoning
("most informative query"), the problem they all face: lack of
consistency, and how this lack of consistency manifests in
practice. (A sketch of the uncertainty sampling heuristic appears after this outline.)
- Where does active learning NOT help? We'll discuss toy examples
that can be rigorously analyzed, and practical examples on which
various heuristics can be tried.
- Now let's try to look at positive scenarios more rigorously. Two
different takes on why active learning is helpful:
- Active learning allows efficient search through the hypothesis space.
- Using unsupervised techniques, we can first cluster the data, then ask for just one
label per cluster.
Both are gross idealizations, and the goal is to
develop good algorithms that help exploit these effects in practical
scenarios. (Toy sketches of both ideas appear after this outline.)
- Model (i): Binary search through hypothesis space: QBC, A2, DHM, Disagreement Coefficient.
- The algorithms and theory
- What happens in practice. For QBC, quite a lot of experimentation has been done.
- Model (ii): Clustering model: DH.
- What are the biggest open problems?
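Below is a minimal sketch of the "most informative query" heuristic mentioned above, in its most common form (uncertainty sampling): repeatedly fit a model on the labels gathered so far and query the unlabeled point the model is least certain about. The synthetic data, the scikit-learn logistic regression learner, and the label budget are illustrative assumptions, not part of the tutorial material.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))                # unlabeled pool of feature vectors
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hidden labels, revealed only on query

    labeled = [int(np.argmin(y)), int(np.argmax(y))]           # seed set: one example per class
    unlabeled = [i for i in range(len(X)) if i not in labeled]

    for _ in range(20):                          # label budget of 20 queries
        model = LogisticRegression().fit(X[labeled], y[labeled])
        probs = model.predict_proba(X[unlabeled])[:, 1]
        # query the point whose predicted probability is closest to 1/2
        query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
        labeled.append(query)
        unlabeled.remove(query)

    model = LogisticRegression().fit(X[labeled], y[labeled])
    print("accuracy on the full pool:", model.score(X, y))

As the tutorial discusses, this kind of heuristic has no consistency guarantee: the queried points are not a representative sample, so the learner can lock onto a wrong hypothesis and never query the points that would reveal the mistake.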
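For model (i), the classic toy instance is learning a threshold classifier on the line: a consistent threshold can be located by binary search with about log2(n) label queries, where passive learning would need to label essentially all n points. The sketch below covers only this noise-free toy case; it is not the QBC, A2, or DHM algorithm, which handle general hypothesis classes (and, for A2 and DHM, label noise).

    import numpy as np

    rng = np.random.default_rng(1)
    xs = np.sort(rng.uniform(0.0, 1.0, size=1000))      # unlabeled pool, sorted once
    true_threshold = 0.37                               # unknown to the learner

    def oracle(x):                                      # noise-free label oracle
        return int(x >= true_threshold)

    queries = 2
    assert oracle(xs[0]) == 0 and oracle(xs[-1]) == 1   # endpoints straddle the threshold
    lo, hi = 0, len(xs) - 1                             # invariant: label(lo) = 0, label(hi) = 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid                                    # threshold is at or below xs[mid]
        else:
            lo = mid                                    # threshold is above xs[mid]

    print(f"threshold lies in ({xs[lo]:.3f}, {xs[hi]:.3f}] after {queries} label queries")

About a dozen labels suffice here, versus roughly a thousand for passive learning to pin the threshold down to the same resolution.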
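For model (ii), here is a crude flattening of the cluster-then-query idea: cluster the unlabeled data, query a few labels per cluster, and propagate the majority label to the whole cluster. The actual DH (Dasgupta-Hsu) algorithm works over a cluster hierarchy and adaptively refines clusters that turn out to be impure; the Gaussian-blob data and k-means clustering below are assumptions made purely for illustration.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (300, 2)),          # two well-separated Gaussian blobs
                   rng.normal(5, 1, (300, 2))])
    y = np.array([0] * 300 + [1] * 300)                 # hidden labels, revealed only on query

    k = 10
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

    pred = np.empty_like(y)
    labels_used = 0
    for c in range(k):
        members = np.where(clusters == c)[0]
        if len(members) == 0:
            continue
        sample = rng.choice(members, size=min(3, len(members)), replace=False)
        labels_used += len(sample)
        majority = int(y[sample].mean() >= 0.5)         # majority vote over the queried labels
        pred[members] = majority                        # propagate it to the whole cluster

    print(f"labels queried: {labels_used}, accuracy: {(pred == y).mean():.3f}")

When the cluster structure lines up with the labels, a handful of queries labels the entire data set; when it doesn't, the clusters must be refined, which is what the hierarchical DH algorithm manages.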
Bibliography
- Yotam Abramson and Yoav Freund, Active Learning for Visual Object Recognition, UCSD tech report.
- Nina Balcan, Alina Beygelzimer, and John Langford, Agnostic Active Learning, ICML 2006.
- Alina Beygelzimer, Sanjoy Dasgupta, and John Langford, Importance Weighted Active Learning, ICML 2009.
- David Cohn, Les Atlas and Richard Ladner. Improving generalization with active learning, Machine Learning 15(2):201-221, 1994.
- Sanjoy Dasgupta and Daniel Hsu. Hierarchical sampling for active learning. ICML 2008.
- Sanjoy Dasgupta, Coarse sample complexity bounds for active learning. NIPS 2005.
- Sanjoy Dasgupta, Daniel J. Hsu, and Claire Monteleoni. A general agnostic active learning algorithm. NIPS 2007.
- Yoav Freund, H. Sebastian Seung, Eli Shamir, and Naftali Tishby, Selective Sampling Using the Query by Committee Algorithm, Machine Learning, 28, 133-168, 1997.
- Steve Hanneke, A Bound on the Label Complexity of Agnostic Active Learning, ICML 2007.
- Matti Kääriäinen, Active Learning in the Non-realizable Case, ALT 2006.
- Manfred K. Warmuth, Gunnar Rätsch, Michael Mathieson, Jun Liao, and Christian Lemmon, Active Learning in the Drug Discovery Process, NIPS 2001.
- Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani, Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003 workshop.
Some Surveys
These are a couple of surveys on the general topic
of applied active learning. They don't cover the theory, which is the
primary topic of this tutorial, but they do provide a very comprehensive
listing of active learning applications: Burr Settles' Active Learning Survey and Fredrik Olsson's Active Learning for NLP.
Information on the Presenters
Sanjoy Dasgupta has several
papers in the field, including the first to show exponential
label complexity savings for linear predictors, one which characterizes
active learning for realizable-case classification, and others which make
general active learning algorithms more tractable. He is also a
professor at UCSD, where he has spent several years developing and
teaching material, including a book on algorithms.
John Langford coauthored the paper introducing the idea of
active learning in an agnostic setting, and has worked on several
learning settings related to active learning, such as reinforcement
learning and bandit analysis. He is a Research Scientist at Yahoo!
Research, where there are many applications of active learning.