Active Learning Tutorial, ICML 2009
Active learning is defined by contrast to the passive model of
supervised learning, in which all labels are obtained
without reference to the learning algorithm; in active learning,
the learner interactively chooses which data points to label. The
hope of active learning is that interaction can substantially reduce
the number of labels required, making it more practical to solve
problems via machine learning. This hope is known to be valid in certain
special cases, both empirically and theoretically.
Variants of active learning have been investigated for several decades
across several fields. The focus of this tutorial is on general techniques which
are applicable to many problems. At a mathematical level, this
corresponds to approaches with provable guarantees under the
weakest possible assumptions, since real problems are more likely to
fit algorithms which work under weak assumptions.
We believe this tutorial should be of broad interest. People working
on or using supervised learning are often confronted with the need for
more labels, a situation where active learning can help. Similarly, in
reinforcement learning, generalizing while interacting in more complex
ways is an active research topic. Please join us.
Outline
- We'll start with some nice example scenarios where active
learning is sorely needed (e.g., label = biological experiment).
- We'll discuss the most common heuristics and modes of reasoning
("most informative query"), the problem they all face: lack of
consistency, and how this lack of consistency manifests in
practice. (A sketch of the uncertainty sampling heuristic appears after this outline.)
- Where does active learning NOT help? We'll discuss toy examples
that can be rigorously analyzed, and practical examples on which
various heuristics can be tried.
- Now let's try to look at positive scenarios more rigorously. Two
different takes on why active learning is helpful:
- Active learning allows efficient search through the hypothesis space.
- Using unsupervised techniques, we can first cluster the data, then ask for just one
label per cluster.
Both are gross idealizations, and the goal is to
develop good algorithms that help exploit these effects in practical
scenarios. (Toy sketches of both ideas appear after this outline.)
- Model (i): Binary search through hypothesis space: QBC, A2, DHM, Disagreement Coefficient.
- The algorithms and theory
- What happens in practice. For QBC, quite a lot of experimentation has been done.
- Model (ii): Clustering model: DH.
- What are the biggest open problems?
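Below is a minimal sketch of the "most informative query" heuristic mentioned above, in its most common form (uncertainty sampling): repeatedly fit a model on the labels gathered so far and query the unlabeled point the model is least certain about. The synthetic data, the scikit-learn logistic regression learner, and the label budget are illustrative assumptions, not part of the tutorial material.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))                # unlabeled pool of feature vectors
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hidden labels, revealed only on query

    labeled = [int(np.argmin(y)), int(np.argmax(y))]           # seed set: one example per class
    unlabeled = [i for i in range(len(X)) if i not in labeled]

    for _ in range(20):                          # label budget of 20 queries
        model = LogisticRegression().fit(X[labeled], y[labeled])
        probs = model.predict_proba(X[unlabeled])[:, 1]
        # query the point whose predicted probability is closest to 1/2
        query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
        labeled.append(query)
        unlabeled.remove(query)

    model = LogisticRegression().fit(X[labeled], y[labeled])
    print("accuracy on the full pool:", model.score(X, y))

As the tutorial discusses, this kind of heuristic has no consistency guarantee: the queried points are not a representative sample, so the learner can lock onto a wrong hypothesis and never query the points that would reveal the mistake.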
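For model (i), the classic toy instance is learning a threshold classifier on the line: a consistent threshold can be located by binary search with about log2(n) label queries, where passive learning would need to label essentially all n points. The sketch below covers only this noise-free toy case; it is not the QBC, A2, or DHM algorithm, which handle general hypothesis classes (and, for A2 and DHM, label noise).

    import numpy as np

    rng = np.random.default_rng(1)
    xs = np.sort(rng.uniform(0.0, 1.0, size=1000))      # unlabeled pool, sorted once
    true_threshold = 0.37                               # unknown to the learner

    def oracle(x):                                      # noise-free label oracle
        return int(x >= true_threshold)

    queries = 2
    assert oracle(xs[0]) == 0 and oracle(xs[-1]) == 1   # endpoints straddle the threshold
    lo, hi = 0, len(xs) - 1                             # invariant: label(lo) = 0, label(hi) = 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid                                    # threshold is at or below xs[mid]
        else:
            lo = mid                                    # threshold is above xs[mid]

    print(f"threshold lies in ({xs[lo]:.3f}, {xs[hi]:.3f}] after {queries} label queries")

About a dozen labels suffice here, versus roughly a thousand for passive learning to pin the threshold down to the same resolution.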
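For model (ii), here is a crude flattening of the cluster-then-query idea: cluster the unlabeled data, query a few labels per cluster, and propagate the majority label to the whole cluster. The actual DH (Dasgupta-Hsu) algorithm works over a cluster hierarchy and adaptively refines clusters that turn out to be impure; the Gaussian-blob data and k-means clustering below are assumptions made purely for illustration.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (300, 2)),          # two well-separated Gaussian blobs
                   rng.normal(5, 1, (300, 2))])
    y = np.array([0] * 300 + [1] * 300)                 # hidden labels, revealed only on query

    k = 10
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

    pred = np.empty_like(y)
    labels_used = 0
    for c in range(k):
        members = np.where(clusters == c)[0]
        if len(members) == 0:
            continue
        sample = rng.choice(members, size=min(3, len(members)), replace=False)
        labels_used += len(sample)
        majority = int(y[sample].mean() >= 0.5)         # majority vote over the queried labels
        pred[members] = majority                        # propagate it to the whole cluster

    print(f"labels queried: {labels_used}, accuracy: {(pred == y).mean():.3f}")

When the cluster structure lines up with the labels, a handful of queries labels the entire data set; when it doesn't, the clusters must be refined, which is what the hierarchical DH algorithm manages.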
Bibliography
- Yotam Abramson and Yoav Freund, Active Learning for Visual Object Recognition, UCSD tech report.
- Nina Balcan, Alina Beygelzimer, and John Langford, Agnostic Active Learning, ICML 2006.
- Alina Beygelzimer, Sanjoy Dasgupta, and John Langford, Importance Weighted Active Learning, ICML 2009.
- David Cohn, Les Atlas and Richard Ladner. Improving generalization with active learning, Machine Learning 15(2):201-221, 1994.
- Sanjoy Dasgupta and Daniel Hsu. Hierarchical sampling for active learning. ICML 2008.
- Sanjoy Dasgupta, Coarse sample complexity bounds for active learning. NIPS 2005.
- Sanjoy Dasgupta, Daniel J. Hsu, and Claire Monteleoni. A general agnostic active learning algorithm. NIPS 2007.
- Yoav Freund, H. Sebastian Seung, Eli Shamir, and Naftali Tishby, Selective Sampling Using the Query by Committee Algorithm, Machine Learning, 28, 133-168, 1997.
- Steve Hanneke, A Bound on the Label Complexity of Agnostic Active Learning, ICML 2007.
- Matti Kääriäinen, Active Learning in the Non-realizable Case, ALT 2006.
- Manfred K. Warmuth, Gunnar Rätsch, Michael Mathieson, Jun Liao, and Christian Lemmon, Active Learning in the Drug Discovery Process, NIPS 2001.
- Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani, Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003 workshop.
Some Surveys
These are a couple of surveys on the general topic
of applied active learning. They don't cover the theory, which is the
primary topic of this tutorial, but they do provide a very comprehensive
listing of active learning applications: Burr Settles' Active Learning Survey and Fredrik Olsson's Active Learning for NLP.
Information on the Presenters
Sanjoy Dasgupta has several
papers in the field, including the first to show exponential
label complexity savings for linear predictors, one which characterizes
active learning for realizable-case classification, and others which make
general active learning algorithms more tractable. He is also a
professor at UCSD, where he has spent several years developing and
teaching material, including a book on algorithms.
John Langford coauthored the paper introducing the idea of
active learning in an agnostic setting, and has worked on several
learning settings related to active learning, such as reinforcement
learning and bandit analysis. He is a Research Scientist at Yahoo!
Research, where there are many applications of active learning.