Progress in Active Learning – Machine Learning (Theory)

Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is:

There is a source of unlabeled data.
There is an oracle from which labels can be requested for unlabeled data produced by the source.
The goal is to perform well with minimal use of the oracle.
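
The three ingredients above can be made concrete with a toy sketch. It assumes a setting the post does not specify: points on a line whose labels follow a single threshold (a realizable case). The names `LabelOracle` and `actively_label` are hypothetical and chosen only for illustration. Under these assumptions, binary search over the sorted pool labels every point while requesting only logarithmically many labels.

```python
import random


class LabelOracle:
    """Hypothetical oracle: answers label requests and counts how many were made."""

    def __init__(self, threshold):
        self._threshold = threshold  # hidden from the learner
        self.queries = 0

    def label(self, x):
        self.queries += 1
        return 1 if x >= self._threshold else 0


def actively_label(pool, oracle):
    """Label every point in `pool` by binary search for the decision boundary.

    Assumes realizability: labels are 0 below some threshold and 1 at or above it.
    """
    points = sorted(pool)
    lo, hi = 0, len(points)  # invariant: points[:lo] are 0, points[hi:] are 1
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle.label(points[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return {x: (1 if i >= lo else 0) for i, x in enumerate(points)}


# Source of unlabeled data: a pool drawn uniformly from [0, 1).
rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
oracle = LabelOracle(threshold=0.37)
labels = actively_label(pool, oracle)
print(oracle.queries, "label requests for", len(pool), "points")
```

A passive learner would pay for all 1000 labels to get the same labeling. The savings here depend entirely on the threshold assumption, which is the kind of assumption the results below try to weaken.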

Here is what I’ve learned:

Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier).
Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here.
Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I recently learned about it.

All of these results are relatively ‘rough’: they don’t necessarily make good algorithms as stated (although the last one has a few experiments). None of these results are directly comparable because the assumptions vary. Comparing the assumptions and the results leads to a number of remaining questions:

Do the sufficient and semi-necessary conditions apply to the IID-only case? The adversarial case?
Is there a generic algorithm for any hypothesis space that works in the fully adversarial setting?
What are special cases of these algorithms which are computationally tractable and useful?

The Foundations of Active Learning workshop at NIPS should be a good place to discuss these questions.

3 Replies to “Progress in Active Learning”

Is the following result interesting? Assume your samples come from R^d, and you are allowed to generate new synthetic samples and ask the oracle for their labels. Then any set of n [unlabeled] points in R^d can be labeled using O(log n) oracle queries. The constant depends badly on d, however…

I don’t believe this holds, as written. I suspect there is some constraint such as “the labels obey a perceptron”.

Given some constraint of this form, the result sounds cute, but maybe not useful. Can you think of a real-world scenario where you would use it?

Sure. Assume a linear classifier.

And no, I don't see any application, but heck, even in R^4, you might need to query all the points' labels to get a fully correct labeling, so really, you cannot hope for much better without further assumptions or allowing synthetic examples.
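
The "query every label" point can be illustrated with a small sketch. Note the assumptions: the comment mentions R^4, while the code below uses a simpler planar construction (points evenly spaced on the unit circle), which is not necessarily the one the commenter had in mind. The helper names are hypothetical. For each point, the code builds a linear classifier that is positive on that point alone, so a learner restricted to querying pool points that has seen only negative labels cannot rule out a positive at any point it has not queried.

```python
import math


def circle_points(n):
    """n points evenly spaced on the unit circle in R^2."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def isolating_classifier(points, i):
    """Linear classifier x -> [w.x > b] that is positive only on points[i]."""
    n = len(points)
    w = points[i]
    # The nearest neighbours of points[i] have inner product cos(2*pi/n) with it,
    # while points[i] itself has inner product 1. Put the offset halfway between.
    b = (1 + math.cos(2 * math.pi / n)) / 2
    return lambda x: 1 if w[0] * x[0] + w[1] * x[1] > b else 0


n = 12
pts = circle_points(n)
# One labeling per point; labeling i marks only point i as positive.
labelings = [tuple(isolating_classifier(pts, i)(p) for p in pts) for i in range(n)]
for row in labelings:
    print(row)
```

Together with the all-negative labeling, this gives n + 1 linearly realizable labelings that agree wherever the single positive has not yet been queried, which is why some further assumption or synthetic queries are needed to do better.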
