Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is:
- There is a source of unlabeled data.
- There is an oracle from which labels can be requested for unlabeled data produced by the source.
- The goal is to perform well with minimal use of the oracle.
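To make the setting concrete, here is a minimal sketch of a generic pool-based active learning loop. The `fit` and `query_score` hooks are placeholders of my own, not any of the algorithms discussed below:

```python
import random

def active_learn(pool, oracle, fit, query_score, budget):
    """Generic pool-based active learning loop.

    pool:        list of unlabeled examples from the source
    oracle:      oracle(x) -> label; the expensive call we ration
    fit:         fit(labeled_pairs) -> model; any supervised learner
    query_score: query_score(model, x) -> how informative x looks
    budget:      maximum number of oracle calls
    """
    labeled = []
    # Seed with one random query so a first model can be fit.
    x0 = pool.pop(random.randrange(len(pool)))
    labeled.append((x0, oracle(x0)))
    model = fit(labeled)
    for _ in range(budget - 1):
        if not pool:
            break
        # Spend each remaining oracle call on the highest-scoring point.
        i = max(range(len(pool)), key=lambda j: query_score(model, pool[j]))
        x = pool.pop(i)
        labeled.append((x, oracle(x)))
        model = fit(labeled)
    return model, labeled
```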
Here is what I’ve learned:
- Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier).
- Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here.
- Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I recently learned about it.
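For a feel of how label-sparing can work in the adversarial online setting, here is a loose sketch in the spirit of margin-based selective sampling. The b/(b + |margin|) query rule and the perceptron update are my paraphrase of the flavor of that line of work, not a faithful reimplementation of the paper's algorithm or its guarantees:

```python
import random

def selective_sampling_perceptron(stream, oracle, b=1.0):
    """Online selective sampling sketch: predict with a linear
    threshold, and query the oracle with probability b / (b + |margin|),
    so labels are mostly spent where the current hypothesis is unsure.

    stream: iterable of feature vectors (lists of floats)
    oracle: oracle(x) -> label in {-1, +1}
    b:      trade-off knob; larger b means more queries
    """
    w = None
    queries = 0
    for x in stream:
        if w is None:
            w = [0.0] * len(x)
        margin = sum(wi * xi for wi, xi in zip(w, x))
        if random.random() < b / (b + abs(margin)):
            y = oracle(x)           # one oracle call spent
            queries += 1
            if y * margin <= 0:     # perceptron update on mistakes
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, queries
```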
All of these results are relatively ‘rough’: they don’t necessarily make good algorithms as stated (although the last one has a few experiments). None of these results are directly comparable because the assumptions vary. Comparing the assumptions and the results leads to a number of remaining questions:
- Do the sufficient and semi-necessary conditions apply to the IID-only case? The adversarial case?
- Is there a generic algorithm for any hypothesis space that works in the fully adversarial setting?
- What are special cases of these algorithms which are computationally tractable and useful?
The Foundations of Active Learning workshop at NIPS should be a good place to discuss these questions.
Is the following result interesting? Assume your samples come from R^d and you are allowed to generate new synthetic samples and ask the oracle for their labels; then any set of n [unlabeled] points in R^d can be labeled using O(log n) oracle queries. The constant depends badly on d, however…
I don’t believe this holds, as written. I suspect there is some constraint such as “the labels obey a perceptron”.
Given some constraint of this form, the result sounds cute, but maybe not useful. Can you think of a real-world scenario where you would use it?
Sure. Assume a linear classifier.
And no, I don't see any application, but heck, even in R^4, you might need to query all points' labels to get a fully correct labeling, so really, you cannot hope for much better without further assumptions or allowing synthetic examples.
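To illustrate why an assumption like this makes O(log n) plausible, here is a minimal sketch of the simplest case: d = 1 with labels that obey a threshold on the line (the "labels obey a perceptron" constraint). This is a toy illustration of my own, not the commenter's construction; in one dimension no synthetic examples are even needed, since binary search over the sorted points finds where the labels flip:

```python
def label_points_1d(points, oracle):
    """Label n points on the real line with O(log n) oracle queries,
    assuming an unknown threshold t: label is 1 iff x >= t.
    Binary search over the sorted points locates where labels flip."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    lo, hi = 0, len(points)   # invariant: first positive rank lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(points[order[mid]]) == 1:
            hi = mid
        else:
            lo = mid + 1
    labels = [0] * len(points)
    for rank in range(lo, len(points)):
        labels[order[rank]] = 1
    return labels

# Toy usage: 5 points, hidden threshold at 0.5.
calls = []
def oracle(x):
    calls.append(x)
    return 1 if x >= 0.5 else 0

print(label_points_1d([0.1, 0.9, 0.4, 0.7, 0.2], oracle))  # [0, 1, 0, 1, 0]
print(len(calls))  # 3 queries instead of 5
```

In higher dimensions one would presumably query synthetic points, e.g. by binary-searching along segments between oppositely-labeled points; my guess is that this is where the bad dependence on d enters.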