Here are some papers that I found surprisingly interesting.
- Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, Greedy Layer-wise Training of Deep Networks. Empirically investigates some of the design choices behind deep belief networks.
- Long Zhu, Yuanhao Chen, Alan Yuille, Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing. An unsupervised method for detecting objects using simple feature filters that works remarkably well on the (supervised) Caltech-101 dataset.
- Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira, Analysis of Representations for Domain Adaptation. This is the first analysis I’ve seen of learning from samples drawn from a distribution other than the evaluation distribution, where the analysis depends on reasonable, measurable quantities.
All of these papers turn out to have a common theme: the power of unlabeled data to do generically useful things.