When I was thinking about the best “10 year paper” for ICML, I also took a look at a few other conferences. Here is one from 10 years ago that interested me:

David McAllester PAC-Bayesian Model Averaging, COLT 1999. 2001 Journal Draft.

Prior to this paper, the only mechanism known for controlling or estimating the necessary sample complexity for learning over continuously parameterized predictors was VC theory and variants, all of which suffered from a basic problem: they were incredibly pessimistic in practice. This meant that only very gross guidance could be provided for learning algorithm design. The PAC-Bayes bound provided an alternative approach to sample complexity bounds which was radically tighter, quantitatively. It also imported and explained many of the motivations for Bayesian learning in a way that learning theory and perhaps optimization people might appreciate. Since this paper came out, there have been a number of moderately successful attempts to drive algorithms directly by the PAC-Bayes bound. We’ve gone from thinking that a bound driven algorithm is completely useless to merely a bit more pessimistic and computationally intense than might be necessary.

The PAC-Bayes bound is related to the “bits-back” argument that Geoff Hinton and Drew van Camp made at COLT 6 years earlier.

What other machine learning or learning theory papers from 10 years ago have had a substantial impact?

This could be a false lead, but the paper http://www.svms.org/domain-knowledge/TaDu99.pdf by Tax and Duin in Pattern Recognition Letters has received a substantial amount of citations. I speculate that this paper foreshadows the development in one-class learning, anomaly detection and class imbalance problems that we saw from around the year 2000 and which was summarized in a 2004 sigkdd special issue editorial

Good job. I will read these two papers