Quantitative Prediction Bounds

Gaining a fine-grained understanding of how machine learning can work via sample complexity bounds is a long-standing subproject of the learning theory community. The principal twist underlying most of the work here is a dedication to making these bounds tight enough that they could conceivably be used directly by a computer.

Tutorials

Video plus Updated Slides .pdf from MLSS 2005 Chicago
John Langford. Practical Prediction Theory for Classification (A tutorial presented at ICML 2003) tex, ps.gz, pdf JMLR 6(Mar):273--306, 2005.
slides
A tale of two bounds

Workshop

(Ab)Use of Bounds NIPS 2004
Bounds less than 0.5

Programs

The program bound calculates bounds on the future error rate of classifiers given various observations. It includes the PAC-MDL bound.
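
The simplest such calculation is the test set bound from the tutorial above: given k errors on n held-out IID examples, with probability at least 1 - delta the true error rate is no larger than the inverted binomial tail. The sketch below is illustrative rather than the actual program; the use of scipy and the function name are assumptions.

```python
# Hedged sketch of an exact test set bound (not the `bound` program itself).
# Given k errors on n IID held-out examples, return the largest true error
# rate p that is not ruled out at confidence 1 - delta, i.e. invert the
# binomial tail P[Binomial(n, p) <= k] >= delta by bisection.
from scipy.stats import binom

def test_set_bound(k, n, delta=0.05, tol=1e-9):
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom.cdf(k, n, mid) >= delta:
            lo = mid  # mid is still consistent with observing <= k errors
        else:
            hi = mid  # mid is ruled out at confidence 1 - delta
    return hi

# Example: 12 errors on a holdout of 1000 examples, 95% confidence.
print(test_set_bound(12, 1000, delta=0.05))  # roughly 0.02
```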

The package semibound contains a suite of bound calculation programs.

Papers

Matti Kaariainen and John Langford A Comparison of Tight Generalization Bounds ICML 2005 .ps.gz .pdf .tex Tightens several prediction bounds and compares them on a variety of datasets. All of these bounds are "equally valid" in the sense that they apply given only an assumption of IID data. This contains a much tightened version of the progressive validation bound from "Beating the Holdout" below.
Arindam Banerjee and John Langford An Objective Evaluation Criterion for Clustering KDD 2004 .ps.gz .pdf .tex Proposes evaluating clustering algorithms based upon their empirical performance on the PAC-MDL bound. Some very tight bounds are calculated.
Peter Grunwald and John Langford Suboptimal Behavior of Bayes and MDL in Classification under Misspecification (MLJ version) COLT 2004 (shorter COLT version) .ps.gz, .pdf, .tex presentation Shows that a common technique for using Bayes' law for classification is "asymptotically inconsistent".
Avrim Blum and John Langford PAC-MDL Bounds COLT 2003 .ps.gz, .pdf, .tex Unifies PAC-Bayes, VC dimension, Sample Compression, and Occam's Razor bounds. Shows that any description language for transductive classifiers can be turned into a bound.
John Langford and John Shawe-Taylor PAC-Bayes and Margins. NIPS 2002 .pdf, .ps.gz, .tex The PAC-Bayes bound can explain margin bounds. A small bug is fixed in the tutorial above.
John Langford, Quantitatively Tight Sample Complexity Bounds Thesis .ps.gz, .pdf, .lyx, .tex, MathML (viewable with Mozilla)
Contains:
Microchoice bounds are an algorithmically practical variant of the Occam's Razor Bound applicable to discrete decision trees (the basic Occam's Razor bound is sketched after this list).
Shell bounds take advantage of the distribution of error rates on a set of classifiers to gain tightness.
The PAC-Bayes bound is shown capable of practical use on neural networks.
Arbitrary train set and test set bounds can be tightly combined, resulting in improvements over either alone.
Margin bounds can be tightened by using the PAC-Bayes bound in the argument.
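
For orientation, the plain Occam's Razor bound that microchoice and shell bounds build on combines Hoeffding's inequality with a union bound over a countable set of classifiers weighted by a "prior" P(c): with probability at least 1 - delta over m IID samples, every classifier c simultaneously satisfies true_error(c) <= empirical_error(c) + sqrt(ln(1/(delta*P(c))) / (2*m)). The sketch below just evaluates this expression; the function name is illustrative.

```python
# Hedged sketch of the plain Occam's Razor bound (Hoeffding + union bound).
# prior_weight is P(c), with weights over all classifiers summing to <= 1,
# e.g. 2**-b for a classifier describable in b bits.
import math

def occam_bound(empirical_error, m, prior_weight, delta=0.05):
    # Deviation term chosen so the failure probability for classifier c
    # is at most delta * P(c); summing over c gives total failure <= delta.
    return empirical_error + math.sqrt(math.log(1.0 / (delta * prior_weight)) / (2 * m))

# Example: a decision tree describable in 20 bits, 5% empirical error
# on 10,000 examples, 95% confidence.
print(occam_bound(0.05, 10_000, 2**-20, delta=0.05))  # roughly 0.079
```
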
Avrim Blum, Adam Kalai, and John Langford Beating the Holdout: Bounds for K-fold and Progressive Cross-Validation. COLT 1999, pages 203-208, .pdf, .ps.gz, .tex presentation
K-fold cross-validation on m examples is at least as good as a holdout set of size m/K.
Progressive validation on m examples is about as tight as a holdout set of size m, while allowing (expected) half of the examples to be used in training.
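
For concreteness, here is a minimal sketch of the progressive validation procedure (not the paper's code; the logistic regression learner, the fallback rule, and the synthetic data are illustrative assumptions): each example is predicted by a classifier trained only on the examples that precede it, and the individual errors are averaged.

```python
# Hedged sketch of progressive validation on a synthetic binary problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

def progressive_validation_error(X, y):
    errors = []
    for i in range(len(y)):
        if len(set(y[:i])) < 2:
            pred = y[0] if i > 0 else 0  # fallback before both classes appear
        else:
            clf = LogisticRegression().fit(X[:i], y[:i])
            pred = clf.predict(X[i:i + 1])[0]
        errors.append(int(pred != y[i]))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
print(progressive_validation_error(X, y))
```

Each prediction is made by a classifier trained on the examples seen so far, which averages out to about half the data, matching the "(expected) half" remark above.
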
Avrim Blum, Carl Burch, and John Langford, 1998. On Learning Monotone Boolean Functions. Proceedings of the 39th Annual Symposium on Foundations of Computer Science (FOCS '98). .pdf, .ps.gz, .tex Even if the classifier you want to learn is monotonic, it can still be very hard to learn. A very simple algorithm achieves the small gains that monotonicity allows.

Unreviewed Papers

John Langford. Lessons from Learning Theory for Benchmark Design. Invited talk at SPIE 2004. This discusses what we can glean from learning theory about benchmark design. Many of the conclusions are well-known, although perhaps the exact quantitative tradeoffs are helpful.
John Langford. Staged Learning. Shows that Baxter's model of learning to learn can be extended to an arbitrary-depth hierarchy.