What? Bounds are mathematical formulas relating observations to future error rates, assuming that data is drawn independently. In classical statistics, they are called confidence intervals.
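For illustration (a simple Hoeffding-style form, not necessarily the tightest one used in the references below): with probability at least $1-\delta$ over the draw of $m$ independent test examples,

$$ e(h) \;\le\; \hat{e}(h) + \sqrt{\frac{\ln(1/\delta)}{2m}}, $$

where $e(h)$ is the future error rate of the predictor $h$ and $\hat{e}(h)$ is its observed error rate on the test examples.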
Why?
- Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not.
- Learning Essence. The form of some of these bounds helps you understand what the essence of learning is.
- Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms.
What We Know Now
There are several families of bounds, based on how information is used.
- Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation, train and test bounds, and cross-validation (but see the big open problem). These techniques are the best available for goal (1) above, but provide little information towards goals (2) or (3). Some of these techniques are computationally efficient while others are not. (A simple sketch of a test set bound appears after this list.)
- Unlabeled Test Set Bounds. Instead of using labeled data to construct a tight bound, it is sometimes possible to use unlabeled data.
- Training Bounds. These are methods which use labeled data for both training and testing. These bounds provide insight into goals (2) and (3), but are not very effective for goal (1) above. These bounds teach us about how many independent examples are required to guarantee learning success on different representations. Any bound of this sort implies a learning algorithm: “minimize the bound” (a sketch of this idea appears after the list). The set covering machine is an application of this approach to a (variant) sample compression bound. Most other potential applications aren’t “quite there” yet, although they are close. Here is a list of learning algorithms and bounds that “almost work” for these algorithms.
- SVM PAC-Bayes margin bound.
- Perceptron PAC-Bayes margin bound, Sample Compression Bound
- Neural Network PAC-Bayes bound.
- Decision Tree Occam’s razor bound.
- Decision Tree Pruning Rademacher Complexity Bounds
- Semisupervised Clustering PAC-MDL or transductive PAC-Bayes bound
- Nearest neighbor ?? (Note that the cross validation approaches work well here.)
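As a sketch of how a testing bound is used in practice (my illustration; the function name and the Hoeffding relaxation are mine, and tighter forms such as direct binomial tail inversion are possible):

```python
import math

def test_set_bound(test_errors: int, m: int, delta: float = 0.05) -> float:
    """Hoeffding-style test set bound.

    With probability at least 1 - delta over the draw of m independent
    test examples, the future error rate is at most the returned value.
    """
    empirical_error = test_errors / m
    return empirical_error + math.sqrt(math.log(1.0 / delta) / (2.0 * m))

# Example: 12 mistakes on 1000 held-out examples gives a bound of roughly 0.051.
print(test_set_bound(12, 1000))
```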
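And a sketch of “minimize the bound” using a training bound (again my illustration: an Occam’s razor bound in its Hoeffding-relaxed form over a finite hypothesis set with prior p; the function names and inputs are hypothetical):

```python
import math

def occam_bound(train_errors: int, m: int, prior_h: float, delta: float = 0.05) -> float:
    """Occam's razor bound (Hoeffding relaxation): with probability at least
    1 - delta over m independent training examples, simultaneously for every
    hypothesis h with prior mass prior_h, the future error rate is at most
    the returned value."""
    empirical_error = train_errors / m
    complexity = math.log(1.0 / prior_h) + math.log(1.0 / delta)
    return empirical_error + math.sqrt(complexity / (2.0 * m))

def minimize_the_bound(training_errors, prior, m, delta=0.05):
    """'Minimize the bound' as a learning rule: pick the hypothesis whose
    bound value (training error plus complexity penalty) is smallest,
    rather than the one with the smallest training error alone."""
    bounds = [occam_bound(e, p_h and 0 or 0, 0) if False else occam_bound(e, m, p_h, delta)
              for e, p_h in zip(training_errors, prior)]
    return min(range(len(bounds)), key=bounds.__getitem__)
```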
Limitations
The independence assumption is a significant limitation in the applicability of this approach, since we often cannot believe that independence holds for natural learning problems. Some work has gone into weakening this assumption.