Covering number bounds are used to bound the true error rate of classifiers chosen from an infinite hypothesis space using examples [20]. A cover is a finite set of hypotheses satisfying the following property: every hypothesis in the infinite space is "near" some element of the finite cover. When a Lipschitz condition holds on the hypothesis space, it is generally possible to construct such covers, and the existence of a cover is required for learnability [2]. Alternatively, Sauer's lemma (see [43] or [51]) bounds the size of the cover in terms of the VC dimension, which is defined combinatorially: the VC dimension is the largest number of examples which the hypothesis space can classify in an arbitrary manner.
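To make these definitions concrete, the following is a standard textbook formulation (a sketch only; the metric $d$, the scale $\epsilon$, and the growth function $\Pi_H$ are illustrative notation, not necessarily that used elsewhere in this document). A finite set $C \subseteq H$ is an $\epsilon$-cover of the hypothesis space $H$ under a metric $d$ if for every $h \in H$ there exists some $c \in C$ with $d(h,c) \leq \epsilon$. In this notation, Sauer's lemma states that the number of distinct labelings of $m$ examples (and hence the size of a cover with respect to the empirical metric) satisfies
\[
\Pi_H(m) \;\leq\; \sum_{i=0}^{d} \binom{m}{i} \;\leq\; \left(\frac{em}{d}\right)^{d} \qquad \text{for } m \geq d,
\]
where $d$ is the VC dimension. The key consequence is that the cover grows only polynomially in $m$ whenever the VC dimension is finite.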
The principal disadvantage of covering number results is that they are notoriously loose, to the point that they are often useless when applied in practice (see "criticisms" in [20]). Here, "useless" means that the computed upper bound on the true error rate is vacuous: it is essentially always larger than the trivial bound of 1. The amount of looseness can be quantified by comparison with other bounds in the regimes where those bounds hold. On a finite hypothesis space there is near-perfect agreement between the upper bound 4.2.1 and the lower upper bound 4.4.2 for independent hypotheses; in fact, as the number of examples goes to infinity, the agreement becomes perfect, regardless of the size of the hypothesis space. When covering number bounds are applied to this problem, no such agreement arises. Since part of the argument involves splitting the examples into two sets, the difference between a covering number based upper bound and the lower upper bound can remain large even as the number of examples goes to infinity. In practice, the covering number bound effectively at least squares the size of the discrete hypothesis space, as illustrated below. Further loosening covering number bounds via Sauer's lemma results in even worse bounds.
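The "squaring" effect can be seen in a stylized calculation (the constants here are illustrative and do not reproduce the exact statement of bound 4.2.1 or of any particular covering number theorem). For a finite hypothesis space of size $|H|$, a direct union bound yields a deviation term of order
\[
\sqrt{\frac{\ln(|H|/\delta)}{2m}},
\]
while an argument that first splits the $m$ examples into two halves can charge the deviation to only $m/2$ examples, yielding
\[
\sqrt{\frac{\ln(|H|/\delta)}{2(m/2)}} \;=\; \sqrt{\frac{\ln\!\left(|H|^2/\delta^2\right)}{2m}},
\]
which is exactly the direct bound applied to a hypothesis space of size $|H|^2$ (at confidence parameter $\delta^2$). Since this factor of 2 inside the square root persists for all $m$, the gap does not vanish as the number of examples goes to infinity.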
Can we construct a calculable true error upper bound for continuous hypothesis spaces which is at least asymptotically tight? In a sense, this has already been done with the PAC-Bayes bounds of chapter 6, but that approach has a drawback in applicability: PAC-Bayes bounds do not apply in a meaningful way to a single hypothesis drawn from an infinite hypothesis space. A covering number argument would, hopefully, apply in a meaningful way to a single hypothesis. A covering number bound which is asymptotically tight on some learning problems does exist and is presented next.