We will first discuss standard covering number bounds. Define a "distance" between hypotheses in terms of how often they disagree on examples drawn from the distribution $D$:
$$d(h, h') = \Pr_{x \sim D}\left[h(x) \neq h'(x)\right].$$
Now, start with an epsilon net. An epsilon net is a minimum-size set $C \subseteq H$ which contains an element "near" to every element in $H$:
$$\forall h \in H \;\; \exists h' \in C : d(h, h') \leq \epsilon.$$
Then a covering number is defined as:
$$\mathcal{N}(H, \epsilon) = \sup_{D} \min \left\{ |C| : C \text{ is an epsilon net of } H \text{ under } D \right\}.$$
The covering number is the size of the worst (over distributions) epsilon net.
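As a concrete illustration of these definitions, the sketch below builds an epsilon net for a finite hypothesis space by a greedy construction, using the empirical disagreement fraction as the distance. This is an illustrative assumption on my part, not the construction used in the text: the hypothesis space, the sample size of 4, and the greedy strategy are all chosen for the example, and greedy selection only upper bounds the true minimum net size.

```python
from itertools import product

def disagreement(h1, h2):
    """Disagreement distance: fraction of sample points where the
    two hypotheses' predictions differ."""
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

def greedy_epsilon_net(hypotheses, eps):
    """Greedily build a set C such that every hypothesis is within
    eps of some element of C.  Greedy is not minimal in general, so
    len(C) is an upper bound on the true epsilon-net size."""
    net = []
    for h in hypotheses:
        # Add h only if no existing net element is already eps-close to it.
        if all(disagreement(h, g) > eps for g in net):
            net.append(h)
    return net

# Hypothesis space: all binary labelings of 4 sample points (16 hypotheses).
H = list(product([0, 1], repeat=4))
net = greedy_epsilon_net(H, eps=0.25)
print(len(net))  # → 8: every h in H is within 0.25 of some net element
```

Note that the net is strictly smaller than $|H|$: hypotheses that disagree on only one of the four points collapse onto a single representative, which is exactly the compression the covering number exploits.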
PROOF. In [20]. □
How tight is this bound when applied to a finite independent hypothesis space? We can improve the constants by using an argument with fewer triangle inequalities in the discrete case, and we get the following results: Comparing this with a very loose application of the discrete hypothesis bound 4.2.3, we see that the penalty term in the covering number bound is worse by a factor of . Put another way, dividing the number of samples by or increasing the hypothesis space size to and then applying a sloppy discrete hypothesis bound is roughly equivalent to applying a very specialized covering number bound. We seek a covering number bound which does not divide the effective value of a hypothesis by .