Before introducing the results, we will mention a few important details about this bound.
A naive application of the Shell bound would not prove useful because the size of the hypothesis space can be extremely large. Instead, we must combine it with Structural Risk Minimization (SRM) to achieve useful results. In SRM, we start with a bound for each hypothesis space, $H_k$, in a sequence of nested hypothesis spaces, $H_1 \subseteq H_2 \subseteq \cdots$. These bounds on individual hypothesis spaces are combined to create a bound which applies to all hypothesis spaces simultaneously. The particular nesting we use is "$H_k$ = all decision trees with $k$ or fewer internal nodes".
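Since the size of the decision tree hypothesis spaces increases exponentially with the index $k$, we choose the weight placed on each index with this growth in mind. This choice has the property that the penalty for the index is always small in comparison to the complexity term of the corresponding hypothesis space, implying that the SRM bound is never much worse than a simple application of the underlying bound.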
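The SRM combination described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the construction used for the results: the weight `index_weight` (here $1/(k(k+1))$, which sums to one over all indices) is a hypothetical choice, and `occam_bound` is a simple finite-hypothesis-space bound standing in for the Shell bound. All function names are ours.

```python
import math

def index_weight(k):
    """Hypothetical weight on index k: 1 / (k (k + 1)), which sums to 1 over k >= 1."""
    return 1.0 / (k * (k + 1))

def occam_bound(emp_err, log_size, m, delta):
    """Stand-in per-space bound (Hoeffding plus a union bound over a finite space).

    log_size is ln|H| for the hypothesis space, m is the number of examples.
    """
    return emp_err + math.sqrt((log_size + math.log(1.0 / delta)) / (2.0 * m))

def srm_bound(emp_err, k, log_size, m, delta):
    """Apply the per-space bound to space k at the reduced confidence delta * weight(k).

    A union bound over k then makes the result hold for all spaces at once.
    """
    return occam_bound(emp_err, log_size, m, delta * index_weight(k))

def index_penalty(k):
    """Extra term ln(1 / weight(k)) that SRM adds relative to the per-space bound."""
    return math.log(1.0 / index_weight(k))
```

With an assumed size term of $\ln|H_k| = 3k$, the index penalty at $k = 10$ is $\ln 110 \approx 4.7$ against a size term of 30, which is the sense in which the SRM bound is not much worse than the underlying bound.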
The computational cost of calculating some of these bounds is nontrivial. There are two basic parts to this computation:
We typically avoid the difficulties inherent in (1) using tricks such as Monte Carlo sampling followed by bounding the deviation of the Monte Carlo estimate. In particular, we use the anytime computation trick of section 12.1.3 here.
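As a rough illustration of sampling followed by a deviation bound, the sketch below estimates the fraction of hypotheses with some property and then inflates the estimate by a Hoeffding term. Everything here is an assumption made for illustration: the quantity being estimated, the helper names, and the use of Hoeffding's inequality. The anytime computation trick of section 12.1.3 is not reproduced.

```python
import math
import random

def monte_carlo_fraction(sample_hypothesis, has_property, n, rng):
    """Estimate the fraction of hypotheses with a property from n independent draws."""
    hits = sum(1 for _ in range(n) if has_property(sample_hypothesis(rng)))
    return hits / n

def hoeffding_upper(estimate, n, delta):
    """Upper bound on the true fraction, holding with probability at least 1 - delta."""
    return min(1.0, estimate + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))

if __name__ == "__main__":
    # Toy hypothesis space: integers 0..999, with the property "index below 100".
    rng = random.Random(0)
    est = monte_carlo_fraction(lambda r: r.randrange(1000), lambda h: h < 100, 2000, rng)
    print(est, hoeffding_upper(est, 2000, 0.01))
```

The confidence parameter spent on the Monte Carlo step has to be accounted for in the final bound, which is why the deviation is bounded rather than ignored.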
To avoid difficulties inherent in problem (2), we use fast bounds (see section 3.2) on the Binomial tail as necessary.
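To make the trade-off concrete, the sketch below compares an exact Binomial tail with a closed-form upper bound. The relative entropy Chernoff bound is used here as an assumed example of a fast bound; the bounds of section 3.2 are not shown in this section and may differ. Function names are ours, and $0 < p < 1$ is assumed.

```python
import math

def binomial_tail(m, k, p):
    """Exact Pr[Bin(m, p) <= k], summing k + 1 terms."""
    return sum(math.comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k + 1))

def kl_bernoulli(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in nats (0 < p < 1)."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def chernoff_tail(m, k, p):
    """Constant-time upper bound on Pr[Bin(m, p) <= k]; trivial (1.0) when k/m >= p."""
    q = k / m
    if q >= p:
        return 1.0
    return math.exp(-m * kl_bernoulli(q, p))
```

The closed-form bound is looser than the exact tail but costs a constant number of operations, which matters when the tail must be evaluated many times.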