In practice, it is not always easy to calculate some of the observable quantities in the PAC-Bayes bound. In particular, the expected empirical error rate $\hat{e}_Q = \mathbf{E}_{h \sim Q}\,\hat{e}(h)$ is not necessarily easy to calculate when the posterior $Q$ is a continuous distribution. We can avoid the need for a direct evaluation by using a Monte Carlo evaluation together with a bound on the tail of the Monte Carlo evaluation. Let $\hat{e}_Q^{\mathrm{MC}}$ be the observed rate of failure of $k$ random hypotheses drawn according to $Q$, each applied to a random training example. Once again, we have a familiar Binomial distribution. Direct calculation will give us: for all $Q$, with probability at least $1 - \delta$ over the $k$ Monte Carlo draws,
$$\hat{e}_Q \;\le\; \max\Bigl\{\, p \;:\; \Pr_{X \sim \mathrm{Bin}(k,\,p)}\bigl[\tfrac{X}{k} \le \hat{e}_Q^{\mathrm{MC}}\bigr] \ge \delta \,\Bigr\}.$$
PROOF. Observe that the Monte Carlo estimate is distributed like a Binomial distribution and apply the Binomial Tail bound. ▫
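As a concrete sketch of how this tail inversion can be computed (the function names below are my own, not from the text): given $k$ Monte Carlo draws with $j$ observed failures, we search for the largest $p$ whose Binomial lower tail at $j$ still has probability at least $\delta$. The lower-tail probability is decreasing in $p$, so bisection suffices.

```python
import math

def binomial_cdf(j: int, k: int, p: float) -> float:
    """P(X <= j) for X ~ Binomial(k, p), computed directly."""
    return sum(math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(j + 1))

def binomial_tail_inverse(j: int, k: int, delta: float, iters: int = 60) -> float:
    """Largest p with P(Binomial(k, p) <= j) >= delta, found by
    bisection; the lower-tail probability is decreasing in p."""
    lo, hi = j / k, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binomial_cdf(j, k, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, with $k = 1000$ Monte Carlo draws and $100$ observed failures, `binomial_tail_inverse(100, 1000, 0.05)` yields an upper bound of roughly $0.12$.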
In order to calculate a bound on the expected true error rate, we can first bound the expected empirical error rate $\hat{e}_Q$ with confidence $\delta/2$, then bound the expected true error rate $e_Q$ with confidence $\delta/2$ using our bound on $\hat{e}_Q$. Since the total probability of failure is only $\delta/2 + \delta/2 = \delta$, our bound will hold with probability $1 - \delta$.
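The splitting argument is just a union bound. As a sketch, write $\hat{b}$ for the Monte Carlo bound on $\hat{e}_Q$ at confidence $\delta/2$ and $B(\cdot, \delta/2)$ for the PAC-Bayes bound, assumed monotone in its empirical-error argument:

```latex
\Pr\bigl[ e_Q > B(\hat{b}, \delta/2) \bigr]
  \;\le\; \Pr\bigl[ \hat{e}_Q > \hat{b} \bigr]
        + \Pr\bigl[ e_Q > B(\hat{e}_Q, \delta/2) \bigr]
  \;\le\; \frac{\delta}{2} + \frac{\delta}{2} \;=\; \delta .
```

The first inequality holds because $e_Q > B(\hat{b}, \delta/2)$ requires either $\hat{e}_Q > \hat{b}$ or, by monotonicity of $B$, $e_Q > B(\hat{e}_Q, \delta/2)$.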
It is sometimes desirable to derandomize the PAC-Bayes bound. There are several ways to do this. The next chapter will discuss replacing the randomization over $Q$ with a thresholded average. Another technique is to simply pick a single hypothesis according to $Q$. While this would probably be effective in practice, the theoretical guarantees that can be made for this technique are weak. Strong theoretical guarantees can be made for a similar technique.
Suppose we make $k$ draws from $Q$. Let the drawn hypotheses be $h_1, \ldots, h_k$. We can form a new distribution $\hat{Q}$ which is uniform over these draws. The true error rate $e_{\hat{Q}}$ of this distribution can be bounded with high probability according to the following theorem: for all $Q$, with probability at least $1 - \delta$ over the $k$ draws,
$$e_{\hat{Q}} \;\le\; \min\Bigl\{\, q \;:\; \Pr_{X \sim \mathrm{Bin}(k,\, e_Q)}\bigl[\tfrac{X}{k} \ge q\bigr] \le \delta \,\Bigr\}.$$
PROOF. Observe that $e_{\hat{Q}}$ is distributed like a Binomial around $e_Q$, and apply the Binomial Tail bound. ▫
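To see the theorem's content numerically, here is a small simulation; the posterior model (hypothesis error rates uniform on $[0.1, 0.3]$, so $e_Q = 0.2$) is an invented assumption purely for illustration. The true error of $\hat{Q}$ is the average of the drawn hypotheses' true errors, and it concentrates around $e_Q$ as $k$ grows.

```python
import random

def true_error_q_hat(error_rates):
    """True error of the uniform mixture Q-hat: the average of the
    member hypotheses' individual true error rates."""
    return sum(error_rates) / len(error_rates)

rng = random.Random(0)
k = 2000
# Hypothetical posterior Q: each draw is a hypothesis whose true
# error rate is uniform on [0.1, 0.3], giving e_Q = 0.2.
rates = [rng.uniform(0.1, 0.3) for _ in range(k)]
e_q_hat = true_error_q_hat(rates)
```

With $k = 2000$ draws, `e_q_hat` lands within about $0.01$ of $e_Q = 0.2$, as the Binomial-style concentration predicts.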
Note that this theorem and the last theorem are essentially the same theorem.
This theorem allows us to do an (incomplete) derandomization. Instead of drawing from $Q$ in order to evaluate an input, we can draw from $\hat{Q}$, which requires only a fixed finite number of bits. This may allow for more efficient algorithms, and some people may find it reassuring that every hypothesis in $\hat{Q}$ has a low empirical error. The same confidence-splitting trick of the last section can be used in order to guarantee that $e_Q$ is bounded and that $e_{\hat{Q}}$ is bounded given that $e_Q$ is bounded.
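As a sketch of the "fixed finite number of bits" remark (the helper below is hypothetical, not from the text): drawing from $\hat{Q}$ only requires a uniform index into the $k$ stored hypotheses, which costs $\lceil \log_2 k \rceil$ random bits per attempt, whereas sampling from a continuous $Q$ may require unbounded precision.

```python
import math
import random

def draw_uniform_index(k: int, rng: random.Random) -> int:
    """Sample uniformly from {0, ..., k-1} using ceil(log2 k) random
    bits per attempt; rejection keeps the draw exactly uniform."""
    bits = max(1, math.ceil(math.log2(k)))
    while True:
        i = rng.getrandbits(bits)
        if i < k:
            return i
```

Evaluating an input then amounts to `hypotheses[draw_uniform_index(k, rng)](x)` for the stored draws.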
It is worth mentioning that no assumption of independence is required for either this theorem or the last theorem, since we explicitly control (and create) the independence ourselves. These theorems hold with totally verifiable preconditions.