# Machine Learning (Theory)

## 10/8/2010

### An easy proof of the Chernoff-Hoeffding bound

Tags: Machine Learning. Sanjoy @ 9:45 am

Textbooks invariably seem to carry the proof that uses Markov’s inequality, moment-generating functions, and Taylor approximations. Here’s an easier way.

For $p,q \in (0,1)$, let $K(p,q)$ be the KL divergence between a coin of bias $p$ and one of bias $q$: $K(p,q) = p \ln \frac{p}{q} + (1-p) \ln \frac{1-p}{1-q}.$
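For concreteness, this divergence is easy to compute directly. A minimal sketch in Python (the function name `kl_bernoulli` is mine, not from the post):

```python
from math import log

def kl_bernoulli(p, q):
    """KL divergence K(p, q) between a coin of bias p and one of bias q,
    for p, q in the open interval (0, 1)."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))
```

For example, `kl_bernoulli(0.6, 0.5)` is about `0.0201`, and `kl_bernoulli(p, p)` is `0` for any bias `p`.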

Theorem: Suppose you do $n$ independent tosses of a coin of bias $p$. The probability of seeing $qn$ heads or more, for $q > p$, is at most $\exp(-nK(q,p))$. So is the probability of seeing $qn$ heads or less, for $q < p$.

Remark: By Pinsker’s inequality, $K(q,p) \geq 2(p-q)^2$.
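Pinsker's inequality can be sanity-checked numerically on a grid of biases; a quick sketch, not part of the original post:

```python
from math import log

def kl(p, q):
    # K(p, q) = p ln(p/q) + (1-p) ln((1-p)/(1-q))
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

# Verify K(q, p) >= 2 (p - q)^2 over a grid of biases in (0, 1).
grid = [i / 20 for i in range(1, 20)]
pinsker_holds = all(kl(q, p) >= 2 * (p - q) ** 2 for p in grid for q in grid)
```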

Proof: Let’s do the $q > p$ case; the other is identical.

Let $\theta_p$ be the distribution over $\{0,1\}^n$ induced by a coin of bias $p$, and likewise $\theta_q$ for a coin of bias $q$. Let $S$ be the set of all sequences of $n$ tosses which contain $qn$ heads or more. We’d like to show that $S$ is unlikely under $\theta_p$.

Pick any $\bar{x} \in S$, with say $k \geq qn$ heads. Then:
$\frac{\theta_q(\bar{x})}{\theta_p(\bar{x})} = \frac{q^k(1-q)^{n-k}}{p^k(1-p)^{n-k}} \geq \frac{q^{qn}(1-q)^{n-qn}}{p^{qn}(1-p)^{n-qn}} = \left( \frac{q}{p} \right)^{qn} \left( \frac{1-q}{1-p}\right)^{(1-q)n} = e^{n K(q,p)}.$

Since $\theta_p(\bar{x}) \leq \exp(-nK(q,p)) \theta_q(\bar{x})$ for every $\bar{x} \in S$, we have $\theta_p(S) \leq \exp(-nK(q,p)) \theta_q(S) \leq \exp(-nK(q,p))$ and we’re done.
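The theorem can also be checked against the exact binomial tail; here is a small sketch (the helper names `kl` and `upper_tail` are mine, not from the post):

```python
from math import ceil, comb, exp, log

def kl(p, q):
    # KL divergence between coins of bias p and q, as defined above.
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def upper_tail(n, p, q):
    """Exact P(qn heads or more) in n independent tosses of a bias-p coin."""
    k0 = ceil(q * n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k0, n + 1))

n, p, q = 100, 0.5, 0.6
tail = upper_tail(n, p, q)
bound = exp(-n * kl(q, p))
assert tail <= bound  # the theorem's upper bound holds
```

With these numbers the exact tail is roughly `0.028` while the bound is roughly `0.134`, so the bound is loose but correct, as expected from an exponential-moment-free argument.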

###### 15 Comments to “An easy proof of the Chernoff-Hoeffding bound”
1. How do you justify the step where you replace $k$ with $qn$?

• DH says:

Since $q/p > 1$ and $k \geq qn$, $(q/p)^k \geq (q/p)^{qn}$. The other factor is similar.

• The other factor is similar, but doesn’t it give the wrong direction of inequality? $1-q < 1-p$ implies $((1-q)/(1-p))^k \leq ((1-q)/(1-p))^{qn}$.

• dh's protege says:

You dropped the minus sign on $k$: the exponent is $n-k$, not $k$, and since the base is less than 1 and $n-k \leq (1-q)n$, the inequality goes the right way.

2. Umar Syed says:

John, this is really nice!

• jl says:

Sanjoy, actually.

3. Hsuan-Tien Lin says:

Very cool! Thank you, Sanjoy. My students would be so happy to know that they don’t need to go through the lengthy proof that was given last week.

4. Isn’t this what they call the method of types in info theory books?

5. Is there a way to get a Chernoff-type bound for more than 2 outcomes? Cover/Thomas use Sanov’s Theorem for that, but it seems much looser than a Chernoff-type bound.

6. Suresh says:

This is similar to the idea used by Impagliazzo and Kabanets in a recent paper on a combinatorial version of the CH-bound. In that paper, they also generalize this argument (as Yaroslav asks). Nice!

7. Dai says:

Excellent exposition! Would you please tell me where you found this proof, since I assume that it predated the proof by Impagliazzo-Kabanets? Many thanks in advance!

8. jl says:

I’m not sure which paper you are referring to.

My understanding is that this proof is new in origination, although I can’t swear it’s the first time it’s been discovered.

• Dai says:

@jl: That’s excellent! I have no doubt about its originality, but I thought it was officially published somewhere else either by you or others. Such a nice and simple proof should be more well-known!

