Background One branch of learning theory focuses on theorems which
- Assume samples are drawn IID from an unknown distribution D.
- Fix a set of classifiers
- Find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set.
Many of these bounds become extremely complex and hairy.
Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.)
One of the dangers of striving for “functional tightness” is that the bound can depend on strangely interrelated parameters. In fact, apparently these strange interrelations can become so complex that they end up always larger than 1 (some bounds here and here).
It seems we should ask the question: “Why are we doing the math?” If it is just done to get a paper accepted under review, perhaps this is unsatisfying. The real value of math comes when it guides us in designing learning algorithms. Math from bounds greater than 1 is a dangerously weak motivation for learning algorithm design.
There is a significant danger in taking this “oops” too strongly.
- There exist some reasonable arguments (not made here) for aiming at functional tightness.
- The value of the research a person does is more related to the best they have done than the worst.