This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration.

- Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is that error correcting output codes where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate.
- Regret minimizing reductions. Here, the goal is to bound the
*regret*of the created classifier in terms of the*regret*of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be*0*. An analagous result for reget is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^{0.5}.

The use of “regret” is more desirable than the use of error rates, because (for example) the ECOC error rate bound implies nothing when there is enough noise so that the binary classifiers always have error rate 0.25. However the square root dependence introduced when analyzing regret is *not* desirable. A basic question is: Can we have the best of both worlds? Can we find some algorithm doing multiclass classification with binary classifiers such that average regret *r* for the binary classifiers implies average regret bounded by *4r* for the multiclass classifier?

If the answer is “yes”, that reduction algorithm may be empirically superior to the one we use now.

If the answer is “no”, that is a sharp and unexpected distinction between error rate analysis and regret analysis.

[...] The loss associated with individual predictions is highly variable rather than something simple like “0/1-loss” or “squared error loss”. One approach to this is to combine regret minimizing learning reductions with online learning algorithms (drawback: the reduced predictions may not be of intuitive things). Another approach is simply trying to make very flexible master algorithms (drawback: flexibility often comes with a weakening in the theoretical guarantees). [...]