The State of the Reduction – Machine Learning (Theory)

What? Reductions are machines which turn solvers for one problem into solvers for another problem.
Why? Reductions are useful for several reasons.

Laziness. Reducing a problem to classification make at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work.
Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on.
Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder.

What we know now.

Typesafe reductions. In the beginning, there was the observation that every complex object on a computer can be written as a sequence of bits. This observation leads to the notion that a classifier (which predicts a single bit) can be used to predict any complex object. Using this observation, we can make the following statements:

Any prediction problem which can be broken into examples can be solved with a classifier.
In particular, reinforcement learning can be decomposed into examples given a generative model (see Lagoudakis & Parr and Fern, Yoon, & Givan).

This observation also often doesn’t work well in practice, because the classifiers are sometimes wrong, so one of many classifiers are often wrong.

Error Transform Reductions. Worrying about errors leads to the notion of robust reductions (= ways of using simple predictors such as classifiers to make complex predictions). Error correcting output codes were proposed in analogy to coding theory. These were analyzed in terms of error rates on training sets and general losses on training sets. The robustness can be (more generally) analyzed with respect to arbitrary test distributions, and algorithms optimized with respect to this notion are often very simple and yield good performance. Solving created classification problems up to error rate e implies:

Solving importance weighed classifications up to error rate eN where N is the expected importance. Costing
Solving multiclass classification up to error rate 4e using ECOC. Error limiting reductions paper
Solving Cost sensitive classification up to loss 2eZ where Z is the sum of costs. Weighted All Pairs algorithm
Finding a policy within expected reward (T+1)e/2 of the optimal policy for T step reinforcement learning with a generative model. RLgen paper
The same statement holds much more efficiently when the distribution of states of a near optimal policy is also known. PSDP paper

A new problem arises: sometimes the subproblems created are inherently hard, for example when estimating class probability from a classifier. In this situation saying “good performance implies good performance” is vacuous.

Regret Transform Reductions To cope with this, we can analyze how good performance minus the best possible performance (called “regret”) is transformed under reduction. Solving created binary classification problems to regret r implies:

Solving importance weighted regret up to r N using the same algorithm as for errors. Costing
Solving class membership probability up to l₂ regret 2r. Probing paper
Solving multiclass classification to regret 4 r^0.5. SECOC paper
Predicting costs in cost sensitive classification up to l₂ regret 4r SECOC again
Solving cost sensitive classification up to regret 4(r Z)^0.5 where Z is the sum of the costs of each choice. SECOC again

There are several reduction-related problems currently being worked on which I’ll discuss in the future.