What? Reductions are machines which turn solvers for one problem into solvers for another problem.
Why? Reductions are useful for several reasons.
- Laziness. Reducing a problem to classification make at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work.
- Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on.
- Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder.
What we know now.
Typesafe reductions. In the beginning, there was the observation that every complex object on a computer can be written as a sequence of bits. This observation leads to the notion that a classifier (which predicts a single bit) can be used to predict any complex object. Using this observation, we can make the following statements:
- Any prediction problem which can be broken into examples can be solved with a classifier.
- In particular, reinforcement learning can be decomposed into examples given a generative model (see Lagoudakis & Parr and Fern, Yoon, & Givan).
This observation also often doesn’t work well in practice, because the classifiers are sometimes wrong, so one of many classifiers are often wrong.
Error Transform Reductions. Worrying about errors leads to the notion of robust reductions (= ways of using simple predictors such as classifiers to make complex predictions). Error correcting output codes were proposed in analogy to coding theory. These were analyzed in terms of error rates on training sets and general losses on training sets. The robustness can be (more generally) analyzed with respect to arbitrary test distributions, and algorithms optimized with respect to this notion are often very simple and yield good performance. Solving created classification problems up to error rate e implies:
- Solving importance weighed classifications up to error rate eN where N is the expected importance. Costing
- Solving multiclass classification up to error rate 4e using ECOC. Error limiting reductions paper
- Solving Cost sensitive classification up to loss 2eZ where Z is the sum of costs. Weighted All Pairs algorithm
- Finding a policy within expected reward (T+1)e/2 of the optimal policy for T step reinforcement learning with a generative model. RLgen paper
- The same statement holds much more efficiently when the distribution of states of a near optimal policy is also known. PSDP paper
A new problem arises: sometimes the subproblems created are inherently hard, for example when estimating class probability from a classifier. In this situation saying “good performance implies good performance” is vacuous.
Regret Transform Reductions To cope with this, we can analyze how good performance minus the best possible performance (called “regret”) is transformed under reduction. Solving created binary classification problems to regret r implies:
- Solving importance weighted regret up to r N using the same algorithm as for errors. Costing
- Solving class membership probability up to l2 regret 2r. Probing paper
- Solving multiclass classification to regret 4 r0.5. SECOC paper
- Predicting costs in cost sensitive classification up to l2 regret 4r SECOC again
- Solving cost sensitive classification up to regret 4(r Z)0.5 where Z is the sum of the costs of each choice. SECOC again
There are several reduction-related problems currently being worked on which I’ll discuss in the future.