At an intuitive level, the question here is “Can reinforcement learning be solved with classification?”

**Problem** Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples.

**Past Work**

- There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model.
- Other work on learning reductions may be important.
- Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E
^{3}, Factored-E^{3}, and metric-E^{3}and Rmax require that the observation be the state. Recent work extends this approach to POMDPs. - This problem is related to predictive state representations, because we are trying to solve reinforcement learning with prediction ability.

**Difficulty** It is not clear whether this problem is solvable or not. A proof that it is not solvable would be extremely interesting, and even partial success one way or another could be important.

**Impact** At the theoretical level, it would be very nice to know if the ability to generalize implies the ability to solve reinforcement learning because (in a general sense) all problems can be cast as reinforcement learning. Even if the solution is exponential in the horizon time it can only motivate relaxations of the core algorithm like RLgen.

See here for discussions of a solution.