Problem: Reductions and Relative Ranking Metrics

This, again, is something of a research direction rather than a single problem.

There are several metrics people care about which depend upon the relative ranking of examples and there are sometimes good reasons to care about such metrics. Examples include AROC, “F1”, the proportion of the time that the top ranked element is in some class, the proportion of the top 10 examples in some class (google‘s problem), the lowest ranked example of some class, and the “sort distance” from a predicted ranking to a correct ranking. See here for an example of some of these.

Problem What does the ability to classify well imply about performance under these metrics?

Past Work

  1. Probabilistic classification under squared error can be solved with a classifier. A counterexample shows this does not imply a good AROC.
  2. Sample complexity bounds for AROC (and here).
  3. A paper on “Learning to Order Things“.

Difficulty Several of these may be easy. Some of them may be hard.

Impact Positive or negative results will broaden our understanding of the relationship between different learning goals. It might also yield new algorithms (via the reduction) for solving these problems.

Why Papers?

Makc asked a good question in comments—“Why bother to make a paper, at all?” There are several reasons for writing papers which may not be immediately obvious to people not in academia.

The basic idea is that papers have considerably more utility than the obvious “present an idea”.

  1. Papers are a formalized units of work. Academics (especially young ones) are often judged on the number of papers they produce.
  2. Papers have a formalized method of citing and crediting other—the bibliography. Academics (especially older ones) are often judged on the number of citations they receive.
  3. Papers enable a “more fair” anonymous review. Conferences receive many papers, from which a subset are selected. Discussion forums are inherently not anonymous for anyone who wants to build a reputation for good work.
  4. Papers are an excuse to meet your friends. Papers are the content of conferences, but much of what you do is talk to friends about interesting problems while there. Sometimes you even solve them.
  5. Papers are an excuse to get a large number of smart people in the same room and think about the same topic.
  6. Good papers are easy to read. In particular, they are much easier to read (and understand) then a long discussion thread. They are even easy to read in several decades. (Writing good papers is hard)

All of the above are reasons why writing papers is a good idea. It’s also important to understand that academia is a large system and large systems have a lot of inertia. This means switching from paper writing to some other method of doing research won’t happen unless the other method is significantly more effective, and even then there will be a lot of inertia.

Also note: the “similar sites” link to the right is to other discussion forums, etc…

Problem: Online Learning

Despite my best intentions, this is not a fully specified problem, but rather a research direction.

Competitive online learning is one of the more compelling pieces of learning theory because typical statements of the form “this algorithm will perform almost as well as a large set of other algorithms” rely only on fully-observable quantities, and are therefore applicable in many situations. Examples include Winnow, Weighted Majority, and Binomial Weighting. Algorithms with this property haven’t taken over the world yet. Here might be some reasons:

  1. Lack of caring. Many people working on learning theory don’t care about particular applications much. This means constants in the algorithm are not optimized, usable code is often not produced, and empirical studies aren’t done.
  2. Inefficiency. Viewed from the perspective of other learning algorithms, online learning is terribly inefficient. It requires that every hypothesis (called an expert in the online learning setting) be enumerated and tested on every example. (This is similar to the inefficiency of using Bayes law as an algorithm directly, and there are strong similarities in the algorithms.)

For an example of (1), there is a mysterious factor of 2 in the Binomial Weighting algorithm which has not been fully resolved. Some succesful applications also exist such as those based on SNoW.

The way to combat problem (2) is to introduce more structure into the hypothesis/experts. Some efforts have already been made in this direction. For example, it’s generally feasible to introduce an arbitrary bias or “prior” over the experts in the form of some probability distribution, and perform well with respect to that bias. Another nice piece of work by Adam Kalai and Santosh Vempala discusses how to efficiently handle several forms of structured experts. At an intuitive level, further development discovering how to efficiently work with new forms of structure might payoff well.

The basic research direction here is to address the gap between theory and practice, possibly by solving the above problems and possibly by discovering and addressing other problems.

Solution: Reinforcement Learning with Classification

I realized that the tools needed to solve the problem just posted were just created. I tried to sketch out the solution here (also in .lyx and .tex). It is still quite sketchy (and probably only the few people who understand reductions well can follow).

One of the reasons why I started this weblog was to experiment with “research in the open”, and this is an opportunity to do so. Over the next few days, I’ll be filling in details and trying to get things to make sense. If you have additions or ideas, please propose them.

Problem: Reinforcement Learning with Classification

At an intuitive level, the question here is “Can reinforcement learning be solved with classification?”

Problem Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples.

Past Work

  1. There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model.
  2. Other work on learning reductions may be important.
  3. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E3, Factored-E3, and metric-E3 and Rmax require that the observation be the state. Recent work extends this approach to POMDPs.
  4. This problem is related to predictive state representations, because we are trying to solve reinforcement learning with prediction ability.

Difficulty It is not clear whether this problem is solvable or not. A proof that it is not solvable would be extremely interesting, and even partial success one way or another could be important.

Impact At the theoretical level, it would be very nice to know if the ability to generalize implies the ability to solve reinforcement learning because (in a general sense) all problems can be cast as reinforcement learning. Even if the solution is exponential in the horizon time it can only motivate relaxations of the core algorithm like RLgen.