Principles of Learning Problem Design

NIPS 2007 Workshop

December 7, 2007 | Whistler, BC

Synopsis

This workshop is about how to design learning problems. The dominant paradigm for applying machine learning in practice involves a human labeling data. This approach is limited to situations where human experts exist, can be afforded, and are fast enough for the problem at hand. In many settings these constraints are not met, yet machine learning still appears possible if existing data is cleverly reinterpreted or reused. The basic idea is to create supervised learning problems from data that is not conventionally labeled, in such a way that solving these ancillary problems helps solve the original learning problem. Since the task is akin to mechanism design in economics and game theory, we call it learning problem design.

Several recent examples of learning problem design include: converting otherwise-unsupervised problems into supervised problems; creating recursive prediction problems (predicting from predictions); and reducing one learning task to another (a minimal sketch of such a reduction follows).
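
As a minimal illustration of the last idea, the following sketch (in Python) performs the classical one-vs-all reduction of a K-class problem to K binary problems. The make_binary_learner factory and its fit/predict_scores interface are assumptions made for illustration, not any particular library's API.

    # One-vs-all sketch: reduce K-class classification to K synthetic
    # binary problems ("is this example an instance of class k?").
    class OneVsAll:
        def __init__(self, make_binary_learner, num_classes):
            self.learners = [make_binary_learner() for _ in range(num_classes)]

        def fit(self, X, y):
            # One synthetic binary problem per class.
            for k, learner in enumerate(self.learners):
                learner.fit(X, [1 if label == k else 0 for label in y])
            return self

        def predict(self, x):
            # Choose the class whose binary learner is most confident.
            scores = [lrn.predict_scores([x])[0] for lrn in self.learners]
            return max(range(len(scores)), key=scores.__getitem__)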

This area is new and not yet well defined. Our goal is to bring together anyone interested in the topic, map out what we do and do not understand, and attempt to articulate the principles of learning problem design.


Examples

Here are a few examples of the kinds of problems where learning problem design becomes important.

  1. Ad Display.

    Major search engines make money from ads that are clicked on. It's tempting to think of the click as a label, yet there are several confounding issues.

    1. Whether or not an ad is clicked on depends greatly on how it is placed relative to other results.

    2. The goal is not really predicting whether a click will occur or not, but rather constructing a good ad display policy.

    3. Observing a click on one ad does not tell you whether another ad would have been clicked had it been displayed instead. This implies we necessarily face an exploration problem. (One way to construct supervised data from such logs is sketched after this list.)

  2. Spam Filtering.

    Spam filtering is one of the most widely encountered applications of machine learning, yet it is not a conventional machine learning problem.

    1. Should an email remaining in the inbox be interpreted as label "not spam"?

    2. Should an email remaining in the junkbox be interpreted as label "spam"? What if the user never looks at the junkbox?

    3. Should the labels just be the human actions of moving mail from one box to another? (One such label construction is sketched after this list.)

    4. If you are trying to generalize across multiple people, the meaning of spam changes from one person to another. You might think of this as a multitask learning problem, except that some of these people could themselves be adversarial spammers.

  3. Robotic Sensor Understanding.

    A robot might have multiple sensors (odometry, bump, IR, laser, camera) and the goal of navigating from its current location to another. Simply treating this as a general RL problem is fairly hopeless, because the amount of feedback required for learning might exceed the lifetime of the robot. To improve the rate of learning, it is tempting to solve ancillary learning problems, which might be phrased as hypotheticals: "Does odometry forward 20 feet imply bump, given current sensor values?" A design question arises here again: what do you try to predict, and how do you use these ancillary problems to solve the real problem? (One such ancillary-label construction is sketched after this list.)
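
Below are three brief, illustrative sketches (in Python) corresponding to the three examples above. They are minimal constructions under stated assumptions, not definitive implementations.

For Ad Display, if the display policy includes some known randomization, logged clicks can be reweighted into an ordinary supervised dataset via inverse propensity scoring; position is kept as a feature since clicks depend heavily on it. The log record fields (context, ad, position, clicked, p) are assumptions made for illustration.

    # Sketch: turn exploration logs into a weighted click-prediction dataset.
    # Each record is assumed to carry the probability p with which the
    # logging policy showed this ad in this position; field names are
    # hypothetical.
    def build_training_set(logs):
        examples = []
        for record in logs:
            features = {
                "context": record["context"],    # query, user features, etc.
                "ad": record["ad"],
                "position": record["position"],  # clicks depend heavily on position
            }
            label = record["clicked"]
            weight = 1.0 / record["p"]           # inverse propensity weight
            examples.append((features, label, weight))
        return examples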
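
For Spam Filtering, one defensible design derives labels only from explicit user actions, leaving untouched mail unlabeled, since "never looked at the junkbox" is indistinguishable from "agreed with the filter". The event encoding here is an assumption.

    # Sketch: labels come only from explicit moves; everything else is
    # left unlabeled rather than guessed. Action names are hypothetical.
    def label_from_action(event):
        if event["action"] == "moved_to_junk":
            return 1      # user asserted: spam
        if event["action"] == "moved_to_inbox":
            return 0      # user asserted: not spam
        return None       # no explicit action -> no label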
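
For Robotic Sensor Understanding, ancillary problems can be manufactured from the robot's own sensor log, for example: "given the current sensor values, will the bump sensor fire within the next H timesteps?" The log format and horizon below are assumptions.

    # Sketch: self-supervised ancillary labels from a sensor log. Each log
    # entry is assumed to hold a feature snapshot and a boolean bump flag.
    H = 20  # prediction horizon, in timesteps (an arbitrary choice)

    def make_bump_prediction_examples(sensor_log):
        examples = []
        for t in range(len(sensor_log) - H):
            features = sensor_log[t]["sensors"]        # odometry, IR, laser, ...
            future = sensor_log[t + 1 : t + 1 + H]
            label = any(step["bump"] for step in future)
            examples.append((features, label))
        return examples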


There are several learning settings which, although relevant, do not adequately describe the problem we are interested in:

(1) These problems are not truly unsupervised, because there is an identifiable goal and the data carries a strongly relevant signal.

(2) The problems are often fundamentally easier than reinforcement learning, because the implications of changing the world state can often be neglected. (Although perhaps not for example 3 above.)

(3) This is also not a bandit problem, because we are interested in systems that generalize across available context information. In the language of bandit learning, competing with just the set of arms is too weak; we want to compete with a large set of policies for choosing arms.

(4) This is not multitask or transfer learning, because we are not trying to solve, or reuse the solutions of, multiple (human-labeled) supervised tasks. Instead, we are interested in creating new "synthetic" prediction problems from data, for use either as subroutines or as final output.

Naturally, techniques from all of these related settings may be helpful.