Workshop Summary—Principles of Learning Problem Design

This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year.

The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human.

The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel.

Here are some answers to “where do the labels come from?”:

Simulation Use a simulator (which need not be that good) to predict the cost of various choices and turn that into label information. Ashutosh had some cool demos showing the power of this approach. Gregory also presented a poster which might be viewed this way.
Agreement A label is a point of agreement. Luis often used an agreement mechanism to induce labels with games. Sham discussed the power of agreement to constrain learning algorithms. Huzefa‘s work on bioprediction can be thought of as partly using agreement with previous structures to simulate the label of a new structure.
Compilation Labels can be found by compiling one learning problem into another. Mark and I both talked about reductions a bit, which come with some nice formal guarantees.
Backprop Labels are the signals in generalized backpropagation (David Bradley‘s talk).

Some answers to “where do the data come from” are:

Everywhere The essential idea is to integrate as many data sources as possible. Rakesh had several algorithms which (in combination) allowed him to use a large number of diverse data sources in a text domain.
Sparsity A representation is formed by finding a sparse set of basis functions on otherwise totally unlabeled data. Rajat discussed self-taught learning algorithms which achieve this.
Self-prediction A representation is formed by learning to self-predict a set of raw features. Hal‘s talk covered this idea.

A workshop like this is successful if it informs the questions we ask (and answer) in the future. Some natural questions (some of which were discussed) are:

What is a natural, sufficient langauge for adding prior information into a learning system? Which languages are insufficient? Shai described a sense in which kernels are insufficient as a language for prior information. Bayesian analysis emphasizes reasoning about the parameters of the model, but the language of examples or maybe label expectations may be more natural.
What is missing from the above lists? And are the elements of the lists actually distinct?
How do we modularize? Many of the approaches use problem-specific tricks. That’s to be expected for a direction of research which is just starting, but it’s important to modularize these techniques so they can be repeatedly and easily applied. Achieving modularity in a manner which supports prior information properly seems tricky.
How do we formalize and analyze? Of the items listed above, I feel like we only have some reasonable understanding of the compilation approach. The other approaches and questions are essentially unexplored territory where some serious thinking may be helpful.