NYAS ML Symposium this year

The NYAS ML symposium grew again this year to 170 participants, despite the need to outsmart or otherwise tunnel through a crowd.

Perhaps the most distinctive talk was by Bob Bell on various aspects of the Netflix prize competition. I also enjoyed several student posters, including Matt Hoffman’s cool examples of blind source separation for music.

I’m somewhat surprised how much the workshop has grown, as it is now comparable in size to a small conference, although in style more similar to a workshop. At some point as an event grows, it becomes owned by the community rather than the organizers, so if anyone has suggestions on improving it, speak up and be heard.

NIPS workshops

Many of the NIPS workshops have a deadline about now, and the NIPS early registration deadline is Nov. 6. Several interest me:

  1. Adaptive Sensing, Active Learning, and Experimental Design, due Oct. 27.
  2. Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra, due Nov. 6.
  3. Large-Scale Machine Learning: Parallelism and Massive Datasets, due Oct. 23 (i.e., already past).
  4. Analysis and Design of Algorithms for Interactive Machine Learning, due Oct. 30.

And I’m sure many of the other workshops interest other people. Workshops are a great mechanism for research, so take a look if there is any chance you might be interested.

ALT 2009

I attended ALT (“Algorithmic Learning Theory”) for the first time this year. My impression is ALT = 0.5 COLT, by attendance and also by some more intangible “what do I get from it?” measure, though there are many differences that can’t quite be captured this way. The program at ALT seems substantially more diverse than COLT’s, which is both a weakness and a strength.

One paper that might interest people generally is:

Alexey Chernov and Vladimir Vovk, Prediction with Expert Evaluators’ Advice. The basic observation here is that in the online learning with experts setting, you can compete with several compatible loss functions simultaneously. Restated, debating between competing with log loss and squared loss is a waste of breath, because it’s almost free to compete with both at once. This might interest anyone who has run into the “which loss function?” debates that come up periodically.
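For readers less familiar with the setting, here is a minimal sketch of the vanilla exponential-weights update over experts. This is background for the setting only, not the paper’s algorithm, and the experts, outcomes, and learning rate below are invented for illustration.

```python
import math

def exponential_weights(expert_preds, outcomes, loss, eta=1.0):
    """Vanilla exponential weights: predict with a weighted average of the
    experts, then downweight each expert by exp(-eta * its loss)."""
    weights = [1.0] * len(expert_preds[0])
    learner_preds = []
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        learner_preds.append(sum(w * p for w, p in zip(weights, preds)) / total)
        weights = [w * math.exp(-eta * loss(p, y)) for w, p in zip(weights, preds)]
    return learner_preds

# Illustrative run: two experts predicting a probability, scored by squared loss.
squared = lambda p, y: (p - y) ** 2
print(exponential_weights([[0.2, 0.9], [0.3, 0.8], [0.1, 0.7]], [1, 1, 0], squared))
```

The paper’s result is, roughly, that a single carefully constructed prediction sequence can carry regret guarantees under several such losses at once.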

Static vs. Dynamic multiclass prediction

I have had interesting discussions with Kishore and Hal about the distinction between static and dynamic classes.

The distinction arises in multiclass prediction settings. A static set of classes is given by a set of labels {1,…,k} and the goal is generally to choose the most likely label given features. The static approach is the one that we typically analyze and think about in machine learning.

The dynamic setting is one that is often used in practice. The basic idea is that the set of classes is not fixed, varying on a per-example basis. These different classes are generally defined by a choice of features.
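To make the contrast concrete, here is a hedged sketch with an invented feature encoding and a toy linear scorer (none of this is from a specific system): the static predictor ranges over a fixed label set, while the dynamic predictor scores whatever candidates arrive with the example, each carrying its own class-dependent features.

```python
# A toy linear scorer standing in for any learned scoring function.
def score(weights, features):
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

# Static: the label set {1, ..., k} is fixed across all examples.
def predict_static(weights, x_features, k):
    return max(range(1, k + 1),
               key=lambda y: score(weights, {**x_features, ("label", y): 1.0}))

# Dynamic: candidates (id, class_features) vary per example, and the loop
# makes prediction time scale with the number of candidates.
def predict_dynamic(weights, x_features, candidates):
    return max(candidates,
               key=lambda c: score(weights, {**x_features, **c[1]}))[0]

# Toy usage: disambiguating "jaguar" among per-example candidate classes.
x = {("word", "jaguar"): 1.0}
cands = [("animal", {("class", "animal"): 1.0}), ("car", {("class", "car"): 1.0})]
w = {("word", "jaguar"): 0.5, ("class", "car"): 1.0}
print(predict_dynamic(w, x, cands))  # -> 'car'
```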

As far as theory goes, the distinction between these two settings appears to be very substantial. For example, in the static setting, in learning reductions land, we now have techniques for robust O(log(k)) time prediction in many multiclass setting variants. In the dynamic setting, the best techniques known are O(k), and furthermore this exponential gap may be essential, at least without further assumptions.
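As a sketch of how O(log(k)) prediction can work in the static case, in the spirit of tree-structured reductions such as the filter tree (the classifier interface below is an assumption for illustration): place the k labels at the leaves of a balanced binary tree and make one learned binary decision per level.

```python
# Sketch of O(log k) static multiclass prediction via a balanced binary tree
# of binary classifiers, in the spirit of tree-structured learning reductions.
# `node_classifiers` is assumed to map a heap-indexed tree node to a trained
# binary predictor returning 0 (descend left) or 1 (descend right); training
# those classifiers well is where the reductions analysis does the real work.

def predict_tree(node_classifiers, x, k):
    lo, hi = 0, k          # candidate labels form the range [lo, hi)
    node = 0               # root of the tree, heap-indexed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if node_classifiers[node](x) == 0:
            hi, node = mid, 2 * node + 1      # keep the left half
        else:
            lo, node = mid, 2 * node + 2      # keep the right half
    return lo              # ceil(log2(k)) binary decisions in total

# Toy usage: stub classifiers that always descend left, so label 0 wins.
stubs = {i: (lambda x: 0) for i in range(16)}
print(predict_tree(stubs, x=None, k=8))  # -> 0
```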

Are there techniques for converting from dynamic multiclass to static multiclass? For example, we could embed a dynamic set of classes within a much larger static set ranging over all possible dynamic classes while eliminating all class-dependent features. In some cases, this approach may work well, but I’ve also seen it fail, with the basic problem being that a learning algorithm might easily choose an invalid class. We could of course force a learning algorithm to choose amongst the dynamically valid set, but I don’t know a general way to do that without making the running time at least scale with the number of valid classes.

So, a basic question that’s bothering me is: when and how can we effectively predict amongst a set of dynamically defined classes in sublinear time? A quick answer is “it’s not possible, because simply reading off the set of dynamically defined classes requires O(class count) time”. This answer isn’t satisfying, because there are many ways to implicitly specify a set in sublinear time. So the modified question is: “Are there natural ways to dynamically define classes in sublinear time? And can these be used for sublinear prediction?”
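One toy answer to the implicit-specification point, entirely my own illustration: an integer interval [lo, hi) specifies hi - lo classes in constant space, and under a strong structural assumption (the score is unimodal in the class index) the best class can be found by ternary search in O(log(hi - lo)) score evaluations.

```python
# Toy illustration: classes given implicitly as the integer interval
# [lo, hi) -- constant space -- plus a strong structural assumption: the
# score is unimodal in the class index. Then the best class is found by
# ternary search in O(log(hi - lo)) score evaluations.

def argmax_unimodal(score, lo, hi):
    """Maximize a strictly unimodal score over the implicit class set
    {lo, ..., hi - 1} without enumerating it."""
    while hi - lo > 3:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - 1 - third
        if score(m1) < score(m2):
            lo = m1 + 1    # the peak lies strictly to the right of m1
        else:
            hi = m2 + 1    # the peak lies at or to the left of m2
    return max(range(lo, hi), key=score)

# A million implicit classes, peak at 337000, found in ~ log-many probes.
print(argmax_unimodal(lambda c: -(c - 337000) ** 2, 0, 1_000_000))  # 337000
```

Unimodality is obviously contrived; the open question above is whether natural class structures admit anything analogous.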