A Bumper Crop of Machine Learning Graduates

My impression is that this is a particularly strong year for machine learning graduates. Here's my short list of the strong graduates I know, in reverse alphabetical order by last name (for perversity's sake):

  1. Jenn Wortman. When Jenn visited us for the summer, she had one, two, three, four papers. That is typical: she's smart, capable, and follows up many directions of research. I believe approximately all of her many papers are on different subjects.
  2. Ruslan Salakhutdinov. He has a Science paper on bijective dimensionality reduction, has mastered and improved on deep belief nets (which seem like an important flavor of nonlinear learning), and in my experience he's very fast, capable, and creative at problem solving.
  3. Marc’Aurelio Ranzato. I haven’t spoken with Marc very much, but he had a great visit at Yahoo! this summer, and has an impressive portfolio of applications and improvements on convolutional neural networks and other deep learning algorithms.
  4. Lihong Li. Lihong developed the KWIK (“Knows what it Knows”) learning framework, for analyzing and creating uncertainty-aware learning algorithms. New mathematical models of learning are rare, and the topic is of substantial interest, so this is pretty cool. He’s also worked on a wide variety of other subjects and in my experience is broadly capable.
  5. Steve Hanneke. When the chapter on active learning is written in a machine learning textbook, I expect the disagreement coefficient to be in it. Steve's work is strongly distinguished from his adviser's, so he is clearly capable of independent research.

There are a couple of others, such as Daniel and Jake, whose graduation plans I'm unsure of, although they have already done good work. In addition, I'm sure there are several more that I don't know about; feel free to mention them in the comments.

It's traditional to imagine that one candidate is best overall for hiring purposes, but I have substantial difficulty with that: the field of ML is simply too broad. Instead, if you are interested in hiring, each candidate should be considered in your own context.

Observations on Linearity for Reductions to Regression

Dean Foster and Daniel Hsu had a couple observations about reductions to regression that I wanted to share. This will make the most sense for people familiar with error correcting output codes (see the tutorial, page 11).

Many people are comfortable using linear regression in a one-against-all style, where you try to predict the probability of choice i vs. the other classes, yet they are not comfortable with more complex error correcting codes because they fear that such codes create harder problems. This fear turns out to be mathematically incoherent under a linear representation: comfort in the linear case should imply comfort with more complex codes.

In particular, if there exists a set of weight vectors w_i such that P(i|x) = <w_i, x>, then for any invertible error correcting output code C, there exist weight vectors w_c which decode to perfectly predict the probability of each class. The proof is simple and constructive: the weight vector w_c can be constructed as the linear superposition of the w_i implied by the code, and invertibility ensures that a correct encoding yields a correct decoding.
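To make the construction concrete, here is a minimal numerical sketch of the superposition argument (mine, not Dean's or Daniel's), assuming a toy setup in which the class probabilities really are linear in x and C is an arbitrary invertible code matrix; all names and dimensions are illustrative.

```python
# Minimal sketch of the superposition construction (illustrative, not from the post).
# Assumption: P(i|x) = <w_i, x> holds exactly, and C is an invertible k x k code matrix.
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 6                          # number of classes, input dimension

W = rng.normal(size=(k, d))          # rows are the one-against-all weights w_i
C = rng.normal(size=(k, k))          # an arbitrary invertible code matrix
assert abs(np.linalg.det(C)) > 1e-8  # invertibility is the key requirement

# Weight vectors for the coded regression problems: linear superposition of the w_i.
W_code = C @ W                       # row c is w_c = sum_i C[c, i] * w_i

x = rng.normal(size=d)
p = W @ x                            # the "probabilities" P(i|x) under the assumption
coded = W_code @ x                   # predictions for the coded problems
decoded = np.linalg.solve(C, coded)  # invert the code to decode

print(np.allclose(decoded, p))       # True: decoding recovers P(i|x) exactly
```

The point of the sketch is simply that encoding and decoding are both linear maps, so an exact linear representation of the one-against-all problem carries over to any invertible code.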

This observation extends to all-pairs-like codes which compare subsets of choices to subsets of choices using "don't cares".

Using this observation, Daniel created a very short proof of the PECOC regret transform theorem (here, and Daniel’s updated version).

One further observation is that under ridge regression (a special case of linear regression), for any code there exists a setting of the parameters such that you might as well use one-against-all instead, because you get numerically the same answer. The implication is that the advantage of codes more complex than one-against-all is confined to other prediction methods.
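Here is a similar sketch (again mine, not from the post) illustrating the ridge observation with the closed-form ridge solution: with a single shared regularization parameter, training on coded targets and then decoding gives numerically the same predictions as one-against-all. The data, code matrix, and regularization value are made up for illustration.

```python
# Sketch: ridge regression on coded targets decodes to the one-against-all answer.
# All quantities here are illustrative; lam is a single shared ridge parameter.
import numpy as np

rng = np.random.default_rng(1)
n, d, k, lam = 200, 10, 5, 0.3

X = rng.normal(size=(n, d))
Y = np.eye(k)[rng.integers(0, k, size=n)]   # one-hot labels, n x k
C = rng.normal(size=(k, k))                 # an invertible code matrix

def ridge(X, T, lam):
    # Closed-form ridge regression: (X'X + lam*I)^{-1} X'T
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ T)

W_ova = ridge(X, Y, lam)            # one-against-all targets
W_code = ridge(X, Y @ C.T, lam)     # coded targets: column c regresses <C[c], y>

pred_ova = X @ W_ova
pred_decoded = np.linalg.solve(C, (X @ W_code).T).T   # undo the code

print(np.allclose(pred_ova, pred_decoded))  # True: identical predictions
```

Because ridge regression is linear in its targets, the coded solution is just the one-against-all solution multiplied by the code, and decoding cancels that multiplication exactly.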

ICML Reviewing Criteria

Michael Littman and Leon Bottou have decided to use a franchise program chair approach to reviewing at ICML this year. I’ll be one of the area chairs, so I wanted to mention a few things if you are thinking about naming me.

  1. I take reviewing seriously. That means papers to be reviewed are read, the implications are considered, and decisions are only made after that. I do my best to be fair, there are zero subjects that I consider categorical rejects, and there are several arguments for rejection-not-on-the-merits that I do not consider reasonable.
  2. I am generally interested in papers that (a) analyze new models of machine learning, (b) provide new algorithms, and (c) show that they work empirically on plausibly real problems. If a paper has the trifecta, I’m particularly interested. With 2 out of 3, I might be interested. I often find papers with only one element harder to accept, including papers with just (a).
  3. I'm a bit tough. I rarely jump up and down about a paper, because I believe that great progress is rarely made. I'm not very interested in new algorithms with the same theorems as older algorithms. I'm also cautious about new analysis for older algorithms, since I like to see analysis driving algorithms rather than vice versa. I prioritize a proof-of-possibility over a quantitative improvement. I consider quantitative improvements of small constant factors in sample complexity significant. For computational complexity, I generally want to see at least an order of magnitude improvement. I generally disregard any experiments on toy data, because I've found that toy data and real data can too easily differ in their behavior.
  4. My personal interests are pretty well covered by existing papers, but this is perhaps not too important a criterion compared to the above, as I easily believe other subjects are interesting.