Observations on Linearity for Reductions to Regression

Dean Foster and Daniel Hsu had a couple observations about reductions to regression that I wanted to share. This will make the most sense for people familiar with error correcting output codes (see the tutorial, page 11).

Many people are comfortable using linear regression in a one-against-all style, where you try to predict the probability of choice i vs other classes, yet they are not comfortable with more complex error correcting codes because they fear that they create harder problems. This fear turns out to be mathematically incoherent under a linear representation: comfort in the linear case should imply comfort with more complex codes.

In particular, If there exists a set of weight vectors wi such that P(i|x)= <wi,x>, then for any invertible error correcting output code C, there exists weight vectors wc which decode to perfectly predict the probability of each class. The proof is simple and constructive: the weight vector wc can be constructed according to the linear superposition of wi implied by the code, and invertibility implies that a correct encoding implies a correct decoding.

This observation extends to all-pairs like codes which compare subsets of choices to subsets of choices using “don’t cares”.

Using this observation, Daniel created a very short proof of the PECOC regret transform theorem (here, and Daniel’s updated version).

One further observation is that under ridge regression (a special case of linear regression), for any code, there exists a setting of parameters such that you might as well use one-against-all instead, because you get the same answer numerically. The implication is that the advantages of codes more complex than one-against-all is confined to other prediction methods.

ICML Reviewing Criteria

Michael Littman and Leon Bottou have decided to use a franchise program chair approach to reviewing at ICML this year. I’ll be one of the area chairs, so I wanted to mention a few things if you are thinking about naming me.

  1. I take reviewing seriously. That means papers to be reviewed are read, the implications are considered, and decisions are only made after that. I do my best to be fair, and there are zero subjects that I consider categorical rejects. I don’t consider several arguments for rejection-not-on-the-merits reasonable.
  2. I am generally interested in papers that (a) analyze new models of machine learning, (b) provide new algorithms, and (c) show that they work empirically on plausibly real problems. If a paper has the trifecta, I’m particularly interested. With 2 out of 3, I might be interested. I often find papers with only one element harder to accept, including papers with just (a).
  3. I’m a bit tough. I rarely jump-up-and-down about a paper, because I believe that great progress is rarely made. I’m not very interested in new algorithms with the same theorems as older algorithms. I’m also cautious about new analysis for older algorithms, since I like to see analysis driving algorithm rather than vice-versa. I prioritize a proof-of-possibility over a quantitative improvement. I consider quantitative improvements of small constant factors in sample complexity significant. For computationaly complexity, I generally want to see at least an order of magnitude improvement. I generally disregard any experiments on toy data, because I’ve found that toy data and real data can too-easily differ in their behavior.
  4. My personal interests are pretty well covered by existing papers, but this is perhaps not too important a criteria, compared to the above, as I easily believe other subjects are interesting.

A Healthy COLT

A while ago, we discussed the health of COLT. COLT 2008 substantially addressed my concerns. The papers were diverse and several were interesting. Attendance was up, which is particularly notable in Europe. In my opinion, the colocation with UAI and ICML was the best colocation since 1998.

And, perhaps best of all, registration ended up being free for all students due to various grants from the Academy of Finland, Google, IBM, and Yahoo.

A basic question is: what went right? There seem to be several answers.

  1. Cost-wise, COLT had sufficient grants to alleviate the high cost of the Euro and location at a university substantially reduces the cost compared to a hotel.
  2. Organization-wise, the Finns were great with hordes of volunteers helping set everything up. Having too many volunteers is a good failure mode.
  3. Organization-wise, it was clear that all 3 program chairs were cooperating in designing the program.
  4. Facilities-wise, proximity in time and space made the colocation much more real than many others have been in the past.
  5. Program-wise, COLT notably had two younger program chairs, Tong and Rocco, which seemed to work well.