December 2005 – Machine Learning (Theory)

12/29/200512/29/2005

Deadline Season

Many different paper deadlines are coming up soon so I made a little reference table. Out of curiosity, I also computed the interval between submission deadline and conference.

Conference	Location	Date	Deadline	interval
COLT	Pittsburgh	June 22-25	January 21	152
ICML	Pittsburgh	June 26-28	January 30/February 6	140
UAI	MIT	July 13-16	March 9/March 16	119
AAAI	Boston	July 16-20	February 16/21	145
KDD	Philadelphia	August 23-26	March 3/March 10	166

It looks like the northeastern US is the big winner as far as location this year.

12/28/200512/28/2005

Yet more nips thoughts

I only managed to make it out to the NIPS workshops this year so
I’ll give my comments on what I saw there.

The Learing and Robotics workshops lives again. I hope it
continues and gets more high quality papers in the future. The
most interesting talk for me was Larry Jackel’s on the LAGR
program (see John’s previous post on said program). I got some
ideas as to what progress has been made. Larry really explained
the types of benchmarks and the tradeoffs that had to be made to
make the goals achievable but challenging.

Hal Daume gave a very interesting talk about structured
prediction using RL techniques, something near and dear to my own
heart. He achieved rather impressive results using only a very
greedy search.

The non-parametric Bayes workshop was great. I enjoyed the entire
morning session I spent there, and particularly (the usually
desultory) discussion periods. One interesting topic was the
Gibbs/Variational inference divide. I won’t try to summarize
especially as no conclusion was reached. It was interesting to
note that samplers are competitive with the variational
approaches for many Dirichlet process problems. One open question
I left with was whether the fast variants of Gibbs sampling could
be made multi-processor as the naive variants can.

I also have a reading list of sorts from the main
conference. Most of the papers mentioned in previous posts on
NIPS are on that list as well as these: (in no particular order)

The Information-Form Data Association Filter
Sebastian Thrun, Brad Schumitsch, Gary Bradski, Kunle Olukotun
[ps.gz][pdf][bibtex]

Divergences, surrogate loss functions and experimental design
XuanLong Nguyen, Martin Wainwright, Michael Jordan [ps.gz][pdf][bibtex]

Generalization to Unseen Cases
Teemu Roos, Peter GrÃƒÂ¼nwald, Petri MyllymÃƒÂ¤ki, Henry Tirri [ps.gz][pdf][bibtex]

Gaussian Process Dynamical Models
David Fleet, Jack Wang, Aaron Hertzmann [ps.gz][pdf][bibtex]

Convex Neural Networks
Yoshua Bengio, Nicolas Le Roux, Pascal Vincent, Olivier Delalleau,
Patrice Marcotte [ps.gz][pdf][bibtex]

Describing Visual Scenes using Transformed Dirichlet Processes
Erik Sudderth, Antonio Torralba, William Freeman, Alan Willsky
[ps.gz][pdf][bibtex]

Learning vehicular dynamics, with application to modeling helicopters
Pieter Abbeel, Varun Ganapathi, Andrew Ng [ps.gz][pdf][bibtex]

Tensor Subspace Analysis
Xiaofei He, Deng Cai, Partha Niyogi [ps.gz][pdf][bibtex]

12/27/200512/27/2005

Automated Labeling

One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this:

Unsupervised Learning. You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about.
Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way.
Active Learning. You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy.

It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows:

Identify some subset of observed values to predict from the others.
Build a predictor.
Use the output of the predictor to define a new prediction problem.
Repeat…

Examples of this sort seem to come up in robotics very naturally. An extreme version of this is:

Predict nearby things given touch sensor output.
Predict medium distance things given the nearby predictor.
Predict far distance things given the medium distance predictor.

Some of the participants in the LAGR project are using this approach.

A less extreme version was the DARPA grand challenge winner where the output of a laser range finder was used to form a road-or-not predictor for a camera image.

These automated labeling techniques transform an unsupervised learning problem into a supervised learning problem, which has huge implications: we understand supervised learning much better and can bring to bear a host of techniques.

The set of work on automated labeling is sketchy—right now it is mostly just an observed-as-useful technique for which we have no general understanding. Some relevant bits of algorithm and theory are:

Reinforcement learning to classification reductions which convert rewards into labels.
Cotraining which considers a setting containing multiple data sources. When predictors using different data sources agree on unlabeled data, an inferred label is automatically created.

It’s easy to imagine that undiscovered algorithms and theory exist to guide and use this empirically useful technique.

12/22/200512/22/2005

Yes , I am applying

Every year about now hundreds of applicants apply for a research/teaching job with the timing governed by the university recruitment schedule. This time, it’s my turn—the hat’s in the ring, I am a contender, etc… What I have heard is that this year is good in both directions—both an increased supply and an increased demand for machine learning expertise.

I consider this post a bit of an abuse as it is neither about general research nor machine learning. Please forgive me this once.

My hope is that I will learn about new places interested in funding basic research—it’s easy to imagine that I have overlooked possibilities.

I am not dogmatic about where I end up in any particular way. Several earlier posts detail what I think of as a good research environment, so I will avoid a repeat. A few more details seem important:

Application. There is often a tension between basic research and immediate application. This tension is not as strong as might be expected in my case. As evidence, many of my coauthors of the last few years are trying to solve particular learning problems and I strongly care about whether and where a learning theory is useful in practice.
Duration. I would like my next move to be of indefinite duration.

Feel free to email me (jl@hunch.net) if there is a possibility you think I should consider.

12/17/200512/17/2005

Workshops as Franchise Conferences

Founding a successful new conference is extraordinarily difficult. As a conference founder, you must manage to attract a significant number of good papers—enough to entice the participants into participating next year and to (generally) to grow the conference. For someone choosing to participate in a new conference, there is a very significant decision to make: do you send a paper to some new conference with no guarantee that the conference will work out? Or do you send it to another (possibly less related) conference that you are sure will work?

The conference founding problem is a joint agreement problem with a very significant barrier. Workshops are a way around this problem, and workshops attached to conferences are a particularly effective means for this. A workshop at a conference is sure to have people available to speak and attend and is sure to have a large audience available. Presenting work at a workshop is not generally exclusive: it can also be presented at a conference. For someone considering participation, the only overhead is the direct time and effort involved in participation.

All of the above says that workshops are much easier than conferences, but it does not address a critical question: “Why run a workshop at a conference rather than just a session at the conference?” A session at the conference would have all the above advantages.

There is one more very signficant and direct advantage of a workshop over a special session: workshops are run by people who have a direct and significant interest in their success. The workshop organizers do the hard work of developing a topic, soliciting speakers, and deciding what the program will be. Reputations for the workshop organizer are then built on the success or flop of the workshop. This “direct and signficant interest” aspect of a workshop is the basic reason why franchise systems (think 7-11 or McDonalds) are common and successful.

What does this observation imply about how things could be? For example, we could imagine a conference that is “all workshops”. Instead of having a program committee and program chair, the conference might just have a program chair that accepts or rejects workshop chairs who then organize their own workshop/session. This mode doesn’t seem to exist which is always cautioning, but on the other hand it ‘s not clear this mode has even been tried. NIPS is probably the conference closest to using this approach. For example, a significant number of people attend only the workshops at NIPS.