Workshops are not Conferences

… and you should use that fact.

A workshop differs from a conference in that it brings together a focused group of people to work on a focused topic. It also differs in that a workshop is typically a “one-time affair” rather than part of a series. (The Snowbird learning workshop counts as a conference in this respect.)

A common failure mode of both organizers and speakers at a workshop is to treat it as a conference. This is “ok”, but it is not really taking advantage of the situation. Here are some things I’ve learned:

  1. For speakers: A smaller audience means the talk can be more interactive. Interactive means a better chance of not losing your audience and a more interesting presentation (because you can adapt to your audience). Greater focus amongst the participants means you can get to the heart of the matter more easily and discuss tradeoffs more carefully. Unlike at conferences, relevance is valued more than newness.
  2. For organizers: Not everything needs to be in a conference-style presentation format (i.e., regularly spaced talks of 20-30 minute duration). Significant (and variable) question time, different talk durations, flexible rescheduling, and panel discussions can all work well.

Question: “When is the right time to insert the loss function?”

Hal asks a very good question: “When is the right time to insert the loss function?” In particular, should it be used at testing time or at training time?

When the world imposes a loss on us, the standard Bayesian recipe is to predict the (conditional) probability of each possibility and then choose the possibility which minimizes the expected loss. In contrast, as the confusion over “loss = money lost” versus “loss = the thing you optimize” might indicate, many people ignore the Bayesian approach and simply optimize their loss (or a close proxy for it) over the representation on the training set.
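To make the contrast concrete, here is a minimal sketch of the two recipes. It is not from the post; the loss matrix, the predicted probabilities, and the linear scorer are made up purely for illustration.

```python
import numpy as np

# Hypothetical loss matrix for a 3-class problem: loss[true_class, predicted_class].
loss = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 1.0],
                 [5.0, 1.0, 0.0]])

# Bayesian recipe: model the conditional probability of each possibility,
# then insert the loss only at prediction time.
p_y_given_x = np.array([0.2, 0.5, 0.3])   # predicted conditional probabilities
expected_loss = p_y_given_x @ loss        # expected loss of each candidate prediction
bayes_prediction = int(np.argmin(expected_loss))

# Direct recipe: insert the loss at training time, by minimizing it (or a
# close proxy) over a chosen representation on the training set.
def empirical_loss(weights, xs, ys):
    """Average loss of a linear scorer's argmax predictions; in practice a
    smooth surrogate of this quantity is what actually gets optimized."""
    predictions = np.argmax(xs @ weights, axis=1)
    return np.mean([loss[y, p] for y, p in zip(ys, predictions)])
```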

The best answer I can give is “it’s unclear, but I prefer optimizing the loss at training time”. My experience is that optimizing the loss in the most direct manner possible typically yields the best performance. This question is related to a basic principle which both Yann LeCun (applied) and Vladimir Vapnik (theoretical) advocate: “solve the simplest prediction problem that solves the problem”. (One difficulty with this principle is that ‘simplest’ is difficult to define in a satisfying way.)

One reason why it’s unclear is that optimizing an arbitrary loss is not an easy thing for a learning algorithm to cope with. Learning reductions (which I am a big fan of) give a mechanism for doing this, but they are new and relatively untried.
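As one concrete (and well-known) instance of the reduction style, here is a sketch of the one-against-all reduction from multiclass to binary classification. It only illustrates the general mechanism of turning one problem into calls to a simpler learner; it is not the specific reductions for arbitrary losses alluded to above.

```python
import numpy as np

def one_against_all_train(xs, ys, num_classes, train_binary):
    """Reduce a multiclass problem to `num_classes` binary problems.
    `train_binary(xs, labels)` can be any binary learner returning a scorer;
    it never needs to know anything about the multiclass problem."""
    return [train_binary(xs, (ys == c).astype(int)) for c in range(num_classes)]

def one_against_all_predict(binary_scorers, x):
    # Predict the class whose binary scorer is most confident.
    return int(np.argmax([scorer(x) for scorer in binary_scorers]))
```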

Drew Bagnell adds: Another approach to integrating loss functions into learning is to try to re-derive ideas about probability theory appropriate for other loss functions. For instance, Peter Grunwald and A.P. Dawid present a variant on maximum entropy learning. Unfortunately, it’s even less clear how often these approaches lead to efficient algorithms.

Exact Online Learning for Classification

Jacob Abernethy and I have found a computationally tractable method for computing an optimal (or near-optimal, depending on the setting) master algorithm for combining expert predictions, addressing this open problem. A draft is here.

The effect of this improvement appears to be about a factor of 2 decrease in the regret (= error rate minus the best possible error rate) in the low error rate regime. (At large error rates, there may be no significant difference.)
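For readers unfamiliar with the setting, here is the standard exponential-weights master algorithm with the regret computed explicitly. This is the usual baseline, not the improved algorithm from the draft; the absolute loss and the learning rate are assumptions of the sketch.

```python
import numpy as np

def exponential_weights(expert_predictions, outcomes, eta=0.5):
    """Standard exponential-weights master over a fixed set of experts.
    expert_predictions: (T, num_experts) array of predictions in [0, 1].
    outcomes: (T,) array of outcomes in {0, 1}.
    Returns the cumulative regret of the master against the best expert."""
    T, num_experts = expert_predictions.shape
    log_weights = np.zeros(num_experts)
    master_loss = 0.0
    expert_losses = np.zeros(num_experts)
    for t in range(T):
        weights = np.exp(log_weights - log_weights.max())
        weights /= weights.sum()                               # normalize
        prediction = weights @ expert_predictions[t]           # weighted vote
        losses = np.abs(expert_predictions[t] - outcomes[t])   # absolute loss
        master_loss += abs(prediction - outcomes[t])
        expert_losses += losses
        log_weights -= eta * losses                            # multiplicative update
    return master_loss - expert_losses.min()                   # regret
```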

There are some unfinished details still to consider:

  1. When we remove all of the approximation slack from online learning, is the result a satisfying learning algorithm in practice? I consider online learning to be one of the more compelling methods of analyzing and deriving algorithms, but that expectation must either be met or not by this algorithm.
  2. Some extra details: The algorithm is optimal given a small amount of side information (k in the draft). What is the best way to remove this side information? The removal is necessary for a practical algorithm. One mechanism may be the k->infinity limit.

Bad ideas

I found these two essays on bad ideas interesting. Neither of these is written from the viewpoint of research, but they are both highly relevant.

  1. Why smart people have bad ideas by Paul Graham
  2. Why smart people defend bad ideas by Scott Berkun (which appeared on Slashdot)

In my experience, bad ideas are common and overconfidence in ideas is common. This overconfidence can take the form of either excessive condemnation or excessive praise. Some of this is necessary to the process of research. For example, some overconfidence in the value of your own research is expected and probably necessary to motivate your own investigation. Since research is a rather risky business, much of it does not pan out. Learning to accept when something does not pan out is a critical skill which is sometimes never acquired.

Excessive condemnation is a real ill when it’s encountered. It has two effects:

  1. When the penalty for being wrong is too large, it means people have a great investment in defending “their” idea. Since research is risky, “their” idea is often wrong (or at least in need of amendment).
  2. A large penalty implies people are hesitant to introduce new ideas.

Both of these effects slow the progress of research. How much, exactly, is unclear and very difficult to imagine measuring.

While it may be difficult to affect the larger community of research, you can and should take these considerations into account when choosing coauthors, advisors, and other people you work with. The ability to say “oops, I was wrong”, have that be accepted without significant penalty, and move on is very valuable for the process of thinking.

Running A Machine Learning Summer School

We just finished the Chicago 2005 Machine Learning Summer School. The school was 2 weeks long with about 130 (or 140 counting the speakers) participants. For perspective, this is perhaps the largest graduate-level machine learning class I am aware of anywhere and anytime (previous MLSSs have been close). Overall, it seemed to go well, although the students are the real authority on this. For those who missed it, DVDs will be available from our Slovenian friends. Email Mrs Spela Sitar of the Jozef Stefan Institute for details.

The following are some notes for future planning and those interested.

Good Decisions

  1. Acquiring the larger-than-necessary “Assembly Hall” at International House. Our attendance came in well above our expectations, so this was a critical early decision that made a huge difference.
  2. The invited speakers were key. They made a huge difference in the quality of the content.
  3. Delegating early and often was important. One key difficulty here is gauging how much a volunteer can (or should) do. Many people are willing to help a little, so breaking things down into small chunks is important.

Unclear Decisions

  1. Timing (May 16-27, 2005): We wanted to take advantage of the quarter with a special emphasis on learning here. We also wanted to run the summer school in the summer. These goals could not both be fully satisfied. By starting as late as possible in the quarter, we were in the “summer” for universities on a semester schedule but not those on a quarter schedule. Thus, we traded some students and scheduling conflicts at the University of Chicago for the advantages of the learning quarter.
  2. Location (Hyde Park, Chicago):
    Advantages:

    1. Easy to fly to.
    2. Easy to get funding. (TTI and UChicago were both significant contributors.)
    3. Easy (on-site) organization.

    Disadvantages:

    1. US visa processing was too slow for, or rejected, 7+ students.
    2. Location in Chicago implied many locals drifted in and out.
    3. The Hyde Park area lacks real hotels, creating housing difficulties.
  3. Workshop colocation: We colocated with two workshops. The advantage of this is more content. The disadvantage was that it forced talks to start relatively early. This meant that attendance at the start of the first lecture was relatively low (60 or so), ramping up through the morning. Although some students benefitted from the workshop talks, most appeared to gain much more from the summer school.

Things to do Differently Next Time

  1. Delegate harder and better. Doing various things yourself rather than delegating means you feel like you are “doing your part”, but it also means that you are distracted and do not see other things which need to be done… and they simply don’t get done unless you see them.
  2. Have a ‘sorting session’. With 100+ people in the room, it is difficult to meet people of similar interests. This should be explicitly aided. One good suggestion is “have a poster session for any attendees”. Sorting based on other dimensions might also be helpful. The wiki helped here for social events.
  3. Torture the speakers more. Presenting an excess of content in a minimum of time to an audience of diverse backgrounds is extremely difficult. This difficulty cannot be avoided, but it can be ameliorated. Having presentation slides and suggested reading available well in advance helps. The bad news here is that it is very difficult to get speakers to make materials available in advance. They naturally want to tweak slides at the last minute and include the newest cool discoveries.
  4. Post schedules at the entrance.

The Future

There will almost certainly be future machine learning summer schools, both in this series and otherwise. My impression is that the support due to being “in series” is not critical to success, but it is considerable. For those interested, running one “in series” starts with a proposal consisting of {organizers, time/location, proposed speakers, budget} sent to Alex Smola and Bernhard Schoelkopf. I am sure they are busy, so conciseness is essential.