The Role of Workshops

A good workshop is often far more interesting than the papers at the conference it accompanies, because a workshop has a much tighter focus. Since you choose the workshops that fit your interests, the increased relevance greatly enhances your attention. Roughly speaking, a workshop program consists of elements related to a subject of your interest, while the main conference program consists of elements related to someone’s interest (which is rarely your own). Workshops are more about doing research, while conferences are more about presenting research.

Several conferences have associated workshop programs, some with deadlines due shortly.

ICML workshops: due April 1
IJCAI workshops: deadlines vary
KDD workshops: not yet finalized

Anyone going to these conferences should examine the workshops and see if any are of interest. (If none are, then maybe you should organize one next year.)

Research Styles in Machine Learning

Machine Learning is a field with an impressively diverse set of research styles. Understanding this may be important in appreciating what you see at a conference.

  1. Engineering. How can I solve this problem? People in the engineering research style try to solve hard problems directly by any means available and then describe how they did it. This is typical of problem-specific conferences and communities.
  2. Scientific. What are the principles for solving learning problems? People in this research style test techniques on many different problems. This is fairly common at ICML and NIPS.
  3. Mathematical. How can the learning problem be mathematically understood? People in this research style prove theorems with implications for learning, but often do not implement (or test) algorithms. COLT is a typical conference for this style.

Many people manage to cross these styles, and that is often beneficial.

Whenever we list a set of alternatives, it becomes natural to ask “which is best?” In the case of learning, each of these styles is useful and can lead to new and useful discoveries. I sometimes see failures to appreciate the other approaches, which is a shame.

Binomial Weighting

Suppose we have a set of classifiers c making binary predictions from an input x, and we see examples in an online fashion. In particular, we repeatedly see an unlabeled example x, make a prediction y’ (possibly based on the classifiers c), and then see the correct label y.

When one of these classifiers is perfect, there is a great algorithm available: predict according to the majority vote over every classifier consistent with every previous example. This is called the Halving algorithm. It makes at most log2 |c| mistakes, since on any mistake at least half of the remaining classifiers are eliminated.
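To make this concrete, here is a minimal Python sketch of the Halving algorithm under the online protocol above (representing classifiers as callables returning 0/1 predictions is an assumption of the sketch, not part of any standard API):

```python
def halving(classifiers, stream):
    """Minimal sketch of the Halving algorithm.

    classifiers: a list of callables mapping an input x to a 0/1 prediction,
        assumed to contain at least one perfect classifier.
    stream: an iterable of (x, y) pairs revealed one example at a time.
    Yields the algorithm's prediction for each x.
    """
    consistent = list(classifiers)  # classifiers consistent with all examples so far
    for x, y in stream:
        votes = [c(x) for c in consistent]
        yield int(2 * sum(votes) >= len(votes))  # majority vote
        # Drop every classifier that got this example wrong. On a mistake
        # the majority was wrong, so the consistent set at least halves,
        # giving at most log2(len(classifiers)) mistakes overall.
        consistent = [c for c, v in zip(consistent, votes) if v == y]
```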

Obviously, we can’t generally hope that there exists a classifier which never errs. The Binomial Weighting algorithm is an elegant technique allowing a variant of the Halving algorithm to cope with errors: for every classifier, it creates a set of virtual classifiers which occasionally disagree with the original. The Halving algorithm on this set of virtual classifiers satisfies a theorem of the form:

errors of the Binomial Weighting algorithm ≤ min_c f(number of errors of c, number of classifiers)
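To make the virtual-classifier construction concrete, here is a minimal sketch, assuming a known horizon T and a mistake budget k (the fixed horizon and the function name are simplifications for illustration, not the algorithm as published):

```python
from itertools import combinations

def virtual_variants(base_predictions, k):
    """All prediction sequences disagreeing with a classifier on at most k rounds.

    base_predictions: one classifier's 0/1 predictions over a known horizon T.
    """
    T = len(base_predictions)
    variants = []
    for flips in range(k + 1):
        for rounds in combinations(range(T), flips):
            v = list(base_predictions)
            for t in rounds:
                v[t] = 1 - v[t]  # disagree with the original on round t
            variants.append(tuple(v))
    return variants
```

Each classifier contributes sum_{i ≤ k} C(T, i) virtual classifiers, the binomial coefficients from which the weighting takes its name; running Halving over this expanded set tolerates up to k errors per original classifier.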

The Binomial Weighting algorithm takes as a parameter an upper bound on the minimal number of mistakes made by any classifier. By introducing a “prior” over the number of mistakes, it can be made parameter free. Similarly, introducing a “prior” over the set of classifiers is easy and makes the algorithm flexible enough for common use.

However, there is a problem: the minimal value of f() is 2 times the number of errors of the best classifier, regardless of the number of classifiers. This is frustrating, because a parameter-free learning algorithm which takes an arbitrary “prior” and achieves good performance on an arbitrary (not even IID) sequence of examples would be compelling to implement and use, if only we had a good technique for removing the factor of 2. How can we do that?

See the Weighted Majority algorithm for an example of a similar algorithm which can remove the factor of 2 using randomization, at the expense of introducing a parameter. There are known techniques for eliminating this parameter, but they appear to be less tight (and therefore less practically useful) than introducing a “prior” over the number of errors.
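For comparison, here is a minimal sketch of the randomized Weighted Majority algorithm (the function name is mine; beta is the parameter mentioned above): every classifier that errs has its weight multiplied by beta, and the prediction comes from a classifier sampled in proportion to its current weight.

```python
import random

def randomized_weighted_majority(classifiers, stream, beta=0.5):
    """Minimal sketch of the randomized Weighted Majority algorithm.

    beta in (0, 1): the weight of every classifier that errs is
    multiplied by beta after each round.
    """
    weights = [1.0] * len(classifiers)
    for x, y in stream:
        # Sample a classifier in proportion to its current weight.
        chosen = random.choices(classifiers, weights=weights)[0]
        yield chosen(x)
        # Multiplicatively penalize every classifier that erred.
        weights = [w * beta if c(x) != y else w
                   for c, w in zip(classifiers, weights)]
```

The standard guarantee is about (k ln(1/beta) + ln N)/(1 − beta) expected mistakes when the best of N classifiers makes k mistakes; the coefficient on k approaches 1 as beta approaches 1, which is how randomization removes the factor of 2 at the price of tuning the parameter.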

Going all the Way, Sometimes

At many points in research, you face a choice: should I keep on improving some old piece of technology or should I do something new? For example:

  1. Should I refine bounds to make them tighter?
  2. Should I take some learning theory and turn it into a learning algorithm?
  3. Should I implement the learning algorithm?
  4. Should I test the learning algorithm widely?
  5. Should I release the algorithm as source code?
  6. Should I go see what problems people actually need to solve?

The universal temptation of people attracted to research is doing something new. That is sometimes the right decision, but often it is not. I’d like to discuss some reasons why not.

  1. Expertise Once expertise is developed on some subject, you are the right person to refine it.
  2. What is the real problem? Continually improving a piece of technology is a mechanism forcing you to confront this question. In many cases, this confrontation is uncomfortable because you discover that your method has fundamental flaws with respect to solving the real problem. Not going all the way means you never discover this, except the hard way—people lose interest in your work.
  3. Virtues of breadth When you go all the way, you gain breadth, with a deeper understanding about which problems are important and why. This can be invaluable in focusing your future research.
  4. More Tangible Accomplishment Going all the way means that you can point to your work and say to your peers, “I solved it”.

Going all the way is sometimes problematic in research. For example, a paper with a theory, an algorithm, and experimental results invites defeat-in-detail: a reviewer can disagree with any one of these components and eliminate it from consideration. Another issue is that academia doesn’t directly reward implementing and releasing algorithms. A third issue is that you will almost certainly discover topics of interest which don’t fit your home conference(s). It is also very difficult to publish a paper with the title “an incremental improvement on X (which makes it work great in practice)”.

Along with this advice, it is important to remember to fail fast, where appropriate. When you discover that an idea is not workable, quickly quitting it and moving on is a real virtue.