The State of Tight Bounds

What? Bounds are mathematical formulas relating observations to future error rates, assuming that data is drawn independently. In classical statistics, they are called confidence intervals.
Why?

  1. Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not.
  2. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is (one example form appears just after this list).
  3. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms.
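
To make the form of such bounds concrete, here is one standard example (a Hoeffding-style test set bound; the families below differ in detail, and this is only an illustration): with probability at least 1 - δ over m independently drawn labeled test examples, the future error rate e(h) of a fixed predictor h satisfies

```latex
% Hoeffding-style test set bound, stated for a fixed predictor h evaluated
% on m i.i.d. test examples, holding with probability at least 1 - \delta:
e(h) \;\le\; \hat{e}(h) + \sqrt{\frac{\ln(1/\delta)}{2m}}
% where \hat{e}(h) is the observed error rate on the test examples.
```

The families below differ in what plays the role of the deviation term, but the basic shape (observed error plus a term shrinking with the number of examples) is shared.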

What We Know Now

There are several families of bounds, based on how information is used.

  1. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation, train and test bounds, and cross-validation (but see the big open problem). These techniques are the best available for goal (1) above, but provide little information towards goals (2) or (3). Some of these techniques are computationally efficient while others are not.
  2. Unlabeled Test Set Bounds. Instead of using labeled data to construct a tight bound, it is sometimes possible to use unlabeled data.
  3. Training Bounds. These are methods which use labeled data for both training and testing. These bounds provide insight into goals (2) and (3), but are not very effective for goal (1) above. These bounds teach us about how many independent examples are required to guarantee learning success on different representations. Any bound of this sort implies a learning algorithm: “minimize the bound” (a minimal sketch of this idea appears after the list below). The set covering machine is an application of this approach to a (variant) sample compression bound. Most other potential applications aren’t “quite there” yet, although they are close. Here is a list of learning algorithms and bounds that “almost work” for these algorithms.
    1. SVM PAC-Bayes margin bound.
    2. Perceptron PAC-Bayes margin bound, Sample Compression Bound
    3. Neural Network PAC-Bayes bound.
    4. Decision Tree Occam’s razor bound.
    5. Decision Tree Pruning Rademacher Complexity Bounds
    6. Semisupervised Clustering PAC-MDL or transductive PAC-Bayes bound
    7. Nearest neighbor ?? (Note that the cross-validation approaches work well here.)
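
As a minimal sketch of the “minimize the bound” idea for training bounds, here is a hypothetical Python example using an Occam’s-razor-style bound over a finite hypothesis set. The function names and the toy threshold classifiers are invented for illustration; this is not the set covering machine or any of the specific bounds listed above.

```python
# A minimal sketch of "minimize the bound" for a finite hypothesis set, using
# an Occam's-razor-style bound (illustrative only). With probability at least
# 1 - delta, for every hypothesis h with prior weight p(h):
#   err(h) <= train_err(h) + sqrt((ln(1/p(h)) + ln(1/delta)) / (2m))
# Choosing the h that minimizes the right-hand side is itself a learning algorithm.
import math


def occam_bound(train_err, prior_h, m, delta=0.05):
    """Upper bound on true error implied by training error and prior weight."""
    return train_err + math.sqrt(
        (math.log(1.0 / prior_h) + math.log(1.0 / delta)) / (2.0 * m)
    )


def minimize_the_bound(hypotheses, prior, data):
    """Return (name, bound value) of the hypothesis with the smallest bound.

    hypotheses: dict of name -> predictor f(x) -> label
    prior:      dict of name -> prior weight (weights sum to at most 1)
    data:       list of (x, y) labeled training examples
    """
    m = len(data)
    best = None
    for name, h in hypotheses.items():
        train_err = sum(1 for x, y in data if h(x) != y) / m
        b = occam_bound(train_err, prior[name], m)
        if best is None or b < best[1]:
            best = (name, b)
    return best


# Hypothetical usage: threshold classifiers on one-dimensional inputs.
if __name__ == "__main__":
    data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
    hypotheses = {t: (lambda x, t=t: int(x > t)) for t in (0.25, 0.5, 0.75)}
    prior = {t: 1.0 / len(hypotheses) for t in hypotheses}
    print(minimize_the_bound(hypotheses, prior, data))
```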

Limitations
The independence assumption is a significant limitation in the applicability of this approach, since we often cannot believe that independence is satisfied on natural learning problems. Some work has gone into weakening this assumption.

Avoiding Bad Reviewing

If we accept that bad reviewing often occurs and want to fix it, the question is “how”?

Reviewing is done by paper writers just like yourself, so a good proxy for this question is asking “How can I be a better reviewer?” Here are a few things I’ve learned by trial (and error), as a paper writer, and as a reviewer.

  1. The secret ingredient is careful thought. There is no good substitute for a deep and careful understanding.
  2. Avoid reviewing papers that you feel competitive about. You almost certainly will be asked to review papers that feel competitive if you work on subjects of common interest. But, the feeling of competition can easily lead to bad judgement.
  3. If you feel biased for some other reason, then you should avoid reviewing. For example…
  4. Feeling angry or threatened by a paper is a form of bias. See above.
  5. Double blind yourself (avoid looking at the name even in a single-blind situation). The significant effect of a name you recognize is that it makes you pay closer attention to the paper. Since not paying enough attention is a standard failure mode of reviewers, a recognized name is inevitably unfair to authors you do not recognize.
  6. Don’t review fast. For conferences there is a tendency to review papers right at the deadline. This tendency can easily result in misjudgements because you do not have the opportunity to really understand what a paper is saying.
  7. Don’t review too much. “Too much” is defined on a per-person basis. If you don’t have time to really understand the papers that you review, then you should say “no” to review requests.
  8. Overconfidence is the enemy of truth. If you are not confident about your review, you should not state that you are. Bad reviews are often overconfident reviews.
  9. Always try to make review comments nonpersonal and constructive, especially in a rejection.

Given the above observations, a few suggestions for improved review organization can be derived.

  1. Double blind. A common argument against double blind reviewing is that it is often defeatable. This is correct but misses the point. The reason why double blind reviewing is helpful is that a typical reviewer who wants to review well is aided by the elimination of side information which should not affect the acceptance of a paper. (ICML and AAAI are double blind this year.) Another reason why double blind reviewing is “right” is that it simply appears fairer. This makes it easier on average for authors to take rejections in a more constructive manner.
  2. Staggered deadlines. Many people can’t prioritize reviews well, so the prioritization defaults to deadline proximity. Consequently, instead of having many paper reviews due on one day, having them due at the rate of one-per-day (or an even slower rate) may be helpful. These should be real deadlines in the sense that “you get it in by this date or you are excluded from conversation and decision making about the paper”.
  3. Large PCs. There is a tendency to value (and romanticize) the great researcher. But a great researcher with many papers to review can only be a mediocre reviewer due to lack of available attention and time. Consequently, increasing the size of the PC may be helpful for small PC conferences.
  4. Communication channels. A typical issue in reviewing a paper is that some detail is unintentionally unclear. In this case, being able to communicate with the authors is helpful. This communication can be easily set up to respect the double blind guarantee by routing through the conference site. This communication does not change the meaning of a reviewer’s job. ICML and AAAI are allowing author feedback; I mean something more spontaneous, but this is a step in that direction.
  5. Refusal. In many cases, it is not possible to tell that you have a conflict of interest in a paper until after seeing it. A mechanism for saying “I have a conflict of interest, please reassign the paper” should exist, and its use should be respected.
  6. Independence. Access to other reviews should not be available until after completing your own review. The point of having multiple reviews is reducing noise. Allowing early access to other reviews increases noise by decreasing independence amongst reviewers. Many conferences (but not all) follow this pattern.

If you have more ideas, please add them.

Breaking Abstractions

Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break.

  1. Real numbers aren’t. Most real numbers cannot be represented on any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers (a brief example appears after this list).
  2. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit.
  3. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of an equilibrium computation. Often, it doesn’t work.
  4. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fundamental randomization. From the Bayesian viewpoint, precisely specifying our lack of knowledge is extremely difficult and typically not done.
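
For point 1, a short Python example (with arbitrary example numbers) shows the break directly:

```python
# A tiny illustration of "real numbers aren't": floating point arithmetic
# violates identities that hold exactly for real numbers.
print(0.1 + 0.2 == 0.3)    # False: 0.1 + 0.2 evaluates to 0.30000000000000004
print(1e16 + 1.0 - 1e16)   # 0.0: the added 1.0 is lost to limited precision
```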

So, what should we do when we learn that our basic tools can break? The answer, of course, is to keep using them until something better comes along. However, the uncomfortable knowledge that our tools break is necessary in a few ways:

  1. When considering a new abstraction, the existence of a break does not imply that it is a useless abstraction. (Just as the existence of the breaks above does not imply that they are useless abstractions.)
  2. When using an abstraction in some new way, we must generally consider “is this a reasonable use?”
  3. Some tools break more often than other tools. We should actively consider the “rate of breakage” when deciding amongst tools.

Bad Reviewing

This is a difficult subject to talk about for many reasons, but a discussion may be helpful.

Bad reviewing is a problem in academia. The first step in understanding this is admitting to the problem, so here is a short list of examples of bad reviewing.

  1. Reviewer disbelieves theorem proof (ICML), or disbelieves a theorem with a trivially false counterexample. (COLT)
  2. Reviewer internally swaps quantifiers in a theorem, concludes it has been done before and is trivial. (NIPS)
  3. Reviewer believes a technique will not work despite experimental validation. (COLT)
  4. Reviewers fail to notice flaw in theorem statement (CRYPTO).
  5. Reviewer erroneously claims that it has been done before (NIPS, SODA, JMLR)—(complete with references!)
  6. Reviewer inverts the message of a paper and concludes it says nothing important. (NIPS*2)
  7. Reviewer fails to distinguish between a DAG and a tree (SODA).
  8. Reviewer is enthusiastic about paper but clearly does not understand (ICML).
  9. Reviewer erroneously believes that the “birthday paradox” is relevant (CCS).

The above is only for cases where there were sufficient reviewer comments to actually understand reviewer failure modes. Many reviewers fail to leave sufficient comments, and it’s easy to imagine they commit similar mistakes.

Bad reviewing should be clearly distinguished from rejections—note that some of the above examples are actually accepts.

The standard psychological reaction to any rejected paper is to find fault with the reviewers. You, as a paper writer, have invested significant work (weeks? months? years?) in the process of creating a paper, so it is extremely difficult to step back and read the reviews objectively. One characteristic distinguishing a bad review from a mere rejection is that it bothers you years later.

If we accept that bad reviewing happens and want to address the issue, we are left with a very difficult problem. Many smart people have thought about improving this process, yielding the system we observe now. There are many subtle issues here and several solutions that (naively) appear obvious don’t work.

Fast Physics for Learning

While everyone is silently working on ICML submissions, I found this discussion about a fast physics simulator chip interesting from a learning viewpoint. In many cases, learning attempts to predict the outcome of physical processes. Access to a fast simulator for these processes might be quite helpful in predicting the outcome. Bayesian learning in particular may directly benefit while many other algorithms (like support vector machines) might have their speed greatly increased.

The biggest drawback is that writing software for these odd architectures is always difficult and time consuming, but a several-orders-of-magnitude speedup might make that worthwhile.