Asking questions

There are very substantial differences in how question asking is viewed culturally. For example, all of the following are common:

  1. If no one asks a question, then no one is paying attention.
  2. To ask a question is disrespectful of the speaker.
  3. Asking a question is admitting your own ignorance.

The first view seems to be the right one for research, for several reasons.

  1. Research is quite hard—it’s difficult to anticipate, while preparing a presentation, what people won’t understand. Consequently, it’s very common to lose people. No worthwhile presenter wants that.
  2. Real understanding is precious. By asking a question, you are really declaring “I want to understand”, and everyone should respect that.
  3. Asking a question wakes you up. I don’t mean from “asleep” to “awake”, but from “awake” to “really awake”. It’s easy to drift through a talk only sort-of understanding it. When you ask a question, especially because you are on the spot, you will absorb much more.

Some of these effects might seem minor, but they accumulate over time, and their accumulation can have a big effect on the questioner’s knowledge and understanding, as well as on how well ideas get communicated. A final piece of evidence comes from comparing cultural backgrounds: people from cultures that accept question asking simply tend to do better in research. If this isn’t you, it’s ok. By being conscious of the need to ask questions and working up the courage to do so, you can do fine.

A reasonable default is that the time not to ask a question is when the speaker (or the setting) explicitly says “let me make progress in the talk”.

Motivation should be the Responsibility of the Reviewer

The prevailing wisdom in machine learning seems to be that motivating a paper is the responsibility of the author. I think this is a harmful view—instead, it’s healthier for the community to regard this as the responsibility of the reviewer.

There are lots of reasons to prefer a reviewer-responsibility approach.

  1. Authors are the most biased possible source of information about the motivation of the paper. Systems which rely upon very biased sources of information are inherently unreliable.
  2. Authors are highly variable in their ability and desire to express motivation for their work. This adds great variance to whether an idea is accepted, and it can systematically hold back or accentuate careers. It’s great if your career is accentuated by awesome wording choice, but wise decision making by reviewers is what matters for the field.
  3. The motivation section in a paper doesn’t do anything in some sense—it’s there to get the paper in. Reading the motivation of a paper is of little use in helping the reader solve new problems.
  4. Many motivation sections are a waste of time. The 30th paper on a subject should not require the same motivation as the first, and requiring or expecting this of authors is an exercise in busy work by the research community.

Some caveats to make sure I’m understood:

  1. I’m not advocating the complete removal of a motivation section (motivectomy?), which would be absurd (and frankly harmful to your career). A paragraph describing common examples where the addressed problem arises is desirable for readers who are not specialists. This paragraph should not be in the abstract, where it often seems to sneak in.
  2. I’m also not arguing against discussion of motivations. I regard discussion of motivations as quite important, yet totally unsuited to the paper format. It’s hard to imagine a worse method for discussion than one with a year-long latency, where quasi-anonymous people are quasi-randomly paired and each attempts to accomplish several different tasks, one of which happens to be a one-sided discussion of motivation. A blog can work much better for this sort of thing, and I definitely invite discussion on motivational questions.

So, how do we change the prevailing wisdom? The answer is always “gradually”, but there are a number of steps we can take.

  1. As an author, one clever technique is to pass serious discussion of motivation by reference. “For a general discussion and motivation of this problem see [].” This would save space in the large number of papers which attempt to address an old problem better than previous approaches.
  2. Participate in public discussion of motivations. We need to encourage a real mechanism for discussion. Until these alternative (and far better) formats for discussion are developed, the problem of “who motivates” will always exist.
  3. Have private discussions about motivation where you can. Random conversations at conferences are great for this, and the process often sharpens your appreciation.
  4. Learn to take responsibility for motivation as a reviewer. This might sound hard, but it’s actually somewhat easier than careful evaluation of technical content in my experience.
    1. The first step is to disbelieve all the motivational parts of a paper by default. As mentioned above, the authors are not a reliable source anyway. Skip it and move on.
    2. Make sure you understand the problem being addressed.
    3. Make sure you understand how well the problem is addressed, relative to previous work.
    4. Think about how important that increment is. This is not equivalent to asking “how many people will appreciate the increment?” which is a popularity question. Frankly, all of Machine Learning fails the popularity test in a wider sense, even though many people appreciate the fruits of machine learning on a daily basis. First, think about the problem.
      1. How many people might a solution to the problem help? 0 is fairly common amongst submitted papers.
      2. How much would it help them? If it’s “a lot”, then that should add a bit to the importance of the paper.
      3. How familiar are you with the problem? If not very, then it’s appropriate to give the benefit of the doubt to the authors.

      Think about the solution.

      1. This solution might be useful to some other researchers who may come up with something useful. This is a warning sign: the usefulness is two steps removed.
      2. This solution might be useful to me in coming up with a useful algorithm for solving problems.
      3. This paper improves an algorithm. This is also fairly common. It should be improving an algorithm with a reasonable claim to being the best method for solving some problem.
      4. This paper can provide improvements to many algorithms. Theory papers often fall here, but they can also fall under (1) or (2) easily.

      Now, take these considerations into account in forming your own opinion about how motivated the paper is.

  5. Go multimodel. If you only know one model of what machine learning is, you don’t really know machine learning. Learn multiple ideas of what machine learning is, and actively consider their merits and downsides.

The View From China

I’m visiting Beijing for the Pao-Lu Hsu Statistics Conference on Machine Learning.

I had several discussions about the state of Chinese research. Given the large population and economy, you might expect substantial research—more than has been observed at international conferences. The fundamental problem seems to be the Cultural Revolution, which lobotomized higher education and the research associated with it. There has been a process of slow recovery since then, which has begun to be felt in the research world via increased participation in international conferences and (now) conferences in China.

The amount of effort going into construction in Beijing is very impressive—people are literally building a skyscraper at night outside the window of the hotel I’m staying at (and this is not unusual). If a small fraction of this effort is later focused on supporting research, the effect could be very substantial. General growth in China’s research portfolio should be expected.

Idempotent-capable Predictors

One way to distinguish different learning algorithms is by their ability or inability to easily use an input variable as the predicted output. This is desirable for at least two reasons:

  1. Modularity: If we want to build complex learning systems via reuse of a subsystem, it’s important to have compatible I/O.
  2. “Prior” knowledge: Machine learning is often applied in situations where we do have some knowledge of what the right solution is, often in the form of an existing system. In such situations, it’s good to start with a learning algorithm that can be at least as good as any existing system.

When doing classification, most learning algorithms can do this. For example, a decision tree can split on a feature and then classify. The real differences come up when we attempt regression: many of the algorithms we know and commonly use are not idempotent predictors (the sketch after the list below illustrates the contrast).

  1. Logistic regressors cannot be idempotent, because all input features are mapped through a nonlinearity.
  2. Linear regressors can be idempotent—they just set the weight on one input feature to 1 and other features to 0.
  3. Regression trees are not idempotent, or (at least) not easily idempotent. In order to predict the same as an input feature, that input feature must be split many times.
  4. Bayesian approaches may or may not be easily idempotent, depending on the structure of the Bayesian Prior.
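
To make the contrast concrete, here is a minimal sketch using scikit-learn (my illustration; the code, feature layout, and model choices are assumptions, not from the original discussion). The regression target is literally a copy of the first input feature: the linear regressor recovers it exactly, while the regression tree can only approximate it with a staircase of splits.

```python
# Minimal idempotency check (illustrative sketch, not from the original post).
# The regression target is an exact copy of input feature 0.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = X[:, 0]  # the "right answer" is simply input feature 0

# A linear regressor is idempotent-capable: it learns weight ~1 on
# feature 0, weight ~0 on feature 1, and intercept ~0.
lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)  # approximately [1. 0.] and 0.0

# A regression tree is piecewise constant, so it cannot reproduce the
# feature exactly; held-out error stays visibly nonzero.
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
X_test = rng.uniform(-1, 1, size=(1000, 2))
print(np.abs(tree.predict(X_test) - X_test[:, 0]).max())  # nonzero
```

Deepening the tree shrinks the error but never drives it to zero on fresh inputs, which is the sense in which regression trees are “not easily idempotent” above.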

It isn’t clear how important the idempotent-capable property is. Successive approximation approaches such as boosting can approximate it in a fairly automatic manner. It may be of substantial importance for large modular systems where efficiency is important.
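
As a rough illustration of the boosting point (again my sketch with scikit-learn; the parameter settings are arbitrary assumptions), a boosted ensemble of shallow trees drives its prediction toward the identity map on a feature as stages accumulate, even though no single tree in it is idempotent:

```python
# Sketch: successive approximation via boosting (illustrative, not from
# the original post). Training error against the copied feature shrinks
# as more boosting stages are added.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2000, 2))
y = X[:, 0]  # target is a copy of input feature 0

for n in (10, 100, 1000):
    gbm = GradientBoostingRegressor(
        n_estimators=n, max_depth=2, learning_rate=0.5
    ).fit(X, y)
    err = np.abs(gbm.predict(X) - y).mean()
    print(n, err)  # mean error decreases as n grows
```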