Research – Machine Learning (Theory)

6/25/2006

Presentation of Proofs is Hard.

When presenting part of the Reinforcement Learning theory tutorial at ICML 2006, I was forcibly reminded of this.

There are several difficulties.

When creating the presentation, choosing the correct level of detail is tricky. With too much detail, the proof takes too long and people may be lost to boredom. With too little detail, the steps of the proof involve too great a jump. The right balance is very difficult to judge.
1. What may be an easy step in the careful thought of a quiet room is not so easy when you are occupied by the process of presentation.
2. What may be easy after having gone over this (and other) proofs many times is not so easy for a viewer to follow on a first pass.
These problems seem correctable only through repeated test-and-revise.
When presenting the proof, simply speaking with sufficient precision is substantially harder than in normal conversation (where precision is not so critical). Practice can help here.
When presenting the proof, going at the right pace for understanding is difficult. When we use a blackboard or whiteboard, the process of writing imposes a natural, reasonable pace. Unfortunately, writing doesn’t scale well to large audiences because people far from the board cannot see it, so this natural pacing mechanism is lost.
It is difficult to entertain with a proof; there is nothing particularly funny about it. This matters especially for a large audience, which naturally tends to expect to be entertained.

Given all these difficulties, it is very tempting to avoid presenting proofs. Avoiding the proof in any serious detail is fairly reasonable in a conference presentation: the time is too short and the audience is too heavily overloaded to follow the logic well. The “right” level of detail is often just the theorem statement.

Nevertheless, avoidance is not always possible because the proof is one of the more powerful mechanisms we have for doing research.

4/17/2006

Rexa is live

Rexa is now publicly available. Anyone can create an account and log in.

Rexa is similar in functionality to CiteSeer and Google Scholar, with more emphasis on the use of machine learning for intelligent information extraction. For example, when an author is searched for, Rexa can automatically display a picture taken from the author’s homepage.

4/14/2006

JMLR is a success

In 2001, the “Journal of Machine Learning Research” was created in reaction to inflexible publisher policies at MLJ (the journal Machine Learning). Essentially, with the creation of the internet, the bottleneck in publishing research shifted from publishing to research. The declaration of independence accompanying this move explains the reasons in greater detail.

MLJ has substantially changed its policies in response. In particular, authors no longer assign copyright to the publisher (*), and MLJ regularly sponsors student “best paper awards” with cash prizes at several conferences. This is an advantage of MLJ over JMLR: MLJ can afford to sponsor cash prizes for the machine learning community. The remaining disadvantage is that reading a paper in MLJ sometimes requires searching the author’s website for a free version. In contrast, JMLR articles are freely available to everyone from the JMLR website. Whether this disadvantage cancels the advantage is debatable, but essentially no one working in machine learning disputes the following: the changes brought by the creation of JMLR have been positive for the machine learning community as a whole.

This model can and should be emulated in other areas of research where publishers are not behaving in a sufficiently constructive manner. Doing so requires two vital ingredients: a consensus of leaders willing to support a new journal, and the willingness to spend the time and effort setting it up. Presumably the editors of JMLR have learned some lessons about how to do this and would be willing to share them.

(*) Back in the day, it was typical to be forced to sign over all rights to your journal paper, then ignore this and place it on your homepage. The natural act of placing your paper on your webpage is no longer illegal.

3/9/2006

Use of Notation

For most people, a mathematical notation is like a language: you learn it and stick with it. For people doing mathematical research, however, this is not enough: they must design new notations for new problems. The design of good notation is both hard and worthwhile, since a bad initial notation can greatly hold back a line of research.

Before we had mathematical notation, equations were all written out in words. Since words have multiple meanings and variable precedences, long equations written out in words can be extraordinarily difficult to parse and sometimes fundamentally ambiguous. A good representative example of this is the legalese in the tax code. Since we want greater precision and clarity, we adopt mathematical notation.
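As a small illustration of the precedence problem (a made-up phrase, not one from the original post), the words “the square of a plus b divided by c” admit at least four readings, which notation forces you to distinguish:

```latex
% Four parses of the phrase "the square of a plus b divided by c"
\[
  \frac{(a+b)^2}{c}
  \qquad a^2 + \frac{b}{c}
  \qquad \frac{a^2 + b}{c}
  \qquad \left(\frac{a+b}{c}\right)^2
\]
```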

One fundamental thing to understand about mathematical notation is that humans are barely capable logic verifiers. This is the fundamental reason why one notation can be much better than another. This observation is easier to miss than you might expect because, for a problem you are working on, you have already expended the effort to reach an understanding.

I don’t know of any systematic method for designing notation, but there is a set of heuristics, learned over time, which may be more widely helpful.

Notation should be minimized. If there are two ways to express something, choose the simpler one (objectively, by symbol count). If a piece of notation is used only once, it should be removable (this often arises in presentations).
Notation divergence should be minimized. If the people working on some problem have a standard notation, then sticking with it is easier. For example, in machine learning x is almost always a set of features from which predictions are made.
A reasonable mechanism for notation design is to first name and define the quantities you are working with (for example, reward r and time t), and then make derived quantities by combination (for example r_t is reward at time t).
Variable names should alliterate with what they name. Time is t, reward is r, cost is c, hypothesis is h.
Name collisions (or near collisions) should be avoided. E and p are terrible variable names in some contexts, since E is easily confused with expectation and p with probability.
Sub-sub-scripts should be avoided. It is often possible to change a sub-sub-script into a sub-script by redefinition.
Superscripts are dangerous because of overloading with exponentiation.
Inessential dependences should be suppressed in the definition. (For example, in reinforcement learning the MDP M you are working with is often suppressible because it never changes.)
A dependence must be either uniformly suppressed or uniformly explicit.
Short theorem statements are very nice. There seem to be two styles of theorem statements: long, including all definitions, and short, with definitions made before the statement. As computer scientists, we have to prefer “short” because long is nonmodular. As humans, we also find short statements easier to read.
It is very easy to forget the quantification of a variable (“for all” or “there exists”) when you are working on a theorem, and it is essential for readers that you specify it explicitly.
Avoid strange alphabets. It is hard for people to think with unfamiliar symbols. English lowercase > English uppercase > Greek lowercase > Greek uppercase > Hebrew > other strange things.
The definitions section of a paper often should not contain all the definitions in a paper. Instead, it should cover the universally used definitions. Others can be introduced just before they are used.
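A few of the heuristics above can be illustrated with before-and-after rewrites. This is a hedged sketch: the symbols x, y, h and V are hypothetical examples chosen here for illustration, not notation taken from any particular paper.

```latex
\begin{align*}
  % Derived quantity built from named base quantities (reward r, time t)
  r_t &:= \text{reward at time } t \\
  % Sub-sub-script removed by redefinition
  x_{i_j} &\;\longrightarrow\; y_j, \quad \text{where } y_j := x_{i_j} \\
  % Superscript overloading: is h^2 the second hypothesis, or h squared?
  h^2 &\;\longrightarrow\; h_2 \\
  % Inessential dependence on a fixed MDP M, suppressed uniformly
  V_M(s) &\;\longrightarrow\; V(s)
\end{align*}
```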

These heuristics often come into conflict, which can be hard to resolve. When trying to resolve the conflict, it’s important to understand that it’s easy to fail to imagine what a notation would be like. Trying out different notations and comparing them is reasonable.

Are there other useful heuristics for notation design? (Or disagreements with the above heuristics?)

2/4/2006

Research Budget Changes

The announcement of an increase in funding for basic research in the US is encouraging. There is some discussion of this at the Computing Research Policy blog.

One part of this discussion has a graph of NSF funding over time, presumably in dollar budgets. I don’t believe that dollar budgets are the right way to judge the impact of funding changes on researchers. A better measure seems to be the dollar budget divided by GDP, which captures the relative emphasis on research.

This graph was assembled by dividing the NSF budget by the US GDP. For 2005 GDP, I used the current estimate, and for 2006 and 2007 I assumed growth by a factor of 1.04 per year. The 2007 number also uses the requested 2007 budget, which is certain to change.
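The budget-to-GDP computation described above is simple enough to sketch. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, GDP growth is taken as the fixed factor 1.04 per year mentioned in the text, and the numbers in the usage section are made up to exercise the code. They are not real NSF or GDP figures.

```python
"""Sketch of the "dollar budget divided by GDP" measure of research emphasis.

Assumptions: all function names are hypothetical, and the numbers in the
__main__ block are made up for illustration (NOT real NSF or GDP data).
"""


def extrapolate_gdp(gdp_base: float, years_ahead: int, growth: float = 1.04) -> float:
    """Project GDP forward by a fixed multiplicative growth factor per year."""
    return gdp_base * growth ** years_ahead


def funding_emphasis(budget: float, gdp: float) -> float:
    """Relative emphasis on research: dollar budget divided by GDP."""
    return budget / gdp


def trailing_declines(ratios: list[float]) -> int:
    """Count the consecutive year-over-year declines at the end of a series."""
    count = 0
    # Walk backwards from the most recent year, stopping at the first non-decline.
    for i in range(len(ratios) - 1, 0, -1):
        if ratios[i] < ratios[i - 1]:
            count += 1
        else:
            break
    return count


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions (NOT real data).
    gdp = [100.0, 104.0, extrapolate_gdp(104.0, 1), extrapolate_gdp(104.0, 2)]
    budgets = [1.00, 1.05, 1.06, 1.07]
    ratios = [funding_emphasis(b, g) for b, g in zip(budgets, gdp)]
    print(trailing_declines(ratios))  # prints 2
```

Note that in the made-up series the dollar budget rises every year, yet the budget-to-GDP ratio still falls for the last two years; this is exactly the kind of change a plain dollar graph hides.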

This graph makes it clear why researchers were upset: research funding emphasis has fallen for 3 years in a row. The reality has been significantly more severe, because DARPA decreased funding and industrial research labs (AT&T and Lucent, for example) laid off large numbers of researchers at about the time the government’s emphasis on basic research started declining.

It is certainly encouraging to see the emphasis on science growing again.