Visa Casualties

For the Chicago 2005 machine learning summer school we are organizing, at least 5 international students cannot come due to visa issues. There seem to be two aspects to visa issues:

  1. Inefficiency. The system effectively rejected students simply by being incapable of even starting to evaluate their visa applications in less than 1 month.
  2. Politics. Border controls became much tighter after the September 11 attack. Losing a big chunk of downtown of the largest city in a country will do that.

What I (and the students) learned is that (1) is a much larger problem than (2). Only 1 prospective student seems to have received an explicit visa rejection. Fixing problem (1) should be a no-brainer, because the lag time almost surely indicates overload, and overload on border controls should worry even people concerned with (2). The obvious fixes to overload are “spend more money” and “make the system more efficient”.

With respect to (2), which is the more minor issue by the numbers, it is unclear that the political calculus was done right. There is an obvious demonstrated risk that letting the wrong people through border controls means large buildings can be destroyed. However, there is a subtle risk in making visa acquisition a more uncertain process: it contributes towards shifting science, (human) learning, and technology outside of the US. This shift is economically detrimental to the US. For some anecdotal evidence of this effect, note that this is the first machine learning summer school in the US but the 6th in the series. Less striking, but perhaps a surer measurement, is to notice that many of the machine learning related summer conferences are in Europe this year.

Learning Reductions are Reductionist

This is about a fundamental motivation for the investigation of reductions in learning. It applies to many pieces of work other than my own.

The reductionist approach to problem solving is characterized by taking a problem, decomposing it into as-small-as-possible subproblems, discovering how to solve the subproblems, and then discovering how to use the solutions to the subproblems to solve larger problems. This approach has often paid off very well. Computer science related examples of the reductionist approach include:

  1. Reducing computation to the transistor. All of our CPUs are built from transistors.
  2. Reducing rendering of images to rendering a triangle (or other simple polygons). Computers can now render near-realistic scenes in real time. The big breakthrough came from learning how to render many triangles quickly.

This approach to problem solving extends well beyond computer science. Many fields of science focus on theories making predictions about very simple systems. These predictions are then composed to make predictions about where spacecraft go, how large a cannonball needs to be, etc. Obviously this approach has been quite successful.

It is an open question whether or not this approach can really succeed at learning.

  1. Against: We know that successful learning requires the incorporation of prior knowledge in fairly arbitrary forms. This suggests that we cannot easily decompose the process of learning.
  2. For: We know that humans can succeed at general purpose learning. It may be that arbitrary prior knowledge is required to solve arbitrary learning problems, but perhaps there are specific learning algorithms incorporating specific prior knowledge capable of solving the specific problems we encounter.
  3. Neutral: We know that learning reductions sometimes work. We don’t yet have a good comparison of how well they work with other approaches.
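
To make "decompose, solve the subproblems, recompose" concrete for learning, here is a minimal sketch of one well-known reduction, one-against-all, which turns a multiclass problem into several binary problems. The post does not name this reduction; it is picked purely as an illustration. The perceptron base learner, the function names, and the toy data are all hypothetical placeholders, and any binary learner that returns a score could be swapped in.

```python
def train_perceptron(examples, epochs=50):
    """Hypothetical binary base learner (a plain perceptron).

    examples: list of (x, y) pairs with x a tuple of floats and y in {-1, +1}.
    Returns a scoring function; a positive score means "predict +1".
    """
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return lambda x, w=tuple(w), b=b: sum(wi * xi for wi, xi in zip(w, x)) + b


def one_against_all(examples, num_classes, base_learner=train_perceptron):
    """Reduce multiclass learning to num_classes binary problems."""
    scorers = []
    for k in range(num_classes):
        # Subproblem k: "is the label k or not?"
        binary = [(x, 1 if label == k else -1) for x, label in examples]
        scorers.append(base_learner(binary))

    def predict(x):
        # Recompose: answer with the class whose binary scorer is most confident.
        scores = [scorer(x) for scorer in scorers]
        return scores.index(max(scores))

    return predict


# Made-up toy data: three well-separated clusters on a small grid.
centers = [(0.0, 4.0), (-4.0, -3.0), (4.0, -3.0)]
offsets = (-0.5, 0.0, 0.5)
data = [((cx + dx, cy + dy), k)
        for k, (cx, cy) in enumerate(centers)
        for dx in offsets for dy in offsets]

predict = one_against_all(data, num_classes=3)
accuracy = sum(1 for x, label in data if predict(x) == label) / len(data)
print("training accuracy:", accuracy)
```

The multiclass code never looks inside the binary learner, which is the reductionist bet: any improvement in solving the small problem carries over to the larger one. Whether that bet pays off on hard problems is exactly the open question above.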

Don’t mix the solution into the problem

A common defect of many pieces of research is defining the problem in terms of the solution. Here are some examples in learning:

  1. “The learning problem is finding a good separating hyperplane.”
  2. “The goal of learning is to minimize (y-p)^2 + C ||w||^2 where y = the observation, p = the prediction and w = a parameter vector.”
  3. Defining the loss function to be the one that your algorithm optimizes rather than the one imposed by the world.

The fundamental reason why this is a defect is that it creates artificial boundaries to problem solution. Artificial boundaries lead to the possibility of being blind-sided. For example, someone committing (1) or (2) above might be surprised to find a decision tree working well on a problem. Example (3) might result in someone else solving a learning problem better for real world purposes, even if their solution is worse with respect to the loss that your algorithm optimizes. This defect should be avoided so as to not artificially limit your learning kung fu.

The way to avoid this defect while still limiting the scope of investigation to something you can manage is to be explicit.

  1. Admit what the real world-imposed learning problem is. For example “The problem is to find a classifier minimizing error rate”.
  2. Be explicit about where the problem ends and the solution begins. For example “We use a support vector machine with an l2 loss to train a classifier. We use the l2 loss because it is an upper bound on the error rate which is computationally tractable to optimize.”
  3. Finish the solution. For example “The error rate on our test set was 0.34.”
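
The three steps above can be sketched in code by keeping the world-imposed loss and the optimized loss as two separate functions. This is only an illustration under stated assumptions: "l2 loss" is read here as the squared hinge loss, the data is synthetic and made up, plain gradient descent stands in for a real support vector machine solver, and every name and parameter (`error_rate`, `squared_hinge`, `train`, `make_data`, `reg`, `step`) is hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def error_rate(w, b, data):
    """The problem: fraction of examples the classifier gets wrong."""
    return sum(1 for x, y in data if y * (dot(w, x) + b) <= 0) / len(data)


def squared_hinge(w, b, data):
    """Part of the solution: mean of max(0, 1 - y * f(x))^2.

    Each term is at least 1 whenever the example is misclassified,
    so this mean is an upper bound on error_rate.
    """
    return sum(max(0.0, 1.0 - y * (dot(w, x) + b)) ** 2 for x, y in data) / len(data)


def train(data, reg=0.01, step=0.01, iters=500):
    """Gradient descent on squared hinge loss plus reg * ||w||^2."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(iters):
        grad_w, grad_b = [2 * reg * wi for wi in w], 0.0
        for x, y in data:
            slack = max(0.0, 1.0 - y * (dot(w, x) + b))
            grad_w = [g - 2 * slack * y * xi / n for g, xi in zip(grad_w, x)]
            grad_b -= 2 * slack * y / n
        w = [wi - step * g for wi, g in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def make_data(n, rng):
    """Synthetic data (an assumption): two Gaussian blobs labelled -1 and +1."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        data.append(((rng.gauss(2 * y, 1), rng.gauss(2 * y, 1)), y))
    return data


rng = random.Random(0)
train_set, test_set = make_data(200, rng), make_data(200, rng)
w, b = train(train_set)

# Finish the solution: report the world-imposed loss, not only the surrogate.
print("surrogate loss on test set:", squared_hinge(w, b, test_set))
print("error rate on test set:    ", error_rate(w, b, test_set))
```

Because `error_rate` is defined without reference to how the classifier was trained, the same function can score a decision tree or any other predictor, which is the point of keeping the problem separate from the solution.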

It is important to note that this is not a critique about any particular method for solving learning problems, but rather about the process of thinking about learning problems. Eliminating this thinking-bug will leave people more capable of appreciating and using different approaches to solve the real learning problem.

Conference attendance is mandatory

For anyone planning to do research, conference attendance is virtually mandatory for success. Aside from exposing yourself to a large collection of different ideas, many interesting conversations leading to new research happen at conferences. If you are a student, you should plan to go to at least one summer conference. Your advisor should cover the costs.

Conference  Location              Early registration deadline  Normal/student cost (US dollars)
AAAI        Pittsburgh, PA, USA   May 13                       590/170
IJCAI       Edinburgh, Scotland   May 21                       663/351
COLT        Bertinoro, Italy      May 30                       256/178
KDD         Chicago, IL, USA      July 15                      590/260
ICML        Bonn, Germany         July 1                       448
UAI         Edinburgh, Scotland   not ready yet                ???

Reviewing techniques for conferences

The many reviews following the many paper deadlines are just about over. AAAI and ICML in particular were experimenting with several reviewing techniques.

  1. Double Blind: AAAI and ICML were both double blind this year. It seemed (overall) beneficial, but two problems arose.
    1. For theoretical papers with a lot to say, authors often leave out the proofs. This is very hard to cope with under a double blind review because (1) you cannot trust that the authors got the proof right but (2) a blanket “reject” hits many probably-good papers. Perhaps authors should more strongly favor sending proof-complete papers to double blind conferences.
    2. On the author side, double blind reviewing is actually somewhat disruptive to research. In particular, it discourages the author from talking about the subject, which is one of the mechanisms of research. This is not a great drawback, but it is one not previously appreciated.
  2. Author feedback: AAAI and ICML did author feedback this year. It seemed helpful for several papers. The ICML-style author feedback (more space, no requirement of attacking the review to respond) appeared somewhat more helpful and natural. It seems ok to pass a compliment from author to reviewer.
  3. Discussion Periods: AAAI's discussion periods seemed more natural than ICML's. For ICML, there were “dead times” when reviews were submitted but discussions amongst reviewers were not encouraged. This has the drawback of letting people forget their review before discussing it.