Families of Learning Theory Statements

The diagram above shows a very broad viewpoint of learning theory.

Arrow | Typical statement | Examples
Past->Past | Some prediction algorithm A does almost as well as any of a set of algorithms. | Weighted Majority (sketched below)
Past->Future | Assuming independent samples, past performance predicts future performance. | PAC analysis, ERM analysis
Future->Future | Future prediction performance on subproblems implies future prediction performance using algorithm A. | ECOC, Probing
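
To make the Past->Past row concrete, here is a minimal sketch of the deterministic Weighted Majority algorithm. The binary expert/outcome encoding and the halving penalty beta=0.5 are illustrative choices, and the inputs are hypothetical placeholders:

    def weighted_majority(expert_preds, outcomes, beta=0.5):
        # expert_preds: list of rounds, each a list of 0/1 predictions (one per expert).
        # outcomes: the true 0/1 label for each round. Both are hypothetical inputs.
        weights = [1.0] * len(expert_preds[0])
        mistakes = 0
        for preds, y in zip(expert_preds, outcomes):
            # Predict by weighted vote over the experts.
            vote1 = sum(w for w, p in zip(weights, preds) if p == 1)
            vote0 = sum(w for w, p in zip(weights, preds) if p == 0)
            if (1 if vote1 >= vote0 else 0) != y:
                mistakes += 1
            # Multiplicatively penalize each expert that erred this round.
            weights = [w * (beta if p != y else 1.0) for w, p in zip(weights, preds)]
        return mistakes

The Past->Past guarantee is that the algorithm's mistake count is at most a constant times (m + log n), where m is the number of mistakes of the best of the n experts in hindsight, with no assumption on how the sequence was generated.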

A basic question is: Are there other varieties of statements of this type? Avrim noted that there are also “arrows between arrows”: generic methods for transforming between Past->Past statements and Past->Future statements. Are there others?
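One well-known instance of such an arrow between arrows is online-to-batch conversion, which turns a Past->Past (regret) statement into a Past->Future (generalization) statement. Roughly (eliding details of the loss and hypothesis class): if an online algorithm run on $T$ independent samples produces hypotheses $h_1, \ldots, h_T$ with regret $R(T)$ against the best fixed hypothesis, then

    $E\left[\frac{1}{T}\sum_{t=1}^{T} \mathrm{err}(h_t)\right] \;\le\; \min_{h}\,\mathrm{err}(h) + \frac{R(T)}{T}$

so a hypothesis chosen uniformly at random from the iterates predicts nearly as well on future samples as the best fixed hypothesis.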

Is the Goal Understanding or Prediction?

Steve Smale and I have a debate about the goals of learning theory.

Steve likes theorems with a dependence on unobservable quantities. For example, if $D$ is a distribution over a space $X \times [0,1]$, you can state a theorem about the error rate dependent on the variance, $E_{(x,y)\sim D}\,(y - E_{y'\sim D|x}[y'])^2$.
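
For context, this variance is the irreducible noise under squared loss. The standard decomposition (a textbook fact, not specific to Smale's result) says that for any predictor $f$,

    $E_{(x,y)\sim D}\,(f(x)-y)^2 \;=\; E_{x}\,\big(f(x) - E_{y'\sim D|x}[y']\big)^2 \;+\; E_{(x,y)\sim D}\,\big(y - E_{y'\sim D|x}[y']\big)^2$

so no predictor can have squared error below this variance, which is why it shows up in error-rate theorems.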

I dislike this kind of dependence, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me: I would not know what variance to plug into the learning algorithm.

Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is A New Kind of Science. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc…

“Operative” mathematics helps you make predictions about the world. A strong example of operative mathematics is Newtonian mechanics in physics: it’s a great tool to help you predict what is going to happen in the world.

In addition to the “I want to do things” motivation for operative mathematics, I find it less arbitrary. In particular, two reasonable people can each be convinced they understand a topic in ways so different that they do not understand each other’s viewpoint. If these understandings are operative, the rest of us on the sidelines can better appreciate which understanding is “best”.

Fast SVMs

There was a presentation at Snowbird about parallelized support vector machines. In many cases, people parallelize by ignoring serial optimizations, but that is not what happened here: they parallelize while keeping the optimizations. Consequently, this seems to be the fastest SVM in existence.

There is a related paper here.

Structured Regret Minimization

Geoff Gordon gave an interesting presentation at the Snowbird learning workshop discussing the use of no-regret algorithms for several robot-related learning problems. There seems to be a draft here. This seems interesting in two ways (a sketch of a basic no-regret algorithm follows the list):

  1. Drawback Removal One of the significant problems with these online algorithms is that they can’t cope with structure very easily. This drawback is addressed for certain structures.
  2. Experiments One criticism of such algorithms is that they are too “worst case”. Several experiments suggest that protecting yourself against this worst case does not necessarily incur a great loss.
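
For reference, here is a minimal sketch of the kind of unstructured no-regret algorithm being extended: the exponential-weights (Hedge) update over a fixed set of experts. The loss stream is a hypothetical placeholder, and losses are assumed to lie in [0,1]:

    import math

    def hedge(loss_stream, n_experts, eta=0.1):
        # loss_stream: iterable of rounds, each a list of per-expert losses in [0,1].
        weights = [1.0] * n_experts
        total = 0.0
        for losses in loss_stream:
            z = sum(weights)
            # Expected loss of playing an expert drawn from the current distribution.
            total += sum((w / z) * l for w, l in zip(weights, losses))
            # Exponentially downweight each expert in proportion to its loss.
            weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        return total

With a suitable eta, the expected loss exceeds that of the best fixed expert by only O(sqrt(T log n)), even against an adversarial loss sequence; this is exactly the “worst case” protection whose empirical cost the experiments probe.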

Grounds for Rejection

It’s reviewing season right now, so I thought I would list (at a high level) the sorts of problems which I see in papers. Hopefully, this will help us all write better papers.

The following flaws are fatal to any paper:

  1. Incorrect theorem or lemma statements A typo might be “ok” if it can still be understood, but any theorem or lemma which indicates an incorrect understanding of reality must be rejected. Not doing so would severely harm the integrity of the conference. A paper rejected for this reason must be fixed before it can be accepted.
  2. Lack of Understanding If a paper is understood by none of the (typically 3) reviewers then it must be rejected for the same reason. This is more controversial than it sounds because there are some people who maximize paper complexity in the hope of impressing the reviewer. The tactic sometimes succeeds with some reviewers (but not with me).

    As a reviewer, I sometimes get lost for stupid reasons. This is why an anonymized communication channel with the author can be very helpful.

  3. Bad idea Rarely, a paper comes along with an obviously bad idea. These also must be rejected for the integrity of science.

The following flaws have a strong negative impact on my opinion of the paper.

  1. Kneecapping the Giants. “Kneecapping the giants” papers take a previously published idea, cripple it, and then come up with an improvement on the crippled version. This often looks great experimentally, but is unconvincing because it does not improve on the state of the art.
  2. Only Toys. The paper emphasizes experimental evidence on datasets specially created to show the good performance of the proposed algorithm. Unfortunately, because learning is impossible in the worst case, I have little trust that performing well on a toy dataset implies good performance on real-world datasets.

My actual standard for reviewing is quite low, and I’m happy to approve of incremental improvements. Unfortunately, even that standard leads me to suggest rejection for most of the papers I review.