Regret minimizing vs error limiting reductions

This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration.

  1. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example is error correcting output codes (ECOC), where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate.
  2. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. Regret is the error rate minus the minimum achievable error rate; when the learning problem is noisy, the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, the multiclass regret is at most 4 * (binary regret)^0.5. (Both bounds are written out formally just after this list.)
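
For readers who prefer formulas, here is one way to write the two statements above; this is only a restatement of the bounds in the list, with e for error rate, e* for the minimum achievable error rate, and r = e - e* for regret:

```latex
\[
  e_{\text{multiclass}} \le 4\, e_{\text{binary}}
  \qquad \text{(error limiting bound, certain ECOC codes)}
\]
\[
  r_{\text{multiclass}} \le 4\, \sqrt{r_{\text{binary}}}
  \qquad \text{(regret minimizing bound, probabilistic ECOC)},
  \quad \text{where } r = e - e^{*}.
\]
```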

The use of “regret” is more desirable than the use of error rates, because (for example) the ECOC error rate bound implies nothing when there is enough noise that the binary classifiers always have error rate 0.25. However, the square root dependence introduced by the regret analysis is not desirable. A basic question is: can we have the best of both worlds? Can we find some algorithm doing multiclass classification with binary classifiers such that average regret r for the binary classifiers implies average regret bounded by 4r for the multiclass classifier?
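
To get a concrete feel for why the square root dependence hurts, here is a small numeric sketch. The 4r and 4*sqrt(r) expressions come directly from the bounds above; the particular binary regret values are made up purely for illustration:

```python
import math

# Hypothetical average binary regrets, chosen only for illustration.
binary_regrets = [0.0001, 0.001, 0.01, 0.1]

for r in binary_regrets:
    hoped_for_bound = 4 * r              # the "best of both worlds" form: 4r
    known_bound = 4 * math.sqrt(r)       # the probabilistic ECOC regret bound: 4*sqrt(r)
    print(f"binary regret {r:g}: 4r = {hoped_for_bound:g}, 4*sqrt(r) = {known_bound:g}")

# For small binary regret the square root form is dramatically weaker:
# e.g. binary regret 0.01 gives 4r = 0.04 but 4*sqrt(r) = 0.4.
```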

If the answer is “yes”, that reduction algorithm may be empirically superior to the one we use now.
If the answer is “no”, that is a sharp and unexpected distinction between error rate analysis and regret analysis.

NIPS

NIPS is the big winter conference of learning.

  1. Paper due date: June 3rd. (Tweaked thanks to Fei Sha.)
  2. Location: Vancouver, BC, Canada (main program, Dec. 5-8) and Whistler, BC (workshops, Dec. 9-10).

NIPS is larger than all of the other learning conferences, partly because it’s the only one at that time of year. I recommend the workshops, which are often quite interesting and energetic.

Math on the Web

Andrej Bauer has set up a Mathematics and Computation Blog. As a first step, he has tried to address the persistent and annoying problem of math on the web. As a basic tool for precisely stating and transferring understanding of technical subjects, mathematics is very necessary. Despite this necessity, every mechanism for expressing mathematics on the web seems unnaturally clumsy. Here are some of the methods and their drawbacks:

  1. MathML. This was supposed to be the answer, but it has two severe drawbacks: Internet Explorer doesn’t read it, and the language is an example of push-XML-to-the-limit design that no one would ever consider writing in by hand. (In contrast, html is easy to write in.) It’s also very annoying that math fonts must be installed independently of the browser, even for mozilla based browsers.
  2. Create inline images. This has several big drawbacks: the font size is fixed for all viewers, you can’t cut & paste from inside the images, and you can’t hyperlink from (say) a symbol to its definition. MathWorld is a good example of this approach.
  3. Html Extensions. For example, y<sub>i</sub> = x<sup>2</sup>. The drawback here is that the available language is very limited (no square roots, integrals, sums, etc.). This is what I have been using for posts.
  4. Raw latex. Researchers are used to writing math in latex and compiling it into postscript or pdf, so it is possible to simply communicate in that language. Unfortunately, the language can make simple things like fractions appear (syntactically) much more complicated than they are. More importantly, latex is not nearly as universally known as the mathematics laid out in math books.
  5. Translation. An obvious trick is to translate a human-editable syntax (such as raw latex) into one of the display mechanisms above. There are two difficulties here:
    1. What do you translate to? None of the presentation mechanisms above is fully satisfying.
    2. Lost in translation. For example, in latex it’s hard to make a hyperlink from a variable in one formula to an anchor at that variable’s definition in another formula, and have it translated correctly into (say) MathML.

Approach (4) is what Andrej’s blog is using, with a javascript translator that changes its output depending on the destination browser. Ideally, the ‘smart translator’ would use whichever of {MathML, images, html extensions, human-editable format} is best among those supported by the destination browser, but that is not yet the case. Nevertheless, it is a good start.
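
As a toy illustration of the translation idea (not Andrej’s actual translator, and in Python rather than javascript), here is a minimal sketch that rewrites a tiny latex-like subset, just `_` subscripts and `^` superscripts, into the html-extension style of method (3). Everything it does not recognize passes through untouched, which is exactly the kind of ‘lost in translation’ gap described above:

```python
import re

def latex_subset_to_html(src: str) -> str:
    """Translate a tiny latex-like subset into html <sub>/<sup> markup.

    Handles only subscripts (y_i or y_{long}) and superscripts (x^2 or x^{10});
    all other latex constructs are left as-is, illustrating how limited a
    naive translator is compared with full latex.
    """
    src = re.sub(r"\^\{([^}]*)\}|\^(\S)",
                 lambda m: f"<sup>{m.group(1) or m.group(2)}</sup>", src)
    src = re.sub(r"_\{([^}]*)\}|_(\S)",
                 lambda m: f"<sub>{m.group(1) or m.group(2)}</sub>", src)
    return src

# Example: "y_i = x^2" -> "y<sub>i</sub> = x<sup>2</sup>"
print(latex_subset_to_html("y_i = x^2"))
```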

Visa Casualties

For the Chicago 2005 machine learning summer school we are organizing, at least 5 international students cannot come due to visa issues. There seem to be two aspects to the visa issues:

  1. Inefficiency. The system rejected students simply by being incapable of even starting to evaluate their visa applications in less than a month.
  2. Politics. Border controls became much tighter after the September 11 attacks. Losing a big chunk of the downtown of a country’s largest city will do that.

What I (and the students) learned is that (1) is a much larger problem than (2). Only 1 prospective student seems to have received an explicit visa rejection. Fixing problem (1) should be a no-brainer, because the lag time almost surely indicates overload, and an overloaded border control system should worry even people concerned with (2). The obvious fixes to overload are “spend more money” and “make the system more efficient”.

With respect to (2) (which by the numbers is the more minor issue), it is unclear that the political calculus was done right. There is an obvious, demonstrated risk that letting the wrong people through border controls means large buildings can be destroyed. However, there is a subtler risk in making the acquisition of a visa a more uncertain process: it contributes toward shifting science, (human) learning, and technology outside of the US. This shift is economically detrimental to the US. For some anecdotal evidence of the effect, note that this is the first machine learning summer school in the US but the 6th in the series. Less striking, but perhaps a surer measurement, is that many of the machine learning related summer conferences are in Europe this year.

Learning Reductions are Reductionist

This is about a fundamental motivation for the investigation of reductions in learning. It applies to many pieces of work other than my own.

The reductionist approach to problem solving is characterized by taking a problem, decomposing it into as-small-as-possible subproblems, discovering how to solve the subproblems, and then discovering how to use those solutions to solve larger problems. This approach has often paid off very well. Computer science related examples of the reductionist approach include:

  1. Reducing computation to the transistor. All of our CPUs are built from transistors.
  2. Reducing rendering of images to rendering a triangle (or other simple polygons). Computers can now render near-realistic scenes in real time. The big breakthrough came from learning how to render many triangles quickly.

This approach to problem solving extends well beyond computer science. Many fields of science focus on theories making predictions about very simple systems. These predictions are then composed to make predictions about where spacecraft go, how large a cannonball needs to be, etc. Obviously this approach has been quite successful.

It is an open question whether or not this approach can really succeed at learning.

  1. Against: We know that successful learning requires the incorporation of prior knowledge in fairly arbitrary forms. This suggests that we cannot easily decompose the process of learning.
  2. For: We know that humans can succeed at general purpose learning. It may be that arbitrary prior knowledge is required to solve arbitrary learning problems, but perhaps there are specific learning algorithms incorporating specific prior knowledge capable of solving the specific problems we encounter.
  3. Neutral: We know that learning reductions sometimes work. We don’t yet have a good comparison of how well they work with other approaches.