The Workshop for Women in Machine Learning will be held in San Diego on October 4, 2006.
For details see the workshop website:
http://www.seas.upenn.edu/~wiml/
Ed Snelson won the Predictive Uncertainty in Environmental Modelling Competition in the temp(erature) category using this algorithm. Some characteristics of the algorithm are:
This bears a strong resemblance to a neural network. The two main differences seem to be:
For learning reductions we have been concentrating on reducing various complex learning problems to binary classification. This choice needs to be actively questioned, because it was not carefully considered.
Binary classification is learning a classifier c: X -> {0,1} so as to minimize the probability of being wrong, Pr_{x,y ~ D}(c(x) ≠ y).
The primary alternative candidate seems to be squared error regression. In squared error regression, you learn a regressor s: X -> [0,1] so as to minimize the squared error, E_{x,y ~ D}(s(x) - y)^2.
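To make the two loss functions concrete, here is a minimal Python sketch (the labels and predictions are made up for illustration, not drawn from any real D) estimating each loss from a finite sample:

import numpy as np

# Hypothetical labels and predictions standing in for a sample from D.
y = np.array([0, 1, 1, 0, 1])                  # true binary labels
c_pred = np.array([0, 1, 0, 0, 1])             # a classifier's predictions in {0,1}
s_pred = np.array([0.1, 0.8, 0.4, 0.3, 0.9])   # a regressor's predictions in [0,1]

# Empirical estimate of the classification loss Pr_{x,y ~ D}(c(x) != y).
classification_loss = np.mean(c_pred != y)      # 0.2

# Empirical estimate of the squared error E_{x,y ~ D}(s(x) - y)^2.
squared_error = np.mean((s_pred - y) ** 2)      # 0.102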
It is difficult to judge one primitive against another. The judgement must at least partially be made on nontheoretical grounds because (essentially) we are evaluating a choice between two axioms/assumptions.
These two primitives are significantly related. Classification can be reduced to regression in the obvious way: you use the regressor to predict D(y=1|x), then threshold at 0.5. For this simple reduction, a squared error regret of r implies a classification regret of at most r^{0.5}. Regression can also be reduced to classification using the Probing algorithm. (This is much more obvious when you look at an updated proof.) Under this reduction, a classification regret of r implies a squared error regression regret of at most r.
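As a concrete illustration of the simple direction (classification reduced to regression), here is a small Python sketch. The regression learner (ordinary least squares) and the toy data are placeholders of my choosing; any squared error regressor could be substituted. The Probing direction is not sketched, since it requires the full algorithm from the paper.

import numpy as np

def classify_via_regression(regress, X, y, X_test):
    # Simple reduction: fit a regressor to the {0,1} labels, so that s(x)
    # approximates D(y=1|x), then threshold the predictions at 0.5.
    s = regress(X, y)
    return (s(X_test) >= 0.5).astype(int)

# Hypothetical regression learner: ordinary least squares via numpy.
def least_squares(X, y):
    A = np.c_[X, np.ones(len(X))]                 # add a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.c_[Z, np.ones(len(Z))] @ w

X = np.array([[0.0], [0.2], [0.8], [1.0]])        # toy inputs
y = np.array([0, 0, 1, 1])                        # toy binary labels
print(classify_via_regression(least_squares, X, y, X))   # [0 0 1 1]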
Both primitives enjoy a significant amount of prior work with (perhaps) classification enjoying more work in the machine learning community and regression having more emphasis in the statistics community.
The (nonhistoric) reasons for preferring classification seem to be:
There are several reasons to consider regression over classification:
The mathematical evidence points toward squared error regression as a better primitive, although doubts still seem reasonable to entertain.
Many of the large machine learning conferences were in the US this summer. A common problem which students from abroad encounter is visa issues.
Just getting a visa to visit can be pretty rough: you stand around in lines, sometimes for days. Even worse is the timing with respect to ticket buying. Airplane tickets typically need to be bought well in advance on nonrefundable terms to secure a reasonable rate for air travel. When a visa is denied, as happens reasonably often, a very expensive ticket is burnt.
A serious effort is under way to raise this as an issue in need of fixing. Over the long term, effectively driving research conferences to locate outside of the US seems an unwise policy. Robert Schapire is planning to talk to a congressman. Sally Goldman suggested putting together a list of problem cases, and Phil Long set up an email address, immigration.and.confs@gmail.com, to collect them.
If you (or someone you know) has had insurmountable difficulties reaching a conference in the US, please send an email with:
Name:
Email address:
Conference:
Difficulty:
Details: (be brief please)
We expect most of the problem cases are students, so don’t be shy.
How should we, as researchers in machine learning, organize ourselves?
The most immediate measurable objective of computer science research is publishing a paper. The most difficult aspect of publishing a paper is having reviewers accept and recommend it for publication. The simplest mechanism for doing this is to show theoretical progress on some standard, well-known, easily understood problem.
In doing this, we often fall into a local minimum of the research process. The basic problem in machine learning is that it is very unclear that the mathematical model is the right one for the (or some) real problem. A good mathematical model in machine learning should have one fundamental trait: it should aid the design of effective learning algorithms. To date, our ability to solve interesting learning problems (speech recognition, machine translation, object recognition, etc…) remains limited (although improving), so the “rightness” of our models is in doubt.
If our mathematical models are bad, the simple mechanism of research above cannot yield the end goal. (This should be agreed on even by people who disagree about what the end goal of machine learning is!) Instead, research which proposes and investigates new mathematical models for machine learning might yield the end goal. Doing this is hard.
The good news here is that a successful new model can be a big win. I wish it were an easier win: the barriers to success are formidably high, and it seems we should do everything possible to lower those barriers for the sake of improving research.