Basic computer science research takes a hit

The New York Times has an interesting article about how DARPA has cut its funding of university computer science research by about a factor of 2 over the last 5 years and become less directed towards basic research. Partly in response, the number of grant submissions to NSF has grown by a factor of 3 (while the NSF budget has stayed approximately constant).

This is the sort of policy decision which may make sense for the defense department, but which means a large hit to basic research on information technology in the US. For example, “DARPA funded the invention of the internet” is reasonably correct. The decision is particularly painful in the context of NSF budget cuts and the end of extensive phone-monopoly-funded research at Bell Labs.

The good news from a learning perspective is that (based on anecdotal evidence) much of the remaining funding is aimed at learning and learning-related fields. Methods of making good automated predictions obviously have a lot of applications that DARPA cares about and the technology often isn’t there yet.

The Producer-Consumer Model of Research

In the quest to understand what good reviewing is, perhaps it’s worthwhile to think about what good research is. One way to think about good research is in terms of a producer/consumer model.

In the producer/consumer model of research, for any element of research there are producers (authors and coauthors of papers, for example) and consumers (people who use the papers to make new papers or code solving problems). A produced bit of research is judged as “good” if it is used by many consumers. There are two basic questions which immediately arise:

  1. Is this a good model of research?
  2. Are there alternatives?

The producer/consumer model has some difficulties which can be (partially) addressed.

  1. Disconnect. A group of people doing research on some subject may become disconnected from the rest of the world. Each person uses the research of other people in the group, so it appears good research is being done, but the group has no impact on the rest of the world. One way to detect this is by looking at the consumers^2 (the consumers of the consumers) and higher order powers. If the set doesn’t expand much with higher order powers, then there is a disconnect. (A toy sketch of this check follows the list.)
  2. Latency. It is extraordinarily difficult to determine in advance whether a piece of research will have many consumers. A particular piece of work may be useful only after a very long period of time. This difficulty is particularly severe for theoretical research.
  3. Self-fulfillment. To some extent, interesting research by this definition is simply research presented to the largest possible audience. The odds that someone will build on the research are simply larger when it is presented to a larger audience. Some portion of this effect is “ok”—certainly attempting to educate other people is a good idea. But in judging the value of a piece of research, discounting by the vigor with which it is presented may be healthy for the system. (In effect, this acts as a bias against spamming.)

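To make the disconnect check concrete, here is a toy sketch (my own construction in Haskell, not part of any existing citation-analysis tool) of computing the consumers^k sets from a citation graph and testing whether they keep growing:

    import qualified Data.Map as Map
    import qualified Data.Set as Set

    type Paper = String

    -- Maps a paper to the papers that build on it (its direct consumers).
    type ConsumerGraph = Map.Map Paper (Set.Set Paper)

    -- One application of the consumer relation to a set of papers.
    consumers :: ConsumerGraph -> Set.Set Paper -> Set.Set Paper
    consumers g ps = Set.unions [Map.findWithDefault Set.empty p g | p <- Set.toList ps]

    -- consumersK g k p is the consumers^k set of a single paper.
    consumersK :: ConsumerGraph -> Int -> Paper -> Set.Set Paper
    consumersK g k p = iterate (consumers g) (Set.singleton p) !! k

    -- A crude disconnect test: consumers^(k+1) adds nothing beyond consumers^0 .. consumers^k.
    disconnected :: ConsumerGraph -> Int -> Paper -> Bool
    disconnected g k p =
      consumersK g (k + 1) p `Set.isSubsetOf` Set.unions [consumersK g i p | i <- [0 .. k]]

Real citation data is messier than this, of course, but the check itself is just graph reachability.
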
If we accept the difficulties of the producer/consumer model, then good reviewing becomes a problem of predicting which research will have a large impact in terms of the number of consumers (and consumers^2, etc.). Citations can act (to some extent) as a proxy for consumption, implying that it may be possible to (retroactively) score a reviewer’s judgement. There are many difficulties here. For example, a citation of the form “[joe blow 93] is wrong and here’s why” isn’t an example of the sort of use we want to encourage. Another important effect is that a reviewer who rejects a paper biases the number of citations the paper later receives. Another is that a rejected paper resubmitted elsewhere may change so that it is simply a better paper. It isn’t obvious what a good method is for taking all of these effects into account.

Clearly, there are problems with this model for judging research (and at the second order, judgements of reviews of research). However, I am not aware of any other abstract model for “good research” which is even this good. If you know one, please comment.

What can Type Theory teach us about Machine Learning?

This post is some combination of belaboring the obvious and speculating wildly about the future. The basic issue to be addressed is how to think about machine learning in terms given to us by programming language (PL) theory.

Types and Reductions

John’s research programme (I feel this should be in British spelling to reflect the grandiosity of the idea…) of machine learning reductions (StateOfReduction) is at some essential level type-theoretic in nature. The fundamental elements are the classifier, a function f: alpha -> beta, and the corresponding classifier trainer g: List of (alpha,beta) -> (alpha -> beta). The research goal is to create *combinators* that produce new f’s and g’s given existing ones. John (probably quite rightly) seems unwilling at the moment to commit to any notion stronger than that these combinators are correctly typed. One way to see the result of a reduction is something typed like the following. (For those denied the joy of the Hindley-Milner type system, “simple” is probably wildly wrong.)

g’: List of (gamma,delta) -> (List of (alpha,beta) -> (alpha -> beta)) -> (gamma -> delta)

Perhaps another is to think of the reduction itself type-theoretically as a combinator that takes something typed like g above and spits out a new classifier trainer. (I.e. a reduction “lifts” a lower-level function up to operate on some type it wasn’t originally designed for.)
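To belabor the typing a little more, here is a minimal Haskell sketch of the pieces above. The names (Classifier, Trainer, Reduction, lift) are mine, chosen for exposition; they aren’t anything John has committed to:

    type Classifier a b = a -> b
    type Trainer    a b = [(a, b)] -> Classifier a b

    -- f :: Classifier alpha beta   -- the classifier
    -- g :: Trainer alpha beta      -- the classifier trainer

    -- A reduction, seen as a combinator: given data for the new problem
    -- (gamma, delta) and a trainer for the base problem (alpha, beta),
    -- it produces a classifier for the new problem.
    type Reduction gamma delta alpha beta =
      [(gamma, delta)] -> Trainer alpha beta -> Classifier gamma delta

    -- Equivalently, a reduction "lifts" a base trainer to a trainer for the
    -- new problem it wasn't originally designed for.
    lift :: Reduction gamma delta alpha beta -> Trainer alpha beta -> Trainer gamma delta
    lift r base trainingData = r trainingData base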

Many of the things John considers reductions have particular error- and sample-complexity-preserving properties. For instance, a reduction may have the property that small regret of a low-level classifier implies small regret for the result of the reduction. There are a number of interesting issues here:

* Can we “type” these properties so a compiler might check them? (A speculative sketch follows this list.)
* Is this enough? Some reductions (including, I would argue, some that have been discussed seriously) are “trivial” reductions in the sense that the antecedent never holds. That is, they are combinators that set up learning problems which are impossible to solve. Can we somehow type this kind of thing so we can separate good from bad reductions, where we try to define bad as meaning something like “creates impossible subproblems where it was possible to do otherwise”?
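On the first bullet, here is some wild speculation in code: a phantom-type sketch (entirely hypothetical, not a real library, and no substitute for the on-paper regret proof) of how a guarantee might at least be recorded and propagated by the compiler. It says nothing about the second bullet, which seems to need more than types.

    {-# LANGUAGE EmptyDataDecls #-}

    -- Phantom tags recording whether a regret-preservation property has been proved (on paper).
    data SmallRegretPreserved
    data NoGuarantee

    -- A value (say, a classifier trainer) tagged with the guarantee it carries.
    newtype Tagged guarantee t = Tagged t

    -- A combinator proved to preserve the guarantee keeps the tag.
    preserving :: (t -> t') -> Tagged g t -> Tagged g t'
    preserving red (Tagged x) = Tagged (red x)

    -- An arbitrary combinator forgets it.
    unchecked :: (t -> t') -> Tagged g t -> Tagged NoGuarantee t'
    unchecked red (Tagged x) = Tagged (red x)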

RLBench hints at how powerful formalizing interfaces for things like reductions can be. You can chain together powerful commands that leverage existing classifier trainers to learn in a trace-generating model which is adapted (again, essentially a combinator library) from a generative model. Unfortunately, RLBench operates at the command line, which is poorly suited to this kind of composition.

Types and Probabilistic Models

Arguably the graphical models/Bayesian community has just as grandiose plans as John, but here the reductions are to learning probabilities in the sense of “beliefs”. Graphical models form a language that allows us to compose together little subgraphs that express our beliefs about some subsystem. Much research has gone into allowing the modeling language to drive inference as well. It’s interesting to explore what type theory can tell us about this, too.
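As a toy illustration of the compositionality (my own toy types, not any real inference library): a belief fragment can be typed as a factor over named variables, and composing two subgraphs is just multiplying factors. Inference, of course, is where the modeling language has to work hard; the composition itself is the easy part.

    import qualified Data.Map as Map

    type Var        = String
    type Assignment = Map.Map Var Double
    newtype Factor  = Factor (Assignment -> Double)

    -- Composing two belief fragments about (possibly overlapping) subsystems.
    compose :: Factor -> Factor -> Factor
    compose (Factor f) (Factor g) = Factor (\a -> f a * g a)

    -- The unnormalized joint of a whole model is the composition of its parts.
    joint :: [Factor] -> Factor
    joint = foldr compose (Factor (const 1))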

Existing work

Lloyd Allison has taken thoughts like this quite seriously in his relatively recent work, “Types and Classes in Machine Learning and Data Mining”. Allison uses Haskell to type a number of ideas, including MML and some classifiers, and can check many things statically. Unfortunately, ML is one of those parts of information science where we demand every computational advantage we can get. If we learn to solve some problem with acceptable speed, in no time it is the inner loop of some more sophisticated machine learning algorithm. Despite my appreciation (well, mostly) of Haskell, it simply isn’t practical to write ML algorithms in Haskell. (Although presumably both Allison and …. would disagree with me.)
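For flavor, here is a minimal sketch in the spirit of (but not copied from) Allison’s paper: a type class for models that can score data, so that “this model can be evaluated this way” becomes something the compiler checks.

    -- A tiny model type class, roughly in the spirit of Allison (not his code).
    class Model m where
      negLogLikelihood :: m -> [Double] -> Double

    data Gaussian = Gaussian { mean :: Double, stdDev :: Double }

    instance Model Gaussian where
      negLogLikelihood (Gaussian mu sigma) xs =
        sum [ 0.5 * log (2 * pi * sigma * sigma) + (x - mu) * (x - mu) / (2 * sigma * sigma)
            | x <- xs ]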

It may still make a huge amount of sense to think about things in something like the Haskell type system and then translate them to the capable (but gross) type system of, say, C++. Understanding polymorphism and type classes and their relation to machine learning may be a real fundamental breakthrough in making ML widely useful.

Contributions flowing towards PL theory

Right now, when push comes to shove, all good interfaces between systems basically amount to invoking functions or closures. When you get past “object oriented” and other such hype, this makes sense for structuring work. What’s interesting is that recent machine learning and AI work *can’t* be expressed that way. I think of graphical models as tools for expressing future interfaces because they preserve uncertainty across boundaries. This seems to me to be where ML people can challenge PL people: help us design a new conception of “interface” that preserves uncertainty between layers. (Say, one that passes probability or “belief messages” back and forth.) Perhaps the probabilistic machinery already exists: we can always define “sampling interfaces” between systems. My guess is that these interfaces are basically multi-directional (unlike functional interfaces). Why? Say I want a program to understand speech, and I build a layered system consisting of “signal processing”, “phoneme recognition”, “word recognition”, “language modeling”, and some form of “semantic understanding”. I can resolve something ambiguous about, say, a phoneme by understanding the higher-level language model. To get the phoneme parsing right, I have to feed back the results from the language layer. In this sense, interfaces need to preserve uncertainty and probably pass information both ways.
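As a very speculative sketch (every name here is hypothetical), an uncertainty-preserving interface might pass weighted samples, i.e. “belief messages”, in both directions, so that a speech pipeline composes from layers that each accept feedback from above:

    -- A belief message: weighted samples, i.e. an unnormalized distribution.
    type Belief a = [(a, Double)]

    -- A layer between a lower-level representation (say phonemes) and a
    -- higher-level one (say words). Information flows both ways.
    data Layer lower upper = Layer
      { up   :: Belief lower -> Belief upper
      , down :: Belief upper -> Belief lower -> Belief lower
        -- feedback: reweight the lower-level belief given the higher-level one
      }

    -- Stacking two layers gives another layer, so a pipeline like
    -- signal -> phonemes -> words -> semantics composes from parts.
    stack :: Layer a b -> Layer b c -> Layer a c
    stack l1 l2 = Layer
      { up   = up l2 . up l1
      , down = \beliefC beliefA ->
                 down l1 (down l2 beliefC (up l1 beliefA)) beliefA
      }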

How to start (small)

I think a serious effort should be made to explain things like reductions in terms of PL theory, even if it means that, like Allison, you have to do some explaining of type systems first. I’d love to spend some time hashing some of this out with PL people. (If only there were more hours in the day!)

We should write our libraries to have super-clean functional interfaces and make them as (parametrically) polymorphic as reasonable.

Any other thoughts on ways to proceed on this in the short term?

Academic Mechanism Design

From game theory, there is a notion of “mechanism design”: setting up the structure of the world so that participants have some incentive to do sane things (rather than obviously counterproductive things). Application of this principle to academic research may be fruitful.

What is misdesigned about academic research?

  1. The JMLG guides give many hints.
  2. The common nature of bad reviewing also suggests the system isn’t working optimally.
  3. There are many ways to experimentally “cheat” in machine learning.
  4. Funding Prisoner’s Dilemma. Good researchers often write grant proposals for funding rather than doing research. Since the pool of grant money is finite, this means that grant proposals are often rejected, implying that more must be written. This is essentially a “prisoner’s dilemma”: anyone not writing grant proposals loses, but the entire process of doing research is slowed by the distraction. If everyone wrote half as many grant proposals, roughly the same distribution of funding would occur, and time would be freed for more research.

Mechanism design is not that easy—many counterintuitive effects can occur. Academic mechanism design is a particularly difficult problem because there are many details. Nevertheless, it may be worthwhile because it’s hard to overestimate the value of an improvement in the rate of useful research.

The good news is that not everything needs to be solved at once. For example, on the empirical side, if we set up an easy system allowing anyone to create challenges like KDDCup, we might achieve a better (i.e. less cheat-prone) understanding of what works and what does not.