An ICML reject

Hal, Daniel, and I have been working on the algorithm Searn for structured prediction. This was just conditionally accepted and then rejected from ICML, and we were quite surprised. By any reasonable criteria, it seems this is an interesting algorithm.

Prediction performance. Searn performed better than any other algorithm on all the problems we tested against using the same feature set. This is true even using the numbers reported by authors in their papers.
Theoretical underpinning. Searn is a reduction which comes with a reduction guarantee: good performance on the base classifier implies good performance for the overall system. No other theorem of this type has been proved for other structured prediction algorithms, as far as we know.
Speed. Searn has no problem handling much larger datasets than other algorithms we tested against.
Simplicity. Given code for a binary classifier and a problem-specific search algorithm, only a few tens of lines are necessary to implement Searn.
Generality. Searn applies in a superset of the situations where other algorithms apply. It can use (and cope with) arbitrary loss functions over the data. It can also solve new problems not previously thought of as learning problems (and we do so!)

Much of the time, papers are about tradeoffs. A very typical (although often unstated) tradeoff is expending extra computation to gain better predictive performance in practice. In Searn, it seems there is no tradeoff compared to other approaches: you can have your lunch and eat it too.

In addition, this also solves a problem: yes, any classifier can be effectively and efficiently applied on complex structured prediction problems via Searn.

Why reject?
We, as respectfully as possible, simply disagree with the SPC about the stated grounds for rejection.

The paper is difficult to read. This is true to some extent, but doesn’t seem reasonable. Fundamentally, there is a lot going on in terms of new ways of thinking about the problem and that simply requires some patience to read. One endorsement of this comes from a reviewer who said

In the end, I do think there is something there, but I think its introduction should have been like that for CRF. Present the idea, with an understanding of why it should work, with one or two easily replicable examples. Then over a few papers reapplying it to new settings, people would get it.

It isn’t often that a reviewer complains that there is too much content so it should be split across multiple papers.
The SPC stated:

The results, though, which essentially show that good local classifiers imply good global performance, are not that significant, and hold for other approaches that use local classifiers as building blocks. After all, perfect local classifiers provide perfect local accuracy, and therefore provide perfect global accuracy, and again provide perfect global loss of any kind.

Both sentences are simply false in the setting we consider. In particular, no other algorithms appear to have a good local performance to global performance guarantee for general global loss functions. Furthermore, it is not the case that perfect local performance implies perfect global performance except (perhaps) in a noise free world. Most of us believe that the problems we address typically contain fundamental ambiguities and noise (that was certainly our mathematical model). It’s easy to set up a (noisy) distribution over inputs+loss such that best-possible-up-to-the-noise-limit local predictors are globally suboptimal.
The SPC wanted us to contrast with Michael Collins, Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, EMNLP02. We instead contrasted with Michael Collins and Brian Roark, Incremental Parsing with the Perceptron Algorithm, ACL04. I believe any reasonable reading of these and the Searn paper will find the ACL04 paper supersedes the EMNLP02 paper in relevance.
The SPC wanted us to contrast with IBT in V. Punyakanok, D. Roth, W. Yih, and D. Zimak, Learning and Inference over Constrained Output. IJCAI 2005. Here we messed up and confused it with V. Punyakanok and D. Roth, The Use of Classifiers in Sequential Inference. NIPS 2001 (which we did compare with). IBT is a modification of the earlier algorithm which integrates global information (in the form of problem specific constraints) into the local training process yielding performance gains. This modification addresses an orthogonal issue: Searn has not been used in any experiments yet where this side information is available. Searn is made to cope with global loss functions rather than global constraints.

These reasons for rejection seem (1) weak, (2) false, (3) very weak, and (4) weak.
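
To make the claim under point (2) concrete, here is a minimal sketch of one such noisy distribution. The numbers, the two-label output, and the choice of exact-match loss as the global loss are all assumptions made up for illustration; none of them come from the Searn paper. Each local predictor does the best that is possible for its own position given the noise, yet the joint prediction they produce is not the best one under the global loss.

```python
from itertools import product

# Hypothetical noisy distribution over two binary labels (y1, y2) for one
# fixed input. These probabilities are invented for illustration only.
joint = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}


def marginal(position, value):
    """Probability that the label at `position` equals `value`."""
    return sum(p for y, p in joint.items() if y[position] == value)


def exact_match_loss(prediction):
    """Global loss: probability that the whole output is not exactly right."""
    return 1.0 - joint[prediction]


def hamming_loss(prediction):
    """Expected number of positions that are predicted incorrectly."""
    return sum(
        p * sum(a != b for a, b in zip(prediction, y)) for y, p in joint.items()
    )


# Best possible local predictors: each position predicts its most likely label.
local_prediction = tuple(
    max((0, 1), key=lambda v: marginal(i, v)) for i in range(2)
)

# Best possible joint prediction under the global (exact-match) loss.
global_prediction = min(product((0, 1), repeat=2), key=exact_match_loss)

print("local prediction :", local_prediction, exact_match_loss(local_prediction))
print("global prediction:", global_prediction, exact_match_loss(global_prediction))
```

In this toy case the local predictors output (1, 0), which is the best choice per position, while the output with the lowest exact-match loss is (0, 0). No amount of improvement in the local predictors closes that gap, because they are already at the noise limit.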

Why post this? There are several reasons.

The biggest personal reason is that we want to build on Searn. For whatever reason, it is psychologically difficult to build on rejected work. Posting it here gives us some sense that it is “out there” which may make it easier to build on it. We are trying to do research and simply don’t want to be delayed for 6 months or a year.
One of the purposes of publishing a paper is to acquire feedback about the research, and there is some opportunity here.
This is a nice algorithm for which we can easily imagine immediate use and general interest. Hal will be releasing the source code shortly which should make it very easy to use.
Communal mental hygiene. It should not be the case, under any circumstances, that an (S)PC/reviewer makes false statements. If you think something is true but aren’t sure, it is appropriate to say “I think …” rather than simply asserting it as a fact. Asserting nonfact as fact is injurious to the process of research because it distorts the understanding of other people.

We aren’t looking for a flamefest. If there are no comments or simply comments about Searn, that’s fine. We aren’t looking for a reversal of the decision. A final decision has to be made for any conference and for better or worse this is it. The PC chairs should not be pestered—they are very busy people. Running ICML is a very difficult task, and we understand that some mistakes are inevitable when there are hundreds of papers involved.

This post violates a standard: avoiding talking about specific papers the poster has been working on. To remedy this, I will consider posts on other rejected papers. Any such post must have a very strong case.

24 Replies to “An ICML reject”

Shane Legg says:

5/6/2006 at 4:47 am

I am surprised at your surprise. In other words, at least in my experience, acceptance or rejection is rather random at times.

One thing that bugs me is when I get rejected for a reason that is just plain factually wrong. In one case I had a very simple algorithm (one sentence description, five lines of code) rejected because it was claimed to be exactly the same as something someone else had already done. But if you thought about the two and just considered a few examples, their behaviour is often the complete opposite. And this was for a top journal in the field. Thankfully I complained and the editor reversed the decision. But to see the review process for a top journal initially fail in such a way was disturbing.

In another case recently I had a paper almost rejected because it apparently was not novel as somebody else had done the same thing before. My paper was about formally defining intelligence, while this other paper cited by the reviewer did not even mention the word “intelligence”. To add to the silliness, this other paper was published 2 weeks before the submission deadline for the conference I had submitted to.

I think the real problem is that people are trying to publish so much material each year that the review process is suffering. If we had less pressure to publish, not only would papers be better, but the review process would function better too due to the lighter load.
jl says:

5/6/2006 at 8:45 am

I think the story line “paper which ultimately turns out to be highly regarded is initially rejected” is fairly common.

However, conditional accept-then-reject is relatively rare. We strongly addressed point (1) in the rewrite and the punchline of point (3) was provided. Point (4) was an oversight on our part. Point (2) seems inappropriate at this stage—beliefs of this sort should control the initial accept/reject decision, hopefully in a manner which allows the authors to use their response to point out factual errors.

Just plain factually wrong does seem both particularly painful (for the author) and avoidable (by the reviewer).

In the complaint here, we should not lose sight of the fact that reviewing is fundamentally hard. A mistake was made (we believe), but it is only one mistake in a difficult process.
Aaron Hertzmann says:

5/6/2006 at 10:54 am

I’ve often heard authors complaining about papers being rejected. But then I can remember at least one situation where I was then later in the position of deciding whether that paper should be rejected, and, knowing the authors, I knew how they felt about the paper and what they would say if it were rejected again. It really makes you think twice about the paper. Similarly, I had one paper rejected in which my coauthors and I were very unhappy about it, and we complained bitterly about the shortsightedness of the committee to anyone who would listen and we gave a number of talks about the work, and the paper was accepted the following year (we also improved the paper significantly in the meantime). (For most papers I’ve had rejected, I thought that the rejection was a reasonable decision at the time).

My point is this: I bet some PC members of the next conference (say, NIPS) read this blog and will see this post. Having seen this post, they will probably be a lot more careful in making a decision; perhaps they are already “primed” because they’re aware of the major issues. So the paper will get much more favorable consideration than it would have otherwise.

I don’t have a conclusion to make—I don’t have an objection to this blog post. Just thought I’d mention. It just strikes me as weird when I notice myself being more careful for a paper on account of who the authors are (which is a more general phenomenon). Ideally, every paper would get this level of consideration of “how will the authors react? will they justifiably be upset?”

Conditional-accept-then-reject is bizarre. I’m not familiar with ICML’s process, but I’ve only heard of conditional accept (at SIGGRAPH) as being like a Major Revision: if you satisfy these basic requirements on the writing/presentation, the paper will be published. A conditional-accept process that is conditioned on technical details is a really bad idea, e.g.: “We don’t fully get the paper/we can’t make up our minds on it, so let’s conditionally accept it and put off the decision to later.”
Fernando Pereira says:

5/6/2006 at 1:39 pm

As you know, I’m interested in this work and I was looking forward to reading the paper. I got the unpublished version from Hal’s site and we’ll go over it at my next group meeting and send you comments after that.

I blogged on some of the same reviewing issues a while ago. In summary, the supply of expert and willing reviewers is being overwhelmed by the rapid growth in reviewing demand because of the growth of the field. As for the ICML process, I never quite understood their conditional accept policy, especially given that authors can reply to the initial reviews. It was hard to know what advice to give to a student who had a conditional accept, although fortunately the paper was accepted in the end.

Regarding some of the specific issues and your comments, I now see paper writing as very similar to proposal writing. If the critical “what”, “why”, and “how” are not clear in the first page, the odds of acceptance are much lower. It’s not that reviewers are lazy or stupid (although some may be), it is simply that the conference paper form requires leaving out technical details. If the reviewer stumbles over the missing details and starts doubting the work, it’s curtains. So, a certain sleight of hand (benign, one hopes) is needed to ease reviewers over those missing details. This is harder with ideas that go against current belief, for which common ground may be lacking.

I found that quote contrasting with our original CRF paper unintentionally funny. We wrote that paper in a big hurry, and I never thought of it as a model of the paper writing craft. I doubt it would have been accepted to ICML 2006 given the increased competition. It’s like selective college admissions in the US. Admissions people need some criteria, even if we know that they are weakly correlated with ultimate success.

Finally, do keep working on Searn. It’s an intriguing idea that deserves being studied and tried out by others.
Anonymous says:

5/7/2006 at 2:52 pm

an interesting way to get back on reviewers – blogging…
In fact this supports ICML’s blind submission policy. An uncareful review can hurt a popular blogger. Nice.
Anonymous says:

5/7/2006 at 3:05 pm

Looked through the paper now. I think I’d reject it too.
hal says:

5/7/2006 at 3:33 pm

It’s of course completely valid to believe the paper should be rejected. Would you care to elaborate on why? Do you agree with the arguments John described or do you feel that it fails in other ways? We’d like to improve it as much as possible, which is part of the reason for John’s post :).
Balaji Krishnapuram says:

5/7/2006 at 8:19 pm

I have seen reviews (both for my submissions, and from other co-reviewers when I was a reviewer) that just don’t stand up to a minimum required standard. For example I have seen one line reviews, or seen papers uniformly praised but then rejected for no apparent reason etc. As others have pointed out, there is a certain randomness about the quality of the review process, and recently this seems to be increasing with the growth in the number of submissions.

In this context, I have a suggestion that may seem a little unconventional in machine learning, but still has precedents from other communities (e.g., discussions of others’ papers in stats journals).

Why not simply make the list of authors and reviewers for a paper public? Since their name and reputation is at stake in the new age of blogging, this will increase the pressure on the reviewers to (a) be precise/clear while making comments, (b) ensure that their “facts” and claims about the paper not being novel actually hold up, and (c) be polite, though firm when rejecting papers.

I think the reviewer should at a minimum describe his understanding of the paper, and a few points that he either liked or disliked about the work. Beyond that it would be good to set it in context against the background of other work in the field, and point out the similarity to other work or what else might be interesting to experimentally validate the work etc. The review should also provide some constructive feedback to help the authors improve their work, though this ideal is not always met. If we are busy and can’t do at least this much, then we have no business signing up to review papers. The shortage of reviewers does not mean we commit ourselves to reviewing papers knowing that we will do a shoddy job!

Also, last year at ICML I saw some scathing and absolutely arrogant comments from a co-reviewer who was a really big name in the community. If the reviewers’ names were made public, the responsible individual would surely have thought about the repercussions, and may even have couched the exact same criticisms in more polite language which does not *insult* the poor authors even if he felt the work was opaque or difficult to understand.

Another well known co-reviewer went on a general rampage comparing and contrasting the quality of NIPS and ICML submissions in the middle of a review for a submission (I felt it was inappropriate and not directly pertinent to discuss this while reviewing the specific paper in question). If these blanket comparisons and rants about conferences need to be made, at least such remarks should be only sent to the PC section not to the eyes of the authors as review comments about their paper.

The cloak of anonymity is being abused by such people, and it is high time we addressed this with some reasonably well thought out changes to the system.
fourr says:

5/8/2006 at 3:49 am

I like the idea of the post. Thinking of how to make it work for everybody, I think I’d like on each conference website to see a link to the whole pool of submissions, with all the reviews. Without names (double-blind), but with a discussion forum.
Serge Kosinov says:

5/8/2006 at 5:35 am

I feel the high quality conference submissions have suffered the same fiasco as the valuable web content did: lots of interesting high quality web pages (read “great conference submissions”) have been overwhelmed and largely replaced by revenue-driven click farm harvesting useless websites (read “cut-n-paste stuff by deadline-driven phd students forced to uphold X papers/year publication rate no matter what in order not to lose financial support”). With this staggering ratio of junk versus decent quality work, a reviewer would do just fine rejecting 99% without even bothering to use her/his brain – and she/he will be correct in most of the cases 🙂
Fernando Pereira says:

5/8/2006 at 8:09 pm

Pretty sweeping claims. I’ve been reviewing for many different conferences for over 20 years, but I’ve not seen any clear trend on the quality of submissions.
Drew Bagnell says:

5/10/2006 at 10:00 am

I’ve only been reviewing for 5 or 6, but to me the quality has dramatically improved at ML conferences.
Anonymous says:

5/12/2006 at 5:37 pm

I think your post is great. It seems to be leading to a refreshing dialog about the reviewing process. However, you should post the reviews! As far as I can tell you are clearly a disgruntled author. We should be careful not to criticize the reviewing process too harshly without actually seeing what the reviews say. Were the reviewers simply neglectful of their duties or do they just raise points you disagree with? Certainly we should not review your concerns without all of the pertinent information. (Unnecessary scathing criticism now: Certainly, if you presented your algorithm in the submission with such a biased perspective, it should have been rejected anyway :P. ) But seriously, it would be nice to see the reviews before concluding that the whole process doesn’t work anymore.
Anonymous says:

5/13/2006 at 5:35 am

Something is not adding up.. For this algorithm to work, h^(1) must be performing worse than h^(0), h^(2) worse than h^(1), and so on, so that paths different from the optimal path will be found. But at the end h^(C/beta) performs well, and you prove it? Really fishy. How can that be?
hal says:

5/13/2006 at 1:02 pm

Things get worse over time (moving away from optimal). The hope is that we don’t move too far in each step. That is, we take a step size that is small enough that h^1 is not “much” worse than h^0 and so on. Also, note that h^0 is only necessarily better than h^1 in that h^0 uses the optimal policy more frequently. If you compare (h^0 – optimal policy) with (h^1 – optimal policy) with (h^2 – optimal policy), you see (in practice) that things get *better* over time. This is what we want, in practice.
Anonymous says:

5/13/2006 at 4:24 pm

OK, each step is small. But then you are not explaining how (h^t – optimal policy) gets better as t increases. That’s not good. And I can’t tell what (h^t – optimal policy) is, how it’s constructed, etc. That’s bad.

The major part of the problem is that I cannot decipher how you construct h at all. I don’t know what h^0, h^t, (h^0 – optimal policy), (h^t – optimal policy) are. Lines 1, 8, and 9 in Figure 1 are not obvious.
jl says:

5/13/2006 at 8:42 pm

Learning theory has some limitations. In particular, I’ve never seen a case where all the pessimism of a worst-case theorem was realized (and this is a very worst-case theorem in some sense because it doesn’t rely upon even an IID assumption).

Line 1 is essentially a definition: Searn takes as input an optimal policy and we define h^0 to be that.

Line 8 is using what may be an unfamiliar abstraction: the ability to solve a cost sensitive learning problem. For a precise description of how we do this, you should read the weighted all pairs paper which reduces this to binary classification. In the experiments, we note which binary classifier is used at the base of the reduction.

Line 9 is just stochastic mixing: A new policy is defined as a 1-beta chance of using the old one + a beta chance of using the new one.
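
As a rough sketch of that mixing step (not the code from the paper): the function names below are hypothetical, and drawing a fresh coin flip at every decision is an assumption about how the mixture is sampled. Under that assumption, after t rounds a single decision still comes from the initial optimal policy h^0 with probability (1-beta)^t, which is why a small beta keeps each new policy close to the previous one.

```python
import random


def mix_policies(old_policy, new_policy, beta, rng=random):
    """Return a policy that follows new_policy with probability beta,
    and old_policy otherwise. The coin is flipped on every call, which
    is an assumption of this sketch."""

    def mixed(state):
        if rng.random() < beta:
            return new_policy(state)
        return old_policy(state)

    return mixed


def optimal_policy_weight(beta, iterations):
    """Chance that one decision still comes from the initial (optimal)
    policy after `iterations` rounds of mixing with step size beta."""
    return (1.0 - beta) ** iterations


# Hypothetical stand-ins for the optimal policy and a learned classifier.
def optimal(state):
    return "optimal"


def learned(state):
    return "learned"


h1 = mix_policies(optimal, learned, beta=0.1)
print(h1("some state"), optimal_policy_weight(0.1, 1))
```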

(Thanks for your comments, it’s certainly helpful.)
jl says:

5/13/2006 at 9:22 pm

I don’t want to post the full reviews because (to be frank) we don’t really have a beef with the reviewers, who were clearly at least paying attention and often made reasonable suggestions.

You may be unfamiliar with the ICML review process. It’s double blind (authors don’t know reviewers and reviewers don’t know authors) with author feedback (to initial reviewer comments) and conditional accepts (“We accept, but would like you to make the following changes”). This paper was conditionally accepted, and we made extensive changes to improve it, then it was rejected (which is rare, especially when the authors make a best-effort attempt to address the conditions) by the SPC. The grounds given for rejection are fully stated above; I was careful to not leave out any of them. We simply disagree strongly that the stated grounds are reasonable.

I don’t expect this event to dramatically alter the way people do reviewing. People are conservative about changes to the reviewing process, and there are fairly solid reasons to be careful. Furthermore, people in machine learning are just now getting used to the implications of double blind (rather than single blind==author doesn’t know reviewer) reviewing, conditional accept, and author feedback all of which have been recently (or are now) being introduced at conferences. My impression is that these have helped significantly, but this event provides a datapoint suggesting there are still some problems.

If I had to pick the “next change” in the orderly progression, it wouldn’t be to reviewing. Conferences are shifting towards publishing papers online which means it is trivial to set up a discussion site for each paper. My impression is that public discussion of papers would be generally helpful to the research process.
Surendra Singhi says:

5/20/2006 at 2:49 am

Hi,
If the reviewers had asked you to contrast your paper with others, and you don’t dispute that the contrast was useless, then it generally means that you didn’t do proper prior research or survey of the field. Do it before you submit it elsewhere. Most people believe (even I do somewhat) that it is difficult to produce quality research unless you look at what other people have already done.

The point 2 of the SPC was made in a general setting (for other scenarios). I see nothing wrong in it, and rather feel that you are putting words in his mouth. I think he is generally criticizing the lack of novelty in this particular approach.

Everyone will agree, that reject after conditional accept is rare, but this might mean that the SPC was the only person who had some knowledge of this particular area, and he didn’t feel that you have done enough for the paper to be accepted.

Sorry, this may be strong, but there is no use whining.
Surendra Singhi says:

5/20/2006 at 2:57 am

It just strikes me as weird when I notice myself being more careful for a paper on account of who the authors are (which is a more general phenomenon).

For the same reason, ICML has introduced a double blind review process. But unfortunately some authors make their work available on the Internet even before it is published.

My point is this: I bet some PC members of the next conference (say, NIPS) read this blog and will see this post. Having seen this post, they will probably be a lot more careful in making a decision;

If this happens, then it is unfortunate. Only the merit of the paper should see it through and not other factors.
Anonymous says:

6/12/2006 at 11:21 pm

It is poor form to post about a rejected paper. Partly the reason for this is that:
i) You as a moderator should uphold certain standards, including not airing your own personal gripes under somewhat common circumstances (most of us have rejected papers where we strongly disagreed with the reasons). This is a conflict of interest between the stated goals of the blog and your personal goals. This airing detracts from the blog, imho.
ii) Your discussion regarding your rejected papers is highly biased. People are only viewing your snippets on the full reviews, so they don’t understand the full issues involved. The reviews may be bad, but don’t ask us to agree with you if you aren’t going to honestly post their opinions. You do the reviewers injustice by paraphrasing them.
iii) Also, you state a “remedy” for your actions is to allow others to post on their rejections. I also think this is a very bad idea, related to point ii. A post on such a sensitive issue is hardly going to be fair, unless you start posting the full reviews. Posting full reviews seems like a bad idea as well. This will tend to focus the discussions on whether or not the paper at hand should have been rejected. All papers deserve some forum to be heard; the rejection decision should ultimately come from a reader, not through a high variance process of three overloaded reviewers.
iv) If you want to inform us about your algorithm, then post something positive about the algorithm, not about the rejection.

I guess my main beef is that rejections are a very touchy issue and there are a lot of personal feelings involved, which get in the way of a good academic discussion. Hence, I would prefer not to see future posts on them, unless perhaps there is a real grievance (say some dishonesty is involved), in which case, as a last resort, this must get aired.
jl says:

6/15/2006 at 6:27 pm

It seems a problem for several readers of this post (including the previous anonymous commenter) is thinking of the story “reviewers didn’t like a paper and rejected it”. This is not the story. Instead, the story is “SPC rejected the paper for weak and false reasons”. I apologize if this was unclear—it’s hard to understand how to emphasize things before feedback.

In the context of the real story, posting the reviews of various reviewers is simply distracting. Posting the review of the SPC is more reasonable, but I prefer not to. We have a review process with certain expectations, one of which is that reviews aren’t revealed to the world. Given that, I’m uninterested in angering people by violating that expectation any more than the minimum necessary to address a basic drawback of the current process: no mechanism exists for pointing out that an SPC was simply wrong on basic facts. Note that a strong check on the honesty of the summary exists: the SPC member can easily enough dispute the summary (anonymously if so desired).

I want to acknowledge several points which I think are quite valid.
1) We agree posting in this way makes double blind reviewing of it impossible. That was a conscious cost of the decision to post.
2) Posting this is unfair in some sense because it is not a mechanism which is available to most people. We agree, and I generally have avoided (and plan to avoid) posting on rejections.

One serious general reason for posting about particularly badly done rejections is that we want to improve the process. It is entirely possible (for example) the SPC did not realize that a false statement was made. We need to ask ourselves, “what is best for the process of science?”

We are all familiar with stories of good papers being rejected. How does that happen? Can it be prevented? The mechanisms we (as a community) have implemented recently (author feedback and double blind reviewing) seem to help a bit, but there should be no impression they are the cure-all. Discussing how errors happen might help us prevent new errors, and examples seem necessary to the process of any such discussion. We need to build wisdom.

It is (shortly) my turn in the position of an SPC since I am on the PC at NIPS. I will strive to avoid making an error like this one, but I may fail as infallibility is too much to expect of anyone. If I fail then I should be made aware of it.
Anonymous says:

7/5/2006 at 1:47 pm

Figure 2, the label and diagram do not show the same sentence (the word “big” is omitted in the diagram).
Pingback: Machine Learning (Theory) » What to do with an unreasonable conditional accept
