Few would mistake academic paper review for a fair process, but sometimes the unfairness is particularly striking. This is most easily seen by comparison:
| Paper | Banditron | Offset Tree | Notes |
| --- | --- | --- | --- |
| Problem Scope | Multiclass problems where only the loss of one choice can be probed. | Strictly more general: cost-sensitive multiclass problems where only the loss of one choice can be probed. | Often generalizations don’t matter. That’s not the case here, since every plausible application I’ve thought of involves loss functions substantially different from 0/1. |
| What’s new | Analysis and experiments. | Algorithm, analysis, and experiments. | As far as I know, the essence of the more general problem was first stated and analyzed with the EXP4 algorithm (page 16) (1998). It’s also the time horizon 1 simplification of the Reinforcement Learning setting for the random trajectory method (page 15) (2002). The Banditron algorithm itself is functionally identical to One-Step RL with Traces (page 122) (2003) in Bianca’s thesis, with the epsilon-greedy strategy and a multiclass perceptron whose update is scaled by the importance weight (a sketch of this style of update appears after the table). |
| Computational Time | O(k) per example, where k is the number of choices. | O(log k) per example. | Lower bounds on the sample complexity of learning in this setting are a factor of k worse than for supervised learning, implying that many more examples may be needed in practice. Consequently, learning algorithm speed is more important than in standard supervised learning. |
| Analysis | Incomparable. An online regret analysis showing that if a small hinge loss predictor exists, a bounded number of mistakes occur. Also, an algorithm-independent analysis of the fully realizable case. | Incomparable. A learning reduction analysis showing how the regret of any base classifier bounds policy regret. Also contains a lower bound and a comparable analysis of all plausible alternative reductions. | |
| Experiments | 1 dataset, comparing with no other approaches to solving the problem. | 13 datasets, comparing with 2 other approaches to solving the problem. | |
| Outcome | Accepted at ICML. | Rejected at ICML, NIPS, UAI, and NIPS. | |
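For readers unfamiliar with the setting, here is a minimal Python sketch of the style of algorithm both columns describe: epsilon-greedy exploration over the k choices plus a perceptron update rescaled by an importance weight. It illustrates the flavor of the approach only; it is not a reproduction of either paper’s algorithm, and the function and parameter names are mine.

```python
import numpy as np

def bandit_multiclass_perceptron(examples, k, d, gamma=0.05, seed=0):
    # Minimal sketch (not either paper's exact algorithm): epsilon-greedy
    # exploration over k labels plus a perceptron update rescaled by the
    # importance weight 1/p(played label).  The true label y is consulted
    # only to produce the single bit of bandit feedback that is observed.
    rng = np.random.default_rng(seed)
    W = np.zeros((k, d))                     # one linear score per class
    for x, y in examples:
        greedy = int(np.argmax(W @ x))       # exploit the current weights
        p = np.full(k, gamma / k)            # explore uniformly w.p. gamma
        p[greedy] += 1.0 - gamma
        played = int(rng.choice(k, p=p))
        correct = (played == y)              # the only feedback observed
        update = np.zeros((k, d))
        if correct:
            update[played] += x / p[played]  # importance-weighted promotion
        update[greedy] -= x                  # demote the greedy prediction
        W += update
    return W
```

The importance weight 1/p(played label) is what makes the single bit of bandit feedback usable: averaged over the exploration distribution, the update matches what a full-information multiclass perceptron would have done.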
The reviewers of the Banditron paper made the right call. The subject is interesting, and analysis of a new learning domain is of substantial interest. Real advances in machine learning often come from new domains of application. The talk was well attended and generated substantial interest. It’s also important to remember that the reviewers of the two papers probably did not overlap, so there was no explicit preference for one paper over the other.
Why was the Offset Tree rejected? One of these rejections is easily explained as a fluke—we ran into a reviewer at UAI who believes that learning by memorization is the way to go. I, and virtually all machine learning people, disagree but some reviewers at UAI aren’t interested or expert in machine learning.
The striking thing about the other 3 rejects is that they all contain a reviewer who doesn’t read the paper. Instead, the reviewer asserts that learning reductions are bogus because for an alternative notion of learning reduction, made up by the reviewer, an obviously useless approach yields a factor of 2 regret bound. I believe this is the same reviewer each time, because the alternative theorem statement drifted over the reviews fixing bugs we pointed out in the author response.
The first time we encountered this review, we assumed the reviewer was just cranky that day—maybe we weren’t quite clear enough in explaining everything, as it’s always difficult to get every detail clear in new subject matter. I have sometimes had a very strong negative impression of a paper which later turned out to be unjustified upon further consideration. Sometimes when a reviewer is cranky, they change their mind after the authors respond, or perhaps later, or perhaps never, but you get a new set of reviewers the next time.
The second time the review came up, we knew there was a problem. If we are generous to the reviewer, and take into account that learning reduction analysis is a relatively new form of analysis, the fear that our notion of reduction might be vacuous because an alternative notion of reduction is vacuous isn’t too outlandish. Fortunately, there is a way to completely address that—we added an algorithm independent lower bound to the draft (the only significant change in content across the submissions). This lower bound conclusively proves that our notion of learning reduction is not vacuous, unlike the reviewer’s notion of learning reduction.
The review came up a third time. Despite our pointing out the lower bound quite explicitly, the reviewer simply ignored it. This more-or-less confirms our worst fears: some reviewer is bidding for the paper with the intent to torpedo-review it. They are uninterested in and unwilling to read the content itself.
Shouldn’t author feedback address this? Not if the reviewer ignores it.
Shouldn’t Double Blind reviewing help? Not if the paper only has one plausible source. The general problem area and method of analysis were freely discussed on hunch.net. We withheld public discussion of the algorithm itself for much of the time (except for a talk at CMU) out of respect for the review process.
Why doesn’t the area chair/program chair catch it? It took us 3 interactions to get it, so it seems unrealistic to expect someone else to get it in one interaction. In general, these people are strongly overloaded, and the reviewer wasn’t kind enough to boil down the essence of the stated objection as I’ve done above. Instead, they phrase it as an example, do not clearly state the theorem they have in mind, and do not note that the quantification of that theorem differs from the quantification of our theorems. More generally, my observation is that area chairs rarely override negative reviews because:
- It risks their reputation since defending a criticized work requires the kind of confidence that can only be inspired by a thorough personal review they don’t have time for.
- They may offend the reviewer they invited to review and personally know.
- They figure that the average review is similar to the average perception/popularity by the community anyways.
- Even if they don’t agree with the reviewer, it’s hard to fully discount the review in their consideration.
I’ve seen these effects create substantial mental gymnastics elsewhere.
Maybe you just ran into a cranky reviewer 3 times randomly? Maybe so. However, the odds seem low enough, and the half-year cost of getting another sample high enough, that going with the working hypothesis seems indicated.
Maybe the writing needs improving. Often that’s a reasonable answer for a rejection, but in this case I believe not. We’ve run the paper by several people, who did not have substantial difficulties understanding it. They even understood the draft well enough to make a suggestion or two. More generally, no paper is harder to read than the one you picked because you want to reject it.
What happens next? With respect to the Offset Tree, I’m hopeful that we eventually find reviewers who appreciate an exponentially faster algorithm, good empirical results, or the very tight and elegant analysis, or even all three. For the record, I consider the Offset Tree a great paper. It remains a substantial advance on the state of the art, even 2 years later, and as far as I know the Offset Tree (or the Realizable Offset Tree) consistently beats all reasonable contenders in both prediction and computational performance. This is rare and precious, as many papers trade off one for the other. It yields a practical algorithm applicable to real problems. It substantially addresses the RL to classification reduction problem. It also has the first nonconstant algorithm independent lower bound for learning reductions.
With respect to the reviewer, I expect remarkably little. The system is designed to protect reviewers, so they have virtually no responsibility for their decisions. This reviewer has a demonstrated capability to sabotage the review process at ICML and NIPS and a demonstrated willingness to continue doing so indefinitely. The process of bidding for papers and making up reasons to reject them seems tedious, but there is no fundamental reason why they can’t continue doing so for several decades if they remain active in academia.
This experience has substantially altered my understanding and appreciation of the review process at conferences. The bidding mechanism commonly used, coupled with responsibility-free reviewing, is an invitation to abuse. A clever abusive reviewer can sabotage perhaps 5 papers per conference (out of 8 reviewed), while maintaining a typical average score. While I don’t believe most people choose papers with intent to sabotage, the capability is there, and it is used by at least one person and possibly others. If, for example, 5% of reviewers are willing to abuse the process this way and there are 100 reviewers, every paper must survive 5 vetoes. If there are 200 reviewers, every paper must survive 10 vetoes. And if there are 400 reviewers, every paper must survive 20 vetoes. This makes publishing any paper that offends someone difficult. The surviving papers are typically inoffensive or part of a fad strong enough that vetoes are held back. Neither category is representative of high quality decision making. These observations suggest that the conferences with the most reviewers tend strongly toward faddy and inoffensive papers, both of which often lack impact in the long term. Perhaps this partly explains why NIPS is so weak when people start citation counting. Conversely, this would suggest that smaller conferences and workshops have a natural advantage. Similarly, the reviewing style in theory conferences seems better—the set of bidders for any paper is substantially smaller, implying papers must survive fewer vetoes.
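As a back-of-envelope check on this arithmetic, here is the worst-case model the numbers above come from: a paper that offends every abusive reviewer, with bidding letting each abuser reach it. The 5% figure is the hypothetical from the paragraph above, not data.

```python
def worst_case_vetoes(num_reviewers, abusive_fraction=0.05):
    # Worst case: the paper offends every abusive reviewer, and bidding lets
    # each of them direct a veto at it, so it must survive one veto per abuser.
    return abusive_fraction * num_reviewers

for n in (100, 200, 400):
    print(f"{n} reviewers -> must survive {worst_case_vetoes(n):.0f} vetoes")
# 100 reviewers -> must survive 5 vetoes
# 200 reviewers -> must survive 10 vetoes
# 400 reviewers -> must survive 20 vetoes
```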
This decision making process can be modeled as a group of n decision makers, each of whom has the opportunity to veto any action. When n is relatively small, this decision making process might work ok, depending on the decision makers, but as n grows larger, it’s difficult to imagine a worse decision making process. The closest analogues I know outside of academia are deeply bureaucratic governments and other large organizations where many people must sign off on something before it takes place. These vetocracies are universally frustrating to interact with. A reasonable conjecture is that any decision making process with a large veto number has poor characteristics.
A basic question is: is a vetocracy inevitable for large organizations? I believe the answer is no. The basic observation is that the value of n can be logarithmic in the number of participants in an organization rather than linear, as it is for reviewing under a bidding process. An essential force driving vetocracy creation is a desire to offload responsibility for decisions, so there is no clear decision maker. A large organization not deciding by vetocracy must have a very different structure, with clearly delineated responsibility.
NIPS provides an almost perfect natural experiment in its workshop organization, which involves the very same community of people and subject matter, yet works in a very different manner. There are one or two workshop chairs who are responsible for selecting amongst workshop proposals, after which the content of the workshop is entirely up to the workshop organizers. If a workshop is rejected, it’s clear who is at fault, and if a workshop presentation is rejected, it is often clear by whom. Some workshop chairs use a small set of reviewers, but even then the effective veto number remains small. Similarly, if a workshop ends up a flop, it’s relatively easy to see who to blame—either the workshop chair for not predicting it, or the organizers for failing to organize. I can’t think of a single time when I attended both the workshops and the conference that the workshops were less interesting than the conference. My understanding is that this observation is common. Given this discussion, it will be particularly interesting to see how the review process Michael and Leon set up for ICML this year pans out, as it is a system with notably more responsibility assignment than in previous years.
Journals end up looking relatively good with respect to vetocracy avoidance. The ones I’m familiar with have a chief editor who bears responsibility for routing papers to an action editor, who bears responsibility for choosing good reviewers. Every agent except the reviewers is often known by the authors, and the reviewers don’t act as additional vetoers in nearly as strong a manner as reviewers with the opportunity to bid.
This experience has also altered my view of blogging and research. On one hand, I’m very enthusiastic about research in general, and my research in particular, where we are regularly cracking conventionally impossible problems. On the other hand, it seems that some small number of people viewing a discussion silently decide they don’t like it, and veto it given the opportunity. It only takes one to turn a strong paper into a years-long odyssey, so public discussion of research directions and topics in a vetocracy is akin to voluntarily wearing a “kick me” sign. While this is a problem for me, I expect it to be even worse for the members of a vetocracy in the long term.
It’s hard to imagine any research community surviving without a serious online presence. When a prospective new researcher looks around at existing research, if they don’t find serious online discussion, they’ll assume it doesn’t exist under the “not on the internet so it doesn’t exist” principle. This will starve a field of new people. More generally, there is an opportunity to get feedback about research directions and problems much more rapidly than is otherwise possible, allowing us to avoid research on dead end topics, which are pervasive. At some point, it may even come to seem that people unwilling to discuss their research avoid doing so because it is critically lacking in one way or another. Since a vetocracy creates a substantial disincentive to discuss research directions online, we can expect communities that stick with decision by vetocracy to be at a substantial disadvantage.
John, I am the reviewer who rejected your repetitive attempts to submit this worthless paper. I won’t go into details, since the review has all of them. I will, however, address your allegations w.r.t. my evil intentions. I do recognize your paper and bid on it, but only as a service to other reviewers. If I have read your paper, why not save some time for unsuspecting reviewers? Yours, anonymous reviewer.
I think I believe you (with perhaps a 20% chance you are simply a troll), but I’m surprised you responded.
I don’t envy you. All any reader needs to do is read the short simple lower bound theorem & proof (page 9) to see that your primary argument for rejection is vacuous. You’ve also confirmed for any reader with remaining doubts that there are active torpedo reviewers in machine learning, which I’m not proud of. The alacrity of your response suggests you are a close reader of hunch.net, which you may indeed be using as a torpedo hit list. Your rationalization of providing a service to other reviewers seems absurd to me, as you could have greatly improved the efficiency of the process by simply engaging in a discussion on the many posts about learning reductions here. Alternatively, simply adding “I will seek out and torpedo this paper henceforth even if my argument is rebutted with a proof” at the end of your review would have allowed us to diagnose the nature of the problem after one round rather than four.
I hope that comment isn’t real. Independent of any argument of the merits of your paper, I really feel sorry for someone who would take a paper that someone has obviously worked hard on and cares deeply about, and dismiss it as “worthless”. (Usually this is the refuge of someone not able to muster specific technical criticisms.) If this kind of attitude is present in the reviews themselves, the chair should send the review back and demand basic civility. This kind of ugliness should not be tolerated.
I posted here a response that this box is too small to contain.
I would assume that, to the extent that one believes Anonymous (Coward) above to be the reviewer in question, they’re not exactly anonymous to the editors of the peer reviewed journal(s) in question, and this exchange may or may not be of interest to said reviewers?
Dear anonymous reviewer (I will just suppose, for now, that you are who you claim to be),
Your bidding on this paper after you recognized it contradicts the rules of the review process. As soon as you recognize a paper, you are not objective anymore. Even if you recognize the source of the paper only while reading it, you should be honest enough to decline reviewing it. That is a basic rule of the review process, which you obviously do not follow.
If you are sure enough that the paper is worthless, then other reviewers will come to the same conclusion. You have neither the duty nor the right to “save” the scientific world from having to see the results of this paper.
And I want to stress one more thing: even though the review process happens in a double-blind manner, it is worth noting that the chair and the organizers of these conferences of course have access to the author-reviewer relations. In this case I would explicitly ask for the exclusion of this reviewer from further reviewing, at least for papers from this group of authors.
The explanation in your post for why this happens seems to require more than (a) “the ML community is growing”. Instead, it seems to require (b) “the rate of growth of the ML community is growing”. While this might be true, its correctness isn’t clear to me. For example, the discrete derivative of paper submissions doesn’t appear to be steepening. Do you have evidence of (b)?
Your analysis of surviving vetoes as a function of the number of reviewers seems to assume that the number of papers remains constant. If the number of submissions is linear in the number of reviewers, this doesn’t seem as extreme a problem, because the expected number of vetoes you need to survive remains fixed.
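In symbols (a rough sketch of this point, under the added assumption that each abusive reviewer torpedoes at most $k$ papers per conference, spread over the $P$ submissions):

$$\mathbb{E}[\text{vetoes a given paper faces}] \approx p\,n \cdot \frac{k}{P},$$

where $n$ is the number of reviewers and $p$ the abusive fraction. If $P$ grows linearly with $n$, the ratio $n/P$ is constant and this expectation stays fixed; the figures in the post correspond to the worst case in which one paper attracts all $p\,n$ abusers.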
It doesn’t really require that; it just requires that there was a period of superlinear growth (inflation) in the past such that “experienced” reviewers are mostly those who came into the field before the end of that inflationary period. I believe that’s the case, both from personal observation and from the theoretical analysis in works like Tullock’s “The Organization of Inquiry.” This is a pretty standard demographic story: superlinear growth while resources are effectively unlimited relative to the size of the community, followed by flattening growth as resources (such as PhD production) run out.
I hope the first comment is a joke. It sounds unethical to continue reviewing the same paper.
This story is really sad.
At least the Banditron paper was a nice one. I have seen even worse — unacceptably weak papers submitted which get in because the authors send updates on their progress to the PC chair. And I am sure strong papers get rejected…
Wouldn’t we expect the reviewing problems during the superlinear growth period, rather than afterwards? (Which is where you think we are now, right?)
This may be too technical a discussion for comments. Perhaps you can make a post on your blog laying out the argument in more detail—I don’t quite see it yet. (Or, we can talk in person sometime.)
You may be correct. It seems to depend on torpedo reviewer psychology, which I frankly don’t understand (and it took me a long time to admit the existence of). The question is: Do torpedo reviewers want to torpedo more than 5-or-so papers per conference? If the answer is “yes”, your observation is correct. I’ve never submitted more than 5 papers to a conference, so someone particularly oriented towards rejecting my papers doesn’t run out of vetoes.
Hi John — So sorry to hear about this — it is such a mean thing for somebody to do! Good luck with your paper, and I hope it gets published soon!
–k
To the putatively real torpedo reviewer, you say
“If I have read your paper, why not save some time…”
The simplest answers here are that (a) you probably haven’t read the revised paper and (b) others are also at least as familiar with the topic as you are. In the first case, you should be spending the time to re-read the paper and in the second case you aren’t saving anything.
“if you recognize the source of the paper only while reading it, you should be honest enough to decline reviewing it.”
I think that may not always be possible. Independent of what happened here to John’s paper, if a subfield is very small/new, it may be impossible not to recognize who the author is. Moreover, it might be impossible to find a reviewer who is both reasonably qualified to review the paper and does not know the author of the paper.
Practically speaking, I think all that we need for there to be a problem is (a) for reviewing well to be hard, and (b) for a non-negligible proportion of reviewers to suck.
I’m sorry to hear this story, and best of luck in getting this work accepted.
I have to admit being skeptical about the problem of torpedoes. If there were that many, wouldn’t some of us have seen them as co-reviewers? This doesn’t mean that they don’t exist, simply that there are many things about reviewing that we ought to fix first. Of the poor and/or unprofessional reviews that you’ve seen as an author, reviewer, and PC member, how many do you think are torpedoes?
Furthermore, even if this paper has been torpedoed, I’d be shocked if the first comment were real. Consider the base rate fallacy: we can all agree that there are far more trolls on the Internet than torpedoes. And he could have easily proven his identity by including a detail from his review.
Fundamentally (I can’t remember who said this first), the combination of low acceptance rates and poor-quality reviews is toxic. Whenever humans encounter a phenomenon that’s highly unpredictable, they want a deterministic explanation. The result is superstition. An example that comes to mind is dating, and the various ridiculous theories about “what men want” and “what women want”. (That this example comes so readily to mind has no connection whatsoever to the current state of my personal life.)
Nevertheless, again, best of luck with the paper.
With respect to your base rate hypothesis, note that the base rate of trolling has been approximately zero so far on hunch.net.
Convincingly proving it’s a torpedo review is remarkably difficult, because the burden of evidence is on the author to prove something hidden about the reviewer. In this instance, it was possible because the reviewer used very similar arguments over multiple submissions and the argument was entirely refutable with a proof. Many people would give up on a paper after fewer rejections, not have access to a mechanism for even making a thorough case, have a torpedo reviewer who is more imaginative, or have a torpedo reviewer who uses an argument that is not refutable. Considering the substantial difficulty of making a case, it’s plausible that we systematically underestimate how common this is.
Coming at it from the other direction, one necessary but insufficient signature for a torpedo review is a negative argument which is fundamentally senseless or unfair, yet couched in a way which makes this unclear. Both as a fellow reviewer and an author, I’d have to say this happens fairly often—5% might be a reasonable estimate from my experience as a fellow reviewer, and substantially higher in my experience as an author.
> With respect to your base rate hypothesis, note that the base rate of trolling has been approximately zero so far on hunch.net.
Of course I am a troll! Yes, I subscribe to your blog because I am a researcher in a close area, but this post was virtually impossible to resist. That you even considered otherwise just shows how pathetic your whining is. And yes, you may be an excellent researcher for all I know, but you are a pathetic whiner at the same time.
Instead of considering the most obvious explanation – that your paper is being assigned to the same reviewer due to matching keywords, or the reviewer is bidding on your paper for the same reason, and then he recognizes your paper, skims it, and copy-pastes the same review – you invent this silly notion of torpedo reviewing, just to pander to your inflated ego.
Newsflash – torpedoing happens at a much higher level, of conference subjects, policies, etc. No one would waste time on torpedoing your paper, since you can just publish it somewhere else! No one would read your blog for that purpose either – wake up from the bubble you invented for yourself.
And for chrissake, stop whining! Talk to conference chairs, organize your own conference, whatever. Do something productive if you think it’s necessary. The community will thank you.
I encourage no further response to AR. Their status as Troll or Torpedo is unclear, but it seems clear nothing further can be learned in that direction.
i think you’re letting area chairs off too easily. yes, there is a general sense that if an area chair gets a paper with one 1 and two 5s, it’s a reject. but this needn’t be the case. it is effectively the area chairs who decide that a vetocracy is a reasonable model and who execute their own decisions based on this model. having been an area chair several times, i find that among papers with an average score around 3.5, the high variance ones are often much more interesting than the low variance ones, and i like to try to take this into account (yes, this means i read a lot of the papers myself). sure, i could piss off the reviewer who gave the paper a one, but to be honest, how many of us really remember what papers we reviewed, what scores we gave them, how other reviewers scored them, and, six months later, whether they even got in at all? besides, i think it’s the area chair’s responsibility to be willing to piss off reviewers that he/she thinks are wrong; if you don’t want that responsibility, don’t agree to be an area chair.
just one minor comment: on a {1,5,5} paper, why is it “okay” to piss off the two 5 reviewers by rejecting but not “okay” to piss off the one 1 reviewer by accepting?
To be fair, the situation was not so stark as 1/5/5. Writing a paper about how to apply ML to a new class of problems is not that easy—reviewers typically haven’t thought about problems in that class before, so careful reading and substantial thought are needed more than for many other papers.
John, I’m very sorry to hear this story.
I think that overloading of reviewers and area chairs is a major cause.
In particular, I reviewed one of the versions of the paper and gave it a score of 9 out of 10 with high confidence. As far as I remember, there was no further discussion between reviewers (at least, I checked my gmail account and didn’t find a request to refer to a discussion).
I guess it was my responsibility to check out the other reviews and to initiate a discussion. I was overloaded with too many reviews.
It was probably also the responsibility of the AC to ask the reviewers to discuss the high variance. Again, my guess is that the AC was overloaded.
This year, there is an attempt in ICML to reduce the load on reviewers by having two phases of review. I think it’s a good idea.
Another idea is to pay for a more professional review process. We pay students to be teaching assistants and to grade exercises/exams. Why don’t we also pay for reviewing?
It would also reduce the number of unworthy submissions (because I expect people would have to pay for each submission).
I’m curious to hear what other people think.
That comment was a Troll – the naive bayesian classifier encoded in my hardware says so.
I don’t think it’s a joke. Torpedo reviewing is real, and something needs to be done about it. I think that there is something fundamentally wrong with the whole idea of peer review. Does it really work, or is it a myth perpetuated from one generation of scientists to the next?
I agree that paying is a good idea.
As far as I can tell, the review process is highly unfair and/or random in many of the conferences I attend.
Just saw this post and related comments and I’d like to renew my frequent call for an online research journal for machine learning. At this point hunch.net is my definitive source for discovering new or interesting research in our field; it combines strong editorial insight from John (and the hunch community) with the flexibility of the blog format to quickly expose interesting lines of work.
Please take a bunch of pages from the open source community and build an open source style research framework — surely, if people can design, develop, and extend complex software via distributed communities, then machine learning researchers can develop an additional channel for exploring new research. And, yes, I am willing to help 🙂
Michael Nielsen’s post on his blog is very relevant to this thread: http://michaelnielsen.org/blog/?p=531
By the way, ICML is trying a new review process this year, and it no longer has reviewers bid on papers. Instead, authors select an area chair, and the area chair assigns the paper to one or more of his/her reviewers (in the first round of reviews). This has placed a large burden on the area chairs (and there should be an interesting post-mortem discussion of the pros and cons of the new system), but one upside might be that it severely limits the possibility of torpedo reviews.