One way that many conferences in machine learning assign reviewers to papers is via bidding, which proceeds roughly as follows:
- Invite people to review
- Accept paper submissions
- Reviewers look at the titles and abstracts and state which papers they are interested in reviewing.
- Some massaging happens, but reviewers often get approximately the papers they bid for.
At the ICML business meeting, Andrew McCallum suggested getting rid of bidding for papers. A couple of reasons were given:
- Privacy: The title and abstract of every submitted paper are visible to every participating reviewer. Some authors might be uncomfortable about this for papers still under review. I’m not sympathetic to this reason: the point of submitting a paper for review is to publish it, so the value (if any) of keeping part of it unpublished a little bit longer seems limited.
- Cliques: A bidding system is gameable. If you have 3 buddies and you inform each other of your submissions, you can each bid for your friends’ papers and express disinterest in the others. There are reasonable odds that at least two of your friends (out of 3 reviewers) will get your paper, and with 2 adamantly positive reviews, your paper has good odds of acceptance.
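The odds in the clique scenario are easy to estimate under a toy model (my assumption, not anything ICML actually does): suppose each friend’s bid independently gives them probability p of landing as one of your paper’s 3 reviewers.

```python
from itertools import product

def prob_at_least_two(p, friends=3):
    """Probability that at least 2 of `friends` bidding colluders
    are assigned, if each lands with independent probability p."""
    total = 0.0
    for outcome in product([0, 1], repeat=friends):
        k = sum(outcome)  # number of colluders who got the paper
        if k >= 2:
            total += (p ** k) * ((1 - p) ** (friends - k))
    return total

# If a strong bid wins assignment, say, 70% of the time:
print(prob_at_least_two(0.7))  # ≈ 0.784
```

Under this model, strong bids turn a 3-person clique into a near-certain pair of friendly reviews, which is what makes the game attractive.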
The clique issue is real, but it doesn’t seem like a showstopper to me. If a group of friends succeeds at this game for a while but their work is not fundamentally that interesting, there will be no long-term success. The net effect is an unfocused displacement of other, perhaps better, papers and ideas.
It’s important to recall that there are good aspects of a bidding system. If reviewers are nonstrategic (like I am), they simply pick the papers that seem the most interesting. Having reviewers review the papers that most interest them isn’t terrible—it means they pay close attention and generally write better reviews than a disinterested reviewer might. In many situations, simply finding reviewers who are willing to do an attentive thorough review is challenging.
However, since ICML I’ve come to believe there is a more serious flaw than any of the above: torpedo reviewing. If a research direction is controversial in the sense that just 2 or 3 out of hundreds of reviewers object to it, those 2 or 3 people can bid for the paper, give it terrible reviews, and prevent publication. Repeated indefinitely, this gives the 2 or 3 most closed-minded members of a community the power to kill off new lines of research, potentially substantially retarding progress for the community as a whole.
A basic question is: “Does torpedo reviewing actually happen?” The evidence I have is only anecdotal, but perhaps the answer is “yes”. As an author, I’ve seen several reviews poor enough that a torpedo reviewer is a plausible explanation. In talking to other people, I know that some folks do a lesser form: they intentionally bid for papers that they want to reject, on the theory that rejections are less work than possible acceptances. Even without more substantial evidence (it is hard to gather, after all), it’s clear that the potential for torpedo reviewing is real in a bidding system and, if done well by the reviewers, perhaps even undetectable.
The fundamental issue is: “How do you choose who reviews a paper?” We’ve discussed bidding above, but other approaches have their own advantages and drawbacks. The simplest approach I have right now is “choose diversely”: perhaps one reviewer from bidding, one from assignment by a PC/SPC/area chair, and another from assignment by a different PC/SPC/area chair.
I tend to bid for papers that (a) I want to review, and (b) I feel like I ought to review. The latter category includes papers that I believe that I have more expertise about than most PC members for that particular conference, even if I don’t really want to review them.
The vision conferences have an interesting way of doing review assignments. The Area Chairs (= committee members) pick 5 candidate reviewers for each paper from the reviewing pool, and then a bipartite matching algorithm assigns 3 reviewers per paper based on these picks (if an AC requests a conflicted reviewer for a paper, the matching algorithm will simply ignore that request). An advantage of this system is that the reviewing is truly blind: neither the Area Chairs (who are doing the assignments and making the decisions) nor the reviewers know the identities of the authors. (The assignment of papers to ACs is done by the program chairs, who also tweak the results of the matching algorithm.) This avoids both of the issues you raise, although you could say that the AC is a single point of failure (e.g., if they assign bad reviewers to a paper, or reviewers who share their biases).
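The pick-then-match step described above can be sketched on toy data. Everything here (the names, pick lists, one reviewer per paper, and the brute-force search) is illustrative only; real conference systems solve a proper bipartite matching or min-cost flow problem at much larger scale.

```python
from itertools import product

papers = ["P1", "P2", "P3"]
pool = ["r1", "r2", "r3", "r4"]
# AC picks per paper, most-preferred first
picks = {"P1": ["r1", "r2", "r3"],
         "P2": ["r2", "r3", "r4"],
         "P3": ["r1", "r3", "r4"]}
conflicts = {("P2", "r2")}  # conflicted requests are simply ignored
max_load = 2                # each reviewer handles at most 2 papers here

def score(assignment):
    """Reward honored, non-conflicted picks; -1 marks infeasible."""
    s = 0
    for paper, rev in zip(papers, assignment):
        if (paper, rev) in conflicts:
            return -1
        if rev in picks[paper]:
            s += len(picks[paper]) - picks[paper].index(rev)
    if any(assignment.count(rev) > max_load for rev in pool):
        return -1
    return s

best = max(product(pool, repeat=len(papers)), key=score)
print(dict(zip(papers, best)))  # {'P1': 'r1', 'P2': 'r3', 'P3': 'r1'}
```

Note how the conflicted pick (P2, r2) never appears in the result: the matcher routes around it without the AC or reviewer learning why.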
I have seen evidence of torpedo reviewing; my last paper got 5/5, 4/5, and 1/5 at a first-tier computer vision conference, which was enough to get it knocked out. The 1/5 reviewer gave no valid critiques of the paper, and the sparse comments indicated that the reviewer had only read the abstract (e.g. “you forgot to reference X and Y”, which were in fact referenced, etc.).
The longest part of the review was a rant about the poor quality of the work coming from Poggio‘s group at MIT, which I’m not associated with in any way, other than building on their ideas. My advisor figured the reviewer had been rejected from MIT at one point…
How frustrating torpedo reviewers are!
You seem to be able to make some pretty high-confidence inferences from very noisy and sparse measurements. From one brief review you can infer the Area Chair’s opinion and the reviewer’s academic history?
If the Area Chair was doing their job, they would have discounted that review. However, they should also have written a consolidation report that explained their decision. Consolidation reports are really important for authors to understand the decisions (since so many authors seem to assume the very worst when their papers are rejected).
Also, I don’t think that’s what John means by torpedo reviewing, i.e., deliberately bidding on papers in order to kill all papers in that research area.
I think you’re missing a big “#3” up there. (I don’t want to speak for Andrew, and this is from memory, so take it with a grain of salt.) That is: people bid for papers they *want* to review, not (necessarily) papers they are *qualified* to review. Aaron hinted at this as well. In fact, the proposal that Andrew made (again, IIRC) was to automatically assign papers to reviewers. I think comparing reviewer bibtex files with paper references (à la WhatToSee) would be a good way to do this, though I suspect Andrew was thinking more along the lines of some topic modeling stuff. (Incidentally, some people in the NLP community played around with this idea a few years back—I don’t remember who; perhaps Bob will chime in—and it seemed quite promising.) At any rate, at least anecdotally, it is *this* that seems like the biggest problem with bidding.
(NLP conferences get around the partial confidentiality problem by having reviewers recruited into areas by area chairs, and then they only see listings of papers submitted to that area. I think it’s crazy that ICML doesn’t do this yet for a variety of reasons, but that’s just me — plus a bunch of other people I’ve talked to.)
I wonder if reviewers tend to write more careful reviews for papers they bid for. Bidding gives reviewers a sense of empowerment, and makes them think “Gee, I requested this paper, I’d better do a good job on it.”
This is probably a very minor factor relative to all the others mentioned.
Hal’s point is a good one. Wanting to review a paper doesn’t mean you are qualified to review it. At a top-tier PC meeting recently (the conference tried bidding for the first time this past year), there were papers which had been assigned three graduate students as reviewers. Now, I love graduate students (being one myself), but that just doesn’t make sense.
Also, I don’t like bidding because I never seem to get assigned my preferred papers. 😉
One way of mitigating these negative effects somewhat would be to only show each reviewer a random subset of (say) 1/4 of the total papers submitted. If there are only 3 people out to “torpedo” a given paper, the chance that all 3 reviewers manage to bid on the same paper is fairly low. Same reasoning works for the clique problem as well.
I was simply sharing my experience about someone who seemed to have bid on my paper simply to reject it because of its topic, which is the gist of what John was discussing in torpedo reviewing.
I did not intend to imply any state of confidence in my inference, I was simply sharing my supervisor’s theory. We were both quite surprised at the highly unprofessional nature of the review. We discussed the reviewer’s clear bias against the research area. I also did not (try to) infer any opinion of the AC, the consolidation report only said to submit it to one of the workshops.
A good review should focus on the paper and its contributions, instead I got an assault on the research area itself and a rank too low to justify. Perhaps I’m wrong, but it seems quite related to torpedo reviewing.
Another interesting point to note about torpedo reviewing: it would be easier to do at a first-tier conference, since it really only takes one bad review to knock a paper out. At a lower-tier conference, it would certainly have to be more organized.
How about introducing a system where reviewers get scores as follows: for a paper you score highly that later gets many citations (or does well by some other quality measure), you earn points; for a paper you score poorly that later gets many citations, you lose points; and so on.
This would encourage people to do a good job, and it should be technically quite easy to set up.
This isn’t so easy to set up: how do you guarantee reviewer anonymity over the years?
Imagine one database which stores which reviewer reviewed which paper (over whatever long time span). At paper review time when reviewer scores are in, the system recomputes reviewer “skill” and together with paper score and confidence these are submitted to the SPC for final decision making.
The database would not be public so anonymity can be preserved. Maybe the reviewer skills could be made public? That would make it even more interesting to do a good job reviewing: your reviewer score really becomes like a bet with a reward.
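A minimal sketch of the scheme being discussed, with my own assumptions filled in: review scores and the later quality signal (e.g. normalized citation counts) both live in [0, 1], and a reviewer’s “skill” is simply their average agreement with that signal.

```python
def reviewer_skill(history):
    """history: list of (review_score, later_quality) pairs, each in [0, 1].
    Returns the reviewer's average agreement with the quality signal;
    0.5 is the prior for reviewers with no track record."""
    if not history:
        return 0.5
    agreements = [1.0 - abs(score - quality) for score, quality in history]
    return sum(agreements) / len(agreements)

# A reviewer whose scores track eventual citations fairly well:
print(reviewer_skill([(0.9, 0.8), (0.2, 0.1)]))  # ≈ 0.9
```

The SPC could then weight each incoming review by the reviewer’s skill when combining scores, all inside the private database so anonymity is preserved.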
I don’t recall anyone doing automatic paper assignments that didn’t involve a bidding stage. I believe some people evaluated whether it would work after the fact, but I can’t remember who. It’d be useful to run Hal’s experiment and do auto-assignment, if only to see how it compared to bidding.
There’s a false impression when doing annotation tasks like paper reviewing that there’s a “right” answer as to whether a paper should be accepted, perhaps in the form of a numerical score or linear ranking, and we’re somehow making unbiased estimates of it. In what sense is a torpedo review wrong? Let’s say reviewer A is in love with hierarchical probability models and thinks papers are useless without posterior intervals, whereas reviewer B only has eyes for competition-winning systems. Reviewer C might think systems that don’t scale are useless. These are all potentially valid points of view, and I don’t know what it’d mean to be “open minded” — you need some criteria to do reviewing, and one of those is “interesting”, and different things are interesting to different people.
I believe Jurgen’s on the right track in suggesting some kind of better blending of scores. It’s a classification gold-standard agreement problem. There’s been some nifty work in epidemiology in this area cited in Panos Ipeirotis’s blog entry on gold standards from the Mechanical Turk and my follow-up on Bayesian epidemiological models. Diagnostic tests are like reviewers (often they are doctors looking at things under a microscope or in an x-ray) and their reviews need to be combined into some kind of consensus.
Another idea I’ve heard kicked around in the CL Journal editorial board is a kind of champion-based acceptance, which means one reviewer who really likes the paper will be enough to get it accepted. This would be to try to adjust for the perceived bias against new or different ideas.
Shouldn’t the answer be to make reviewers bid on many more papers than the number they are actually required to review? i.e. reduce the chance that they get the paper they wish to torpedo (if they’re that way inclined)? (e.g. if the system requires each reviewer to review 5 papers, make them bid for 20 or 25.)
p.s. i’m not in this field at all
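For what it’s worth, the proposal above is easy to quantify under a toy model (mine, not the commenter’s): assume the 5 assigned papers are drawn uniformly at random from a reviewer’s 20 bids.

```python
bids, assigned = 20, 5

p_single = assigned / bids  # chance one torpedo reviewer gets the target
p_clique = p_single ** 3    # chance all 3 colluders get it
print(p_single, p_clique)   # 0.25 0.015625
```

In practice the assignment isn’t uniform over bids, so this is an optimistic bound, but it shows the intended dilution effect.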
“Automatic Topics Identification For Reviewer Assignment”
http://www.di.uniba.it/~ndm/publications/files/ferilli06ieaaie.pdf
Uses latent semantic indexing.
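A stripped-down sketch of this kind of text-based reviewer assignment, with hypothetical data. The linked paper applies latent semantic indexing (an SVD of the term-document matrix) before comparing; this sketch stops at raw term-frequency cosine similarity, just to show the shape of the pipeline.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two texts under a bag-of-words model."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Hypothetical reviewer profiles (e.g. built from their own abstracts)
reviewer_profiles = {
    "alice": "kernel methods support vector machines learning theory",
    "bob": "topic models latent dirichlet allocation text mining",
}
abstract = "a topic model for mining latent structure in text"
best = max(reviewer_profiles, key=lambda r: cosine(reviewer_profiles[r], abstract))
print(best)  # bob
```

LSI’s dimensionality reduction would additionally match “model” against “models” and related terms that raw term counts miss, which is the point of using it here.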