Machine Learning (Theory)


Compassionate Reviewing

Most long conversations between academics seem to converge on the topic of reviewing, where almost no one is happy. A basic question is: Should most people be happy?

The case against is straightforward. Anyone who watches the flow of papers realizes that most papers amount to little in the longer term. By its nature research is brutal: the second-best method is worthless, and the second person to discover something typically gets no credit. If you think about this for a moment, it’s very different from most other human endeavors. The second-best migrant laborer, construction worker, manager, conductor, quarterback, etc… can all manage quite well. If a reviewer has even a vaguely predictive sense of what’s important in the longer term, then most people submitting papers will be unhappy.

But this argument unravels, in my experience. Perhaps half of reviews are thoughtless or simply wrong, with a small part being simply malicious. And yet, I’m sure that most reviewers genuinely believe they can predict what will and will not be useful in the longer term. This disparity comes from a lack of communication. When academics have conversations about reviewing, the participants in each conversation presume that they all share roughly the same beliefs about what will be useful and what will take off. Such conversations rarely go into specifics, because the specifics are boring, particular, and technical, and because there is a real chance of disagreement on the specifics themselves.

When double blind reviewing was first being considered for ICML, I remember speaking about the experience in the Crypto community, where in my estimate the reviewing was both fairer and less happy. Many conferences in machine learning have shifted to double blind reviewing, and I think we have seen this come to pass here as well. Without double blind reviewing, it is common to have an “in” crowd whom everyone respects and whose papers are virtually always accepted. These people are happy, and the rest have little voice. With double blind reviewing, everyone suffers substantial rejections.

We might say “fine, at least it’s fair”, but in my experience there is a real problem. From a viewpoint external to the community, when the reviewing is poor and the viewpoints of people in the community are highly contradictory, nothing good happens. Outsiders (i.e. most people) viewing the acrimony choose some other way to solve problems, proposals don’t get funded, and the community itself tends to fracture. For example, in cryptography, TCC (not double blind) has started, presumably because the top theory people got tired of having their papers rejected at Crypto (double blind). From a process-of-research standpoint, this seems suboptimal, as different groups using different methods to solve similar problems are precisely the people you would most like to be talking to each other.

What seems to be lost with double blind reviewing is some amount of compassion, unfairly allocated. In a double blind system, any given paper is plausibly from someone you don’t know and, since most papers go nowhere, plausibly not going anywhere. Consequently, the bias starts “against” for all work, a disadvantage which can be quite difficult to overcome. Some time ago, I discussed how I thought motivation should be the responsibility of the reviewer. Aaron Hertzmann strongly disagreed on the grounds that this belief could dead-end your career as an author. I’ve come to appreciate his viewpoint to an extent. But it misses the point slightly: the question “What is good for the community?” differs from “What is good for the author?” In a healthy community, reviewers will actively work to understand why a piece of work is or is not important, filling in and extending the motivation as they consider the problem.

So, a question is: How can we get compassionate reviewing? (And in a fair way?) It might help somewhat for reviewers to actively consider, as part of their review, the level and mechanism of impact that a paper may have. Reducing reviewing load is certainly helpful, but it is not sufficient alone, because many people naturally interpret a reduced reviewing load as time to work on other things. And some mechanisms even seem to do harm. For example, the two-phase reviewing process that ICML currently uses might save 0.5 reviews per paper, while guaranteeing that for half of the papers the deciding review is done hastily and without author feedback, a recipe for mistakes.

What creates a great deal of compassion? Public responsibility helps (witness workshops being more interesting than conferences). A natural conversation helps (the current method of a single-round response tends to be very stilted). And time, of course, helps. What else?

9 Comments to “Compassionate Reviewing”
  1. You don’t mention what in my opinion is by far the best solution: simply accept a larger percentage of papers. If a paper is technically correct, at least somewhat original and is of interest to someone in the conference, it should be accepted. History and the considered consensus of researchers, not three reviewers, will decide which papers will be considered seminal contributions to the field. This policy is common to many conferences in neighbouring fields, such as computational intelligence, and those conferences typically have acceptance rates of around 50%.

    After all, as an established researcher, why should I bother sending good papers to conferences where I’m not virtually sure that they will get accepted?

    • jl says:

      I’m for “arxiv it all”.

      At a practical level, accepting most (or even half) of the papers for presentation at the conference itself can be problematic in a couple of ways. One is the extra cost of the physical facilities; another is that conference attendees with twice as much material to wade through to find what they want might be unhappy. This last cost may not be as large as the cost of reviewing, but it should be understood that some mechanism for substantially organizing the material is needed, and one level of that organization is an emphasis on importance.

  2. Peter Boothe says:

    In double-blind, the system is fair, but everyone is unhappy. I propose that we do nothing-blind. That is, all papers and all reviews are published with author and reviewer names attached. This was successfully attempted, and results were reported at:

    The basic upshot was that the reviewers claimed not to have altered the harshness of their reviews, and the authors claimed that the reviews were politer and nicer. Basically, it increased politeness all around, which seems a necessary prerequisite for happiness.

    • Miguel Vazquez says:

      I like this nothing-blind idea; I really think it could work. It would make reviewers work harder on their reviews, since the reviews would also have to stand up to scrutiny.

      • The nothing-blind approach doesn’t always work well. Nature tried it for a while as an experiment (although not with all their papers), and they wrote up a nice report of their experience here: . In discussing it with some other folks, one concern I heard was about the potential for retribution by displeased authors, especially if the referees are in junior positions. That is, suppose someone senior submits a paper of actual poor quality (maybe they know this beforehand, or maybe they don’t), and it’s assigned to two junior people to referee, openly. Do you think these junior people will be honest in stating publicly that this paper isn’t very good? I worry that they would not, and that social factors like seniority, stature, potential future retribution / quid-pro-quo, etc. would corrupt a truly open review process. I’m not saying the current pathological behavior in peer review (single or double blind) is any better, but for sure, I don’t think a nothing-blind process is going to solve our problems.

  3. A says:

    You seem to have absolutely hit the nail on the head about reviewers starting off with a very negative attitude to most papers. One of the problems of NIPS and ICML reviewing that I have felt both as an author and a reviewer is that 90% of the time, the matching of papers to reviewers is very very poor. This is possibly a consequence of the community being very diverse. In any case, we need to figure out a better way to match papers to reviewers.

  4. Like most real problems, it is multi-dimensional. Reflecting on my latest chunk of reviewing (for COLT), it occurred to me that researchers have differing views of what the point of publishing a paper is, since people are complex. Here are some stereotypes.

    Some institutions make it a necessary condition of attending a conference that you have a paper there. (They really do!). Perhaps they think one only goes to a conference to transmit, not receive! Authors from such places are writing a ticket to attend…

    Some researchers are still enamoured of their own cleverness, and view the purpose of writing a paper as a way to reinforce to the world how brilliant they are.

    Some researchers are fanatical about priority. This is not a new phenomenon (a famous early example is Robert Hooke). For them, a paper is a stake in the ground showing that they got there first. (The fact that they probably didn’t, and that the world actually does not care so much who gets there first, is irrelevant.)

    Some researchers think that a paper is all about communicating. They will put a lot of effort into connecting their work to the big picture, and might thus have a smaller amount of novel material per page than a researcher less concerned about that.

    Some care about the practicality of what they propose. Others are far more driven by their own perception of theoretical ‘depth’.

    Now consider what happens when each of these types gets to review the others’ papers… you get the situation John describes. Each type looks down its nose at the others. It’s no different from different disciplines looking down their noses at each other. I don’t think it’s a matter of going into the review process being negative; it is more that there are conflicting ideals of what makes a great paper.

    What can be done? Perhaps all ML researchers need to be taught humility by their professors. (Yes, it’s a joke, but with a serious side: anyone who ends up with managerial responsibilities recognises how diverse people really are.) Or perhaps a public debate on what makes a great paper. (John, did you do that before?) My view is that anyone who claims there is one essence of a great paper is wrong… there are many different types of great paper…

