In the quest to understand what good reviewing is, perhaps it’s worthwhile to think about what good research is. One way to think about good research is in terms of a producer/consumer model.
In the producer/consumer model of research, for any element of research there are producers (the authors and coauthors of papers, for example) and consumers (people who use the papers to create new papers or code that solves problems). A produced piece of research is judged as “good” if it is used by many consumers. Two basic questions immediately arise:
- Is this a good model of research?
- Are there alternatives?
The producer/consumer model has some difficulties which can be (partially) addressed.
- Disconnect. A group of people doing research on some subject may become disconnected from the rest of the world. Each person uses the research of other people in the group, so it appears good research is being done, but the group has no impact on the rest of the world. One way to detect this is by looking at the consumers² (the consumers of the consumers) and higher-order powers. If the set doesn’t expand much at higher orders, then there is a disconnect.
- Latency. It is extraordinarily difficult to determine in advance whether a piece of research will have many consumers. A particular piece of work may be useful only after a very long period of time. This difficulty is particularly severe for theoretical research.
- Self-fulfillment. To some extent, interesting research by this definition is simply research presented to the largest possible audience. The odds that someone will build on the research are simply larger when it is presented to a larger audience. Some portion of this effect is “ok”—certainly attempting to educate other people is a good idea. But in judging the value of a piece of research, discounting by the vigor with which it is presented may be healthy for the system. (In effect, this acts as a bias against spamming.)
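The disconnect check above can be made concrete: treat research as a directed “usage” graph and compare the size of the consumer set at increasing orders. The following is a minimal sketch with an invented toy graph (all paper names and the `uses` structure are hypothetical, purely for illustration); a self-citing clique saturates after a few steps.

```python
from collections import defaultdict

# Hypothetical usage graph: an edge points from a paper to a paper
# that consumes (builds on) it. The data below is invented to
# illustrate the "disconnect" check described in the text.
uses = defaultdict(set)

# A small clique A -> B -> C -> A that only consumes itself,
# plus an unrelated chain D -> E -> F.
for producer, consumer in [("A", "B"), ("B", "C"), ("C", "A"),
                           ("D", "E"), ("E", "F")]:
    uses[producer].add(consumer)

def consumers(paper, order):
    """Union of consumers, consumers^2, ..., up to the given order."""
    frontier = {paper}
    seen = set()
    for _ in range(order):
        frontier = {c for p in frontier for c in uses[p]} - seen
        if not frontier:
            break  # the set has stopped expanding: a disconnect
        seen |= frontier
    return seen

# The clique's consumer set stops growing at 3, no matter the order:
print(len(consumers("A", 1)), len(consumers("A", 5)))  # prints: 1 3
```

A real version would run over a citation or code-dependency graph; the signal is the same: if `consumers(paper, k)` barely grows with `k`, the work circulates only within a closed group.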
If we accept the difficulties of the producer/consumer model, then good reviewing becomes a problem of predicting which research will have a large impact in terms of the number of consumers (and consumers², etc.). Citations can act (to some extent) as a proxy for consumption, implying that it may be possible to (retroactively) score a reviewer’s judgement. There are many difficulties here. For example, a citation of the form “[joe blow 93] is wrong and here’s why” isn’t the sort of use we want to encourage. Another important effect is that a reviewer who rejects a paper biases the number of citations the paper later receives. Another is that a rejected paper resubmitted elsewhere may change so that it is simply a better paper. It isn’t obvious what a good method is for taking all of these effects into account.
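One simple form the retroactive scoring could take is rank correlation between a reviewer’s scores and citation counts some years later. This is only a sketch under strong assumptions: the scores and citation counts below are invented, and it deliberately ignores the confounds just listed (rejection suppressing citations, negative citations, revised resubmissions).

```python
def ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group tied values and assign them their average rank
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation between two equal-length lists."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

review_scores = [8, 3, 6, 2, 9]     # one reviewer's scores (invented)
citations_5yr = [40, 2, 15, 5, 60]  # citations five years later (invented)
print(round(spearman(review_scores, citations_5yr), 2))  # prints: 0.9
```

A high correlation would suggest the reviewer’s judgement tracks later consumption; correcting for the biases in the text is the genuinely hard part.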
Clearly, there are problems with this model for judging research (and at the second order, judgements of reviews of research). However, I am not aware of any other abstract model for “good research” which is even this good. If you know one, please comment.
If we accept that bad reviewing often occurs and want to fix it, the question is “how”?
Reviewing is done by paper writers just like yourself, so a good proxy for this question is asking “How can I be a better reviewer?” Here are a few things I’ve learned by trial (and error), as a paper writer, and as a reviewer.
- The secret ingredient is careful thought. There is no good substitute for a deep and careful understanding.
- Avoid reviewing papers that you feel competitive about. You almost certainly will be asked to review papers that feel competitive if you work on subjects of common interest. But, the feeling of competition can easily lead to bad judgement.
- If you feel biased for some other reason, then you should avoid reviewing. For example…
- Feeling angry or threatened by a paper is a form of bias. See above.
- Double blind yourself (avoid looking at the author names, even in a single-blind situation). The significant effect of a recognized name is that it makes you pay closer attention to the paper. Since not paying enough attention is a standard failure mode of reviewers, a name you recognize is inevitably unfair to authors you do not recognize.
- Don’t review fast. For conferences there is a tendency to review papers right at the deadline. This tendency can easily result in misjudgements because you do not have the opportunity to really understand what a paper is saying.
- Don’t review too much. “Too much” is defined on a per-person basis. If you don’t have time to really understand the papers that you review, then you should say “no” to review requests.
- Overconfidence is the enemy of truth. If you are not confident about your review, you should not state that you are. Bad reviews are often overconfident reviews.
- Always try to make review comments nonpersonal and constructive, especially in a rejection.
Given the above observations, a few suggestions for improved review organization can be derived.
- Double blind. A common argument against double blind reviewing is that it is often defeatable. This is correct but misses the point. The reason why double blind reviewing is helpful is that a typical reviewer who wants to review well is aided by the elimination of side information which should not affect the acceptance of a paper. (ICML and AAAI are double blind this year.) Another reason why double blind reviewing is “right” is that it simply appears fairer. This makes it easier, on average, for authors to take rejections in a more constructive manner.
- Staggered deadlines. Many people can’t prioritize reviews well, so the prioritization defaults to deadline proximity. Consequently, instead of having many paper reviews due on one day, having them due at the rate of one-per-day (or an even slower rate) may be helpful. These should be real deadlines in the sense that “you get it in by this date or you are excluded from conversation and decision making about the paper”.
- Large PCs. There is a tendency to value (and romanticize) the great researcher. But a great researcher with many papers to review can only be a mediocre reviewer due to lack of available attention and time. Consequently, increasing the size of the PC may be helpful for small PC conferences.
- Communication channels. A typical issue in reviewing a paper is that some detail is unintentionally unclear. In this case, being able to communicate with the authors is helpful. This communication can easily be set up to respect the double blind guarantee by routing through the conference site. This communication does not change the meaning of a reviewer’s job. ICML and AAAI are allowing author feedback; I mean something more spontaneous, but this is a step in that direction.
- Refusal. In many cases, it is not possible to tell that you have a conflict of interest with a paper until after seeing it. A mechanism for saying “I have a conflict of interest, please reassign the paper” should exist, and its use should be respected.
- Independence. Access to other reviews should not be available until after completing your own review. The point of having multiple reviews is reducing noise. Allowing early access to other reviews increases noise by decreasing independence amongst reviewers. Many conferences (but not all) follow this pattern.
If you have more ideas, please add them.
This is a difficult subject to talk about for many reasons, but a discussion may be helpful.
Bad reviewing is a problem in academia. The first step in understanding this is admitting to the problem, so here is a short list of examples of bad reviewing.
- Reviewer disbelieves a theorem’s proof (ICML), or disbelieves a theorem, offering a trivially false counterexample. (COLT)
- Reviewer internally swaps quantifiers in a theorem, concludes it has been done before and is trivial. (NIPS)
- Reviewer believes a technique will not work despite experimental validation. (COLT)
- Reviewers fail to notice flaw in theorem statement (CRYPTO).
- Reviewer erroneously claims that the work has been done before (NIPS, SODA, JMLR)—complete with references!
- Reviewer inverts the message of a paper and concludes it says nothing important. (NIPS*2)
- Reviewer fails to distinguish between a DAG and a tree (SODA).
- Reviewer is enthusiastic about paper but clearly does not understand (ICML).
- Reviewer erroneously believes that the “birthday paradox” is relevant. (CCS)
The above covers only cases where there were sufficient reviewer comments to actually understand the reviewer’s failure mode. Many reviewers fail to leave sufficient comments, and it’s easy to imagine they commit similar mistakes.
Bad reviewing should be clearly distinguished from rejections—note that some of the above examples are actually accepts.
The standard psychological reaction to any rejected paper is trying to find fault with the reviewers. You, as a paper writer, have invested significant work (weeks? months? years?) in the process of creating the paper, so it is extremely difficult to step back and read the reviews objectively. One characteristic distinguishing a bad review from a mere rejection is that it still bothers you years later.
If we accept that bad reviewing happens and want to address the issue, we are left with a very difficult problem. Many smart people have thought about improving this process, yielding the system we observe now. There are many subtle issues here and several solutions that (naively) appear obvious don’t work.