It’s reviewing season right now, so I thought I would list (at a high level) the sorts of problems which I see in papers. Hopefully, this will help us all write better papers.
The following flaws are fatal to any paper:
- Incorrect theorem or lemma statements. A typo might be “ok”, if it can be understood. Any theorem or lemma which indicates an incorrect understanding of reality must be rejected. Not doing so would severely harm the integrity of the conference. A paper rejected for this reason must be fixed.
- Lack of Understanding. If a paper is understood by none of the (typically 3) reviewers then it must be rejected for the same reason. This is more controversial than it sounds because there are some people who maximize paper complexity in the hope of impressing the reviewer. The tactic sometimes succeeds with some reviewers (but not with me).
As a reviewer, I sometimes get lost for stupid reasons. This is why an anonymized communication channel with the author can be very helpful.
- Bad idea. Rarely, a paper comes along with an obviously bad idea. These also must be rejected for the integrity of science.
The following flaws have a strong negative impact on my opinion of the paper.
- Kneecapping the Giants. “Kneecapping the giants” papers take a previously published idea, cripple it, and then come up with an improvement on the crippled version. This often looks great experimentally, but is unconvincing because it does not improve on the state of the art.
- Only Toys. The paper emphasizes experimental evidence on datasets specially created to show the good performance of their algorithm. Unfortunately, because learning is worst-case-impossible, I have little trust that performing well on a toy dataset implies good performance on real-world datasets.
My actual standard for reviewing is quite low, and I’m happy to approve of incremental improvements. Unfortunately, even that standard is such that I suggest rejection on most reviewed papers.
In the quest to understand what good reviewing is, perhaps it’s worthwhile to think about what good research is. One way to think about good research is in terms of a producer/consumer model.
In the producer/consumer model of research, for any element of research there are producers (authors and coauthors of papers, for example) and consumers (people who use the papers to make new papers or code solving problems). A produced bit of research is judged as “good” if it is used by many consumers. There are two basic questions which immediately arise:
- Is this a good model of research?
- Are there alternatives?
The producer/consumer model has some difficulties which can be (partially) addressed.
- Disconnect. A group of people doing research on some subject may become disconnected from the rest of the world. Each person uses the research of other people in the group so it appears good research is being done, but the group has no impact on the rest of the world. One way to detect this is by looking at the consumers^2 (the consumers of the consumers) and higher order powers. If the set doesn’t expand much with higher order powers, then there is a disconnect.
- Latency. It is extraordinarily difficult to determine in advance whether a piece of research will have many consumers. A particular piece of work may be useful only after a very long period of time. This difficulty is particularly severe for theoretical research.
- Self-fulfillment. To some extent, interesting research by this definition is simply research presented to the largest possible audience. The odds that someone will build on the research are simply larger when it is presented to a larger audience. Some portion of this effect is “ok”; certainly attempting to educate other people is a good idea. But in judging the value of a piece of research, discounting by the vigor with which it is presented may be healthy for the system. (In effect, this acts as a bias against spamming.)
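The consumers^2 check described under Disconnect can be sketched in a few lines. This is only an illustration under stated assumptions: the `consumers` mapping (each work mapped to the set of works that use it) and the function name `consumer_powers` are hypothetical, and real citation data would be far messier.

```python
from typing import Dict, List, Set


def consumer_powers(
    consumers: Dict[str, Set[str]], seeds: Set[str], max_order: int
) -> List[int]:
    """Count distinct consumers reached within 1, 2, ..., max_order steps.

    `consumers` is a hypothetical mapping from a work to the works that use it.
    The returned list holds the cumulative number of distinct consumers after
    each order (consumers, consumers^2, ...).
    """
    reached: Set[str] = set()
    frontier: Set[str] = set(seeds)
    history: List[int] = []
    for _ in range(max_order):
        next_frontier: Set[str] = set()
        for work in frontier:
            next_frontier |= consumers.get(work, set())
        # Only expand works we have not already counted as consumers.
        frontier = next_frontier - reached
        reached |= next_frontier
        history.append(len(reached))
    return history


# A closed group: everyone only builds on everyone else in the group.
closed_group = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

# An open line of work: each step reaches new consumers.
open_line = {"a": {"b"}, "b": {"c", "d"}, "c": {"e", "f"}, "d": {"g"}}

print(consumer_powers(closed_group, {"a"}, 4))
print(consumer_powers(open_line, {"a"}, 3))
```

In the closed group the count stops growing after the second order, which is the signature of a disconnect. In the open line it keeps growing with each power.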
If we accept the difficulties of the producer/consumer model, then good reviewing becomes a problem of predicting what research will have a large impact in terms of the numbers of consumers (and consumers^2, etc.). Citations can act (to some extent) as a proxy for consumption, implying that it may be possible to (retroactively) score a reviewer’s judgement. There are many difficulties here. For example, a citation of the form “[joe blow 93] is wrong and here’s why” isn’t an example of the sort of use we want to encourage. Another important effect is that a reviewer who rejects a paper biases the number of citations a paper later receives. Another is that a rejected paper that has been resubmitted to another place may change so that it is simply a better paper. It isn’t obvious what a good method is for taking all of these effects into account.
Clearly, there are problems with this model for judging research (and at the second order, judgements of reviews of research). However, I am not aware of any other abstract model for “good research” which is even this good. If you know one, please comment.
If we accept that bad reviewing often occurs and want to fix it, the question is “how”?
Reviewing is done by paper writers just like yourself, so a good proxy for this question is asking “How can I be a better reviewer?” Here are a few things I’ve learned by trial (and error), as a paper writer, and as a reviewer.
- The secret ingredient is careful thought. There is no good substitution for a deep and careful understanding.
- Avoid reviewing papers that you feel competitive about. You almost certainly will be asked to review papers that feel competitive if you work on subjects of common interest. But, the feeling of competition can easily lead to bad judgement.
- If you feel biased for some other reason, then you should avoid reviewing. For example…
- Feeling angry or threatened by a paper is a form of bias. See above.
- Double blind yourself (avoid looking at the name even in a single-blind situation). The significant effect of a name you recognize is making you pay close attention to a paper. Since not paying enough attention is a standard failure mode of reviewers, a name you recognize is inevitably unfair to authors you do not recognize.
- Don’t review fast. For conferences there is a tendency to review papers right at the deadline. This tendency can easily result in misjudgements because you do not have the opportunity to really understand what a paper is saying.
- Don’t review too much. “Too much” is defined on a per-person basis. If you don’t have time to really understand the papers that you review, then you should say “no” to review requests.
- Overconfidence is the enemy of truth. If you are not confident about your review, you should not state that you are. Bad reviews are often overconfident reviews.
- Always try to make review comments nonpersonal and constructive, especially in a rejection.
Given the above observations, a few suggestions for improved review organization can be derived.
- Double blind. A common argument against double blind reviewing is that it is often defeatable. This is correct and misses the point. The reason why double blind reviewing is helpful is that a typical reviewer who wants to review well is aided by the elimination of side information which should not affect the acceptance of a paper. (ICML and AAAI are double blind this year.) Another reason why double blind reviewing is “right” is that it simply appears fairer. This makes it easier on average for authors to take rejections in a more constructive manner.
- Staggered deadlines. Many people can’t prioritize reviews well, so the prioritization defaults to deadline proximity. Consequently, instead of having many paper reviews due on one day, having them due at the rate of one-per-day (or an even slower rate) may be helpful. These should be real deadlines in the sense that “you get it in by this date or you are excluded from conversation and decision making about the paper”.
- Large PCs. There is a tendency to value (and romanticize) the great researcher. But a great researcher with many papers to review can only be a mediocre reviewer due to lack of available attention and time. Consequently, increasing the size of the PC may be helpful for small PC conferences.
- Communication channels. A typical issue in reviewing a paper is that some detail is unintentionally unclear. In this case, being able to communicate with the authors is helpful. This communication can be easily set up to respect the double blind guarantee by routing through the conference site. This communication does not change the meaning of a reviewer’s job. ICML and AAAI are allowing author feedback. I mean something more spontaneous, but this is a step in that direction.
- Refusal. In many cases, it is not possible to tell that you have a conflict of interest in a paper until after seeing it. A mechanism for saying “I have a conflict of interest, please reassign the paper” should exist, and its use should be respected.
- Independence. Access to other reviews should not be available until after completing your own review. The point of having multiple reviews is reducing noise. Allowing early access to other reviews increases noise by decreasing independence amongst reviewers. Many conferences (but not all) follow this pattern.
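The noise argument behind the Independence suggestion can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is mine, not something measured: each review is modeled as the paper's true quality plus an error with standard deviation `sigma`, and seeing other reviews early is modeled as a positive correlation `rho` between reviewers' errors. The function name `variance_of_mean` is hypothetical.

```python
def variance_of_mean(n_reviewers: int, sigma: float, rho: float) -> float:
    """Variance of the average of n equally noisy, equally correlated scores.

    Standard identity: Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n,
    where rho is the pairwise correlation between reviewers' errors.
    """
    return sigma ** 2 * (1 + (n_reviewers - 1) * rho) / n_reviewers


# Three reviewers, each with unit error variance (illustrative numbers only).
independent = variance_of_mean(3, 1.0, 0.0)  # fully independent reviews
correlated = variance_of_mean(3, 1.0, 0.5)   # errors partly shared
identical = variance_of_mean(3, 1.0, 1.0)    # everyone copies one opinion

print(independent, correlated, identical)
```

With independent errors, three reviews cut the variance to a third. As the correlation approaches one, three reviews are no better than a single review, which is the sense in which early access to other reviews increases noise.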
If you have more ideas, please add them.
This is a difficult subject to talk about for many reasons, but a discussion may be helpful.
Bad reviewing is a problem in academia. The first step in understanding this is admitting to the problem, so here is a short list of examples of bad reviewing.
- Reviewer disbelieves theorem proof (ICML), or disbelieves theorem with a trivially false counterexample. (COLT)
- Reviewer internally swaps quantifiers in a theorem, concludes it has been done before and is trivial. (NIPS)
- Reviewer believes a technique will not work despite experimental validation. (COLT)
- Reviewers fail to notice flaw in theorem statement (CRYPTO).
- Reviewer erroneously claims that it has been done before (NIPS, SODA, JMLR)—(complete with references!)
- Reviewer inverts the message of a paper and concludes it says nothing important. (NIPS*2)
- Reviewer fails to distinguish between a DAG and a tree (SODA).
- Reviewer is enthusiastic about paper but clearly does not understand (ICML).
- Reviewer erroneously believes that the “birthday paradox” is relevant (CCS).
The above is only for cases where there were sufficient reviewer comments to actually understand reviewer failure modes. Many reviewers fail to leave sufficient comments and it’s easy to imagine they commit similar mistakes.
Bad reviewing should be clearly distinguished from rejections—note that some of the above examples are actually accepts.
The standard psychological reaction to any rejected paper is trying to find fault with the reviewers. You, as a paper writer, have invested significant work (weeks? months? years?) in the process of creating a paper, so it is extremely difficult to step back and read the reviews objectively. One characteristic that distinguishes a bad review from a rejection is that it still bothers you years later.
If we accept that bad reviewing happens and want to address the issue, we are left with a very difficult problem. Many smart people have thought about improving this process, yielding the system we observe now. There are many subtle issues here and several solutions that (naively) appear obvious don’t work.