ICML author feedback is open as of last night, late.
When the reviewing deadline passed Wednesday night, 15% of reviews were still missing, much higher than I expected. Between late reviews coming in, ACs working overtime through the weekend, and people willing to help in a pinch, another ~390 reviews came in, reducing the missing mass to 0.2%. Nailing that last bit, along with a similar quantity of papers with uniformly low-confidence reviews, is what remains to be done in terms of basic reviews. We are trying to make all of those happen this week so authors have some chance to respond.
I was surprised by the quantity of late reviews, and I think that’s an area where ICML needs to improve in future years. Good reviews are not done in a rush—they are done by setting aside time (like an afternoon) and carefully reading the paper while thinking about its implications. Many reviewers do this well, but a significant minority aren’t good at scheduling their personal time. In this situation there are several ways to fail:
- Give early warning and bail.
- Give no warning and finish not-too-late.
- Give no warning and don’t finish.
For Program Chairs and Area Chairs, the worst failure mode by far is the last one, because they must catch and fix all those failures at the last minute. I expect the second failure mode also impacts the quality of reviews, because high-speed reviewing of a deep paper often doesn’t work. This issue is one of community norms, which can only be adjusted slowly. To do this, we’re going to pass a flake list for failure mode 3 to future program chairs, who will hopefully further encourage people to schedule time well and review carefully.
If my experience is any guide, plenty of authors will feel disappointed by the reviews. Part of this is simply because it’s the first time the authors have had contact with people not biased towards agreeing with them, as almost all friends are. Part of this is the significant hurdle of communicating new technical things well. Part may be too-hasty reviews, as discussed above. And part of it may be that the authors are simply far more expert in their subject than the reviewers.
In author responses, my personal tendency is to be blunter than most people when reviewers make errors. Perhaps “kind but clear” is a good viewpoint. You should be sympathetic to reviewers who have voluntarily put significant time into reviewing your paper, but you should also use the channel to communicate real information. Remotivating your paper almost never works, so concentrate on getting across reviewers’ errors in understanding or answering their direct questions.
We did not include reviewer scores in author feedback, although we do plan to include them when the decision is made. Scores should not be regarded as final by any party, since author feedback and discussion can significantly alter a reviewer’s understanding of the paper. Encouraging reviewers to incorporate this additional information well before settling on a final score is one of my goals.
We did allow resubmission of the paper with the author response, similar to what Geoff Gordon did as program chair for AIStats. This solves two problems: it helps authors create a more polished draft, and it avoids forcing communication through an overly constrained channel. If an equation has a bug, you can write it out bug-free in mathematical notation rather than trying to describe, by reference, how to alter the equation in the author response.
Please comment if you have further thoughts.
> We did allow resubmission of the paper with the author response, similar to what Geoff Gordon did as program chair for AIStats.
Personally, as a past AIStats area chair, I got frustrated by that system, and got some grief from reviewers about it too. I agree that it’s good to get authors to polish their drafts, and I’ll be interested to see how it works this time. But I worry that the equilibrium state is that the first submission quality falls, as it can “always be fixed in the response period”. Sending something to reviewers should be done *after* some decent level of polishing, out of respect for their time.
Something I like about the arXiv system is that people have to put their name on work that they can’t pull off the web. Then they really have to mean it when they submit.
The same criticism applies to author response in general. Overall, I consider author response healthier for the community than not, because as a mechanism it acknowledges that reviewers can make mistakes and gives them an opportunity to learn from those mistakes.
An author slacking in the initial submission will always substantially increase the odds of rejection. Reviewers are not expected to do another multi-hour careful read of the draft, so a revised submission is in the same category as supplementary material: something that diligent reviewers can use as a resource to answer lingering doubts.
The size of your PC and the number of submissions seem incompatible (at a first estimate) with the idea of an average reviewer having an afternoon set aside for each paper (of course, I might be wrong). I am not sure how this can be addressed other than by increasing the pool of reviewers, but maybe this explains to some extent the phenomenon of last-minute reviews.
I suspect you are confusing the area chairs with the program committee. The PC is ~400 on the website, and we have added about 70 more during the review process. The typical maximum load is 7 papers, which seems reasonable over a 3.5-week reviewing period.
I am not sure whether the critical point is whether or not we should resubmit the whole paper, but I am intrigued by the more basic question of whether the reviewers and area chairs are really reading the feedback provided by the authors.
I acknowledge that as a community we do not have any other system for selecting which papers to publish, but as a newcomer to the field (I have been submitting and reviewing papers for the major conferences for just the last 3 years), the possibility of providing feedback seems attractive but at the same time misleading (perhaps unfair).
At first sight, it seems a great idea to let the authors pinpoint failures of the reviewers and clarify misunderstandings, but in practice we just say that this is possible; in almost all papers I never saw anything change… the process seems pretty much the same as if no feedback had been provided.
The only case in which I witnessed some change after the feedback phase was when the PC chair decided to read the case and change the final decision. For someone outside the club, it seems that the area chairs are blindly endorsing whatever the reviewers are saying (even with low confidence). What makes things worse is that everything is pretty much hidden from someone who is not senior, since all the discussions and so on are not publicly available.
You have more experience on this issue: do you think the reviewers and area chairs are reading the feedback, or is this just one more burden on their busy schedule that they basically skip?
I know that the reviewer ‘modus operandi’ is not complicated: they basically see if the paper is close enough to their area. If so, they read it and can give a good review. If not, they somehow read it, give as much feedback as possible, and mark their confidence as low.
What is the attitude (procedure) of an area chair when receiving the reviews?
In order for author response to alter a reviewer’s impression, you need to provide significant new information (a correction or resolving an ambiguity), and it needs to reach a reviewer who is receptive to reading it. Given the way people are, the first impression of a paper tends to be the lasting impression. I would estimate perhaps 10% of decisions are directly altered by author response.
However, I believe that greatly underestimates the value of author response.
(1) Reviewers who know that authors can respond and point out that their review is meaningless do not behave the same way as reviewers without author response. Dismissive (often wrong) reviews are far less acceptable.
(2) While many reviewers may not have their mind changed by an author response in the short term, it can make a difference in the long term.
(3) Only a small fraction of papers have a long term impact, so if that happens to align with the set where author response makes a difference, that can have a bigger impact on the field than ‘10%’ might suggest.
An area chair picked the reviewers, so there is a natural presumption that the reviewers are correct in their assessment. But this is far from overwhelming—an area chair takes everything into account, particularly when reviewers disagree. They may also be less biased by a mistaken first impression, as they don’t have one.
One frustrating aspect of the review system is reviewers who reject papers because they have found changes that could potentially “make the paper stronger”. Every paper that was ever published could have been “made stronger”. Reviewers should just be candid and ask, “is this submission today helping the field more than another set of papers that we are accepting” and “does delaying this paper by a year or two in order to make it stronger really have a net positive effect”.
I think everyone would agree that a paper that is “good enough” should be publishable whether or not it is further improvable, because research is basically never completely done. But, I think you should be happy about getting more than the bit “above bar or not”—it’s far more helpful to have some constructive criticism.
JL,
You have clearly been worrying about the quality of reviews, and I just wanted to let you know we got great reviews at ICML this year. All the criticisms raised by the reviewers were valid, even the one reviewer who didn’t seem fully familiar with the field raised interesting issues, and spending the last 2-3 days to respond to them has improved our paper. (Clearly, responding to the reviews has me behind in my blog-reading, since I just saw this post after submitting our rebuttal).
Thanks for all the hard work you’ve put in!
As a PC member, I read every author feedback. I think there is another, somewhat underrated, benefit of having author feedback. Even though author feedback typically doesn’t change the reviewers’ recommendations, it can often highlight the authors’ misunderstanding of the review itself. A typical interaction would look something like:
— Reviewer writes review (perhaps hastily).
— Author rebuts review, claiming reviewer misunderstood paper.
— After reading rebuttal, reviewer realizes author misunderstood original review, and revises it to be clearer (and thus more constructive).
Since most papers are (rightfully) rejected and many reviews are hastily written (and I don’t think hastily written necessarily means the review is incorrect), I suspect that authors misunderstanding reviews is actually a relatively common scenario. It would also be interesting to measure how often reviewers revise their reviews to be clearer.
It seems you’re saying that one of the outcomes of the rebuttal process is to improve the quality of the reviews post-rebuttal. Maybe so, but that isn’t and shouldn’t be the goal.
” it can often highlight the authors’ misunderstanding of the review itself”
It is weird to put the responsibility of misunderstanding on the shoulders of the author when the review is unclear, don’t you think?
“I don’t think hastily written necessarily means the review is incorrect”
A hastily written review might render the whole rebuttal process useless, since the author potentially loses the opportunity to truly answer.
“It is weird to put the responsibility of misunderstanding on the shoulders of the author when the review is unclear, don’t you think?”
Misunderstanding itself doesn’t imply who should be responsible for it. But fortunately the process allows it to be fixed.
I agree that this isn’t ideal. But just as authors can often write papers in a way that can be easily misinterpreted, so can reviewers often write reviews in a way that can be easily misinterpreted.
During the BIDDING process, many reviewers lightly review many abstracts. Do you think it would be helpful to the “winning” reviewers and/or the authors if, during this bidding period, there were a blank next to each paper where comments could be posted? E.g., a related publication, or an issue to watch out for (the abstract sounds like they only tested their idea on a single dataset). There are many more eyeballs at this phase. Just a thought.
Also, as a reviewer, I see papers that I think are bad fits for the conference during BIDDING. Would it help to mark these during BIDDING with a comment?
It’s something to experiment with. It would need to be lightweight because you need to make sure that the overall load on reviewers either stays the same or decreases.
Compared with confmaster, the Microsoft CMT has an important drawback: authors can’t see the rating/score at the time of rebuttal/feedback. Is this on purpose?
That’s Joelle’s and my design decision. In fact, I’d prefer to go further and not have reviewers enter a score until after author feedback arrives and there is a discussion.