Machine Learning (Theory)

6/29/2013

The Benefits of Double-Blind Review

This post is a (near) transcript of a talk that I gave at the ICML 2013 Workshop on Peer Review and Publishing Models. Although there’s a PDF available on my website, I’ve chosen to post a slightly modified version here as well in order to better facilitate discussion.

Disclaimers and Context

I want to start with a couple of disclaimers and some context.

First, I want to point out that although I’ve read a lot about double-blind review, this isn’t my research area and the research discussed in this post is not my own. As a result, I probably can’t answer super detailed questions about these studies.

I also want to note that I’m not opposed to open peer review — I was a free and open source software developer for over ten years and I care a great deal about openness and transparency. Rather, my motivation in writing this post is simply to create awareness of and to initiate discussion about the benefits of double-blind review.

Lastly, and most importantly, I think it’s essential to acknowledge that there’s a lot of research on double-blind review out there. Not all of this research is in agreement, in part because it’s hard to control for all the variables involved and in part because most studies involve a single journal or discipline. And, because these studies arise from different disciplines, they can be difficult to track down; to my knowledge at least, there’s no “Journal of Double-Blind Review Research.” These factors make for a hard landscape to navigate. My goal here is therefore to draw your attention to some of the key benefits of double-blind review so that we don’t lose sight of them when considering alternative reviewing models.

How Blind Is It?

The primary motivation behind double-blind peer review — in which the identities of a paper’s authors and reviewers are concealed from each other — is to eliminate bias in the reviewing process by preventing factors other than scientific quality from influencing the perceived merit of the work under review. At this point in time, double-blind review is the de facto standard for machine learning conferences.

Before I discuss the benefits of double-blind review, however, I’d like to address one of its most commonly heard criticisms: “But it’s possible to infer author identity from content!” The claim is that double-blind review isn’t really blind, so there’s no point in implementing it. It turns out that there’s some truth to this statement, but there’s also a lot of untruth. There are several studies that directly test this assertion by asking reviewers whether authors or institutions are identifiable and, if so, to record their identities and describe the clues that led to their identification.

The results are pretty interesting: when asked to guess the identities of authors or institutions, reviewers are correct only 25–42% of the time [1]. The most common identification clues are self-referencing and authors’ initials or institution identities in the manuscript, followed by reviewers’ personal knowledge [2, 3]. Furthermore, higher identification percentages correspond to journals in which papers are required to explicitly state the source of the data being studied [2]. This indicates that journals, not just authors, bear some responsibility for the degree of identification clues present and can therefore influence the extent to which review is truly double-blind.

Is It Necessary?

Another commonly heard criticism of double-blind review is “But I’m not biased!” The claim is that double-blind review isn’t needed because factors other than scientific quality do not affect reviewers’ opinions anyway. It’s this statement that I’ll mostly be focusing on here. There are many studies that address this assertion by testing the extent to which peer review can be biased against new ideas, women, junior researchers, and researchers from less prestigious universities or countries other than the US. In the remainder of this post, I’m therefore going to give a brief overview of these studies’ findings. But before I do that, I want to talk a bit more about bias.

Implicit Bias

I think it’s important to talk about bias because I want to make it very clear that the kind of bias I’m talking about is NOT necessarily ill-intentioned, explicit, or even conscious. To quote the AAUW’s report [4] on the under-representation of women in science, “Even individuals who consciously refute gender and science stereotypes can still hold that belief at an unconscious level. These unconscious beliefs or implicit biases may be more powerful than explicitly held beliefs and values simply because we are not aware of them.” Chapters 8 and 9 of this report provide a really great overview of recent research on implicit bias and negative stereotypes in the workplace. I highly recommend reading them — and the rest of the report for that matter — but for the purpose of this post, it’s sufficient to remember that “Less-conscious beliefs underlying negative stereotypes continue to influence assumptions about people and behavior. [Even] good people end up unintentionally making decisions that violate [...] their own sense of what’s correct [and] what’s good.”

Prestige and Familiarity

Perhaps the most well studied form of bias is the “Matthew effect,” originally introduced by Robert Merton in 1968 [5]. This term refers to the “rich-get-richer” phenomenon whereby well known, eminent researchers get more credit for their contributions than unknown researchers. Since 1968, there’s been a considerable amount of follow-on research investigating the extent to which the Matthew effect exists in science. In the context of peer review, reviewers may be more likely to recommend acceptance of incomplete or inferior papers if they are authored by more prestigious researchers.

Country of Origin

It’s also important to consider country of origin and international bias. There’s research [6] showing that both US and non-US reviewers evaluate US papers more favorably than non-US papers, with US reviewers showing a stronger preference for US papers than non-US reviewers. In contrast, US and non-US reviewers behaved nearly identically for non-US papers.

Gender

One of the most widely discussed pieces of recent work on double-blind review and gender is that of Budden et al. [1], whose research demonstrated that following the introduction of double-blind review by the journal Behavioral Ecology, there was a significant increase in papers authored by women. This pattern was not observed in a similar journal that instead reveals author information to reviewers. Although there’s been some controversy surrounding this work [7], mostly questioning whether the observed increase was indeed due to the policy change or reflected a more widely observed trend, the original authors reanalyzed their data and again found that double-blind review favors increased representation of female authors [8].

Race

Race has also been demonstrated to influence reviewers’ recommendations, albeit in the context of grant funding rather than publications. Even after controlling for factors such as educational background, country of origin, training, previous research awards, publication record, and employer characteristics, African-American applicants for National Institutes of Health R01 grants are 10 percentage points less likely than white applicants to be awarded research funding [9].

Stereotype Threat

I also want to talk briefly about stereotype threat. Stereotype threat is a phenomenon in which performance in academic contexts can be harmed by the awareness that one’s behavior might be viewed through the lens of a negative stereotype about one’s social group [10]. For example, studies have demonstrated that African-American students enrolled in college and female students enrolled in math and science courses score much lower on tests when they are reminded beforehand of their race or gender [10, 11]. In the case of female science students, simply having a larger ratio of men to women present in the testing situation can lower women’s test scores [4]. Several factors may contribute to this decreased performance, including the anxiety, reduced attention, and self-consciousness associated with worrying about whether or not one is confirming the stereotype. One idea that hasn’t yet been explored in the context of peer review, but might be worth investigating, is whether requiring authors to reveal their identities during peer review induces a stereotype threat scenario.

Reviewers’ Identities

Lastly, I want to mention the identification of reviewers. Although there’s much less research on this side of the equation, it’s definitely worth considering the effects of revealing reviewer identities as well — especially for more junior reviewers. To quote Mainguy et al.’s article [12] in PLoS Biology, “Reviewers, and especially newcomers, may feel pressured into accepting a mediocre paper from a more established lab in fear of future reprisals.”

Summary

I want to conclude by reminding you that my goal in writing this post was to create awareness about the benefits of double-blind review. There’s a great deal of research on double-blind review and although it can be a hard landscape to navigate — in part because there are many factors involved, not all of which can be trivially controlled in experimental conditions — there are studies out there that demonstrate concrete benefits of double-blind review. Perhaps more importantly though, double-blind review promotes the PERCEPTION of fairness. To again quote Mainguy et al., “[Double-blind review] bears symbolic power that will go a long way to quell fears and frustrations, thereby generating a better perception of fairness and equality in global scientific funding and publishing.”

References

[1] Budden, Tregenza, Aarssen, Koricheva, Leimu, Lortie. “Double-blind review favours increased representation of female authors.” 2008.

[2] Yankauer. “How blind is blind review?” 1991.

[3] Katz, Proto, Olmsted. “Incidence and nature of unblinding by authors: our experience at two radiology journals with double-blinded peer review policies.” 2002.

[4] Hill, Corbett, St. Rose. “Why so few? Women in science, technology, engineering, and mathematics.” 2010.

[5] Merton. “The Matthew effect in science.” 1968.

[6] Link. “US and non-US submissions: an analysis of reviewer bias.” 1998.

[7] Webb, O’Hara, Freckleton. “Does double-blind review benefit female authors?” 2008.

[8] Budden, Lortie, Tregenza, Aarssen, Koricheva, Leimu. “Response to Webb et al.: Double-blind review: accept with minor revisions.” 2008.

[9] Ginther, Schaffer, Schnell, Masimore, Liu, Haak, Kington. “Race, ethnicity, and NIH research awards.” 2011.

[10] Steele, Aronson. “Stereotype threat and the intellectual test performance of African Americans.” 1995.

[11] Dar-Nimrod, Heine. “Exposure to scientific theories affects women’s math performance.” 2006.

[12] Mainguy, Motamedi, Mietchen. “Peer review—the newcomers’ perspective.” 2005.

17 Comments to “The Benefits of Double-Blind Review”
  1. Anonymous says:

    Sorry, but this is obviously wrong. Double-blind reviewing has been the policy in all major machine learning conferences for many years, and I have yet to see a single black lesbian student at either ICML or NIPS. So frustrating.

  2. jl says:

    My impression is that the primary benefit of double blind reviewing is in the last paragraph: it creates a perception of fairness, which both invites outsiders in and creates an expectation of fairness that is implicitly understood by reviewers and authors. It also has the virtue of forcing insiders to suffer average reviewing, implying that those with some control over the review process have incentive to improve it.

  3. Tomas says:

    If the work is a follow-up, it is more the rule than the exception that authors self-identify. I try to be conscientious with understanding the paper and evaluating the novelty, which is so often the sticking point. So it isn’t unusual for me to read two or more of the cited papers to get both the understanding and a sense of the delta. And then you almost inevitably find that paragraph, turn of phrase, or figure that is identical in wording or peculiarities of formatting to the one in the paper being reviewed…

    It depends on the size of the community, too – happens more often to me in UAI than in ICML that it’s obvious who wrote the paper.

  4. George says:

    There is another problem with reviewing scientific papers. As a very depressing factor, I remember well my first attempts to publish, in an English-speaking journal, a paper written in a language not satisfying “high academic standards”. Being a scientist in a 3rd country wasn’t depressing enough. I was wondering why nobody on the editorial board could proofread a math paper for me and fix some trivial language mistakes. The problem was solved by my supervisor (who was a bit more known to the community) putting his name on it, and that caused a reviewer (who accidentally happened to be an old friend) to kindly correct the grammar and send it separately and secretly to my supervisor with errors marked in red. I don’t know how I could have published it otherwise…

  5. Aaron says:

    I think people tend to be very over-confident in their ability to identify paper authors. Everyone thinks they can, but we’re frequently wrong.

  6. Anonymous says:

    A “perception of fairness”? How rotten.

    • jl says:

      Anyone who wants a real discussion should explain their position rather than sniping like this. I’ll delete further trollings of a similar sort.

  7. Yisong Yue says:

    Thanks for the great discussion points! I wonder what negative impacts double-blind reviewing might have. From just reading this post, one might conclude that the “obvious” thing to do is to implement double-blind reviewing, since all the points raised are more or less in favor of (or neutral towards) double-blind reviewing.

    (I’m personally in favor of double-blind reviewing, but I’d love to hear thoughts that might hint at a superior alternative.)

    • jl says:

      Off hand, the systemic disadvantages of double blind are:

      (1) Extra work for authors (to anonymize papers) and program chairs (to secure the anonymization).

      (2) Some demotivation of reviewers: as a reviewer you will generally be more motivated to read closely when you know the author. This is a flipside of unfairness.

      (3) Incompatibility with arxiv.org which requires public authors.

    • Nikos says:

      In addition to what John said, I remember this post by Daniel Lemire on the disadvantages of double blind: http://lemire.me/blog/archives/2011/04/28/the-case-against-double-blind-peer-review/

      I don’t fully agree with all the points, but I do find that double blind reviews can be harsher.

  8. Brian says:

    Just curious, what are your opinions about the complete opposite of double-blind review? A model where the identities of both the authors and reviewers and their comments/responses are public. You mention you’re not against open review, but being a seasoned reviewer yourself, what are your opinions about that model?

    Perhaps reviewers can remain anonymous as an option, but I think it’s very useful for the rest of the community to have the reviews publicly available. The community can assess the quality of the review. With the amount of time and effort it takes to do reviewing, it seems a bit of a waste to not have these contributions available. To some extent, the reviewers are also contributors and deserve some credit for the work they review.

    • jl says:

      Many things are tied together here: Reviewer Identity, Review Content, Author Identity, and “Anyone comments” styles of review.

      A young graduate student could learn quite a bit from reading existing reviews. Similarly, reviewers might pay more attention to detail if their review is going public. And, of course, it’s always possible that the wrong reviewers are chosen, so giving random people the option to contribute has some real value. So, having all reviewing private seems like quite a waste. Making Review Content public seems pretty worthwhile.

      “Anyone comments” also has a potential downside, because an adversarial reviewer could FUD most results they don’t like. It is not hard—no paper addresses all problems with good theory and experiments. I’ve seen this be a problem even within the existing system due to the ability of reviewers to bid for papers they want to torpedo. Given this, my impression is that “anyone comments” reviews should be paired with a revealed reviewer identity and be considered w.r.t. soundness but not as necessarily representative of interest level.

      Author identity is something which I generally believe should not be revealed during reviewing, as it clearly does have an uncontrolled biasing effect. I read through the post Nikos pointed out but I found it uncompelling. Why should an author who is working on a sequence of papers have a special right to continue that sequence? And why can’t double blind papers cite and discuss appropriate context? I see no reason for either. But, revealing author identity after reviewing for rejected papers has some virtues. It may cause authors to polish their drafts more before submission, and it may reveal systematic review failures and/or systematic misuse of reviews, both of which simply cannot be diagnosed except by personal anecdote right now.

      Reviewer identity is the most delicate aspect, because there are extreme and relevant asymmetries in power. It is _likely_ that a good postdoc candidate for a lab will review papers from that lab as a senior graduate student. If reviewer identities are public, it is hard to imagine the typical good graduate student not shading their review in favor of a potential employer. Giving credit for reviewing is a real value, but doing this well requires that only the reviewer identities of senior people are revealed, or that there is something like a 5 year delay on revealing reviewer identity in general.

      • tf says:

        How about this modification of the double-blind system:

        1) Reviewers and authors are unknown (exactly as in the double-blind)
        2) Once the decision about accepting or rejecting the paper is made, then the paper (accepted or not), the review, and the names of the reviewers and authors are made public.

        From my point of view, it keeps all the good things of double-blind and adds some extra robustness, for instance:
        * Reviewers will do a better job because their name and reviews will be public,
        * Authors will not send incomplete (half-done) papers just to get some ‘feedback’ (and still have some odds of the paper being accepted)

        I can see people going through the publicly available submissions and reviews to find the worst reviewers and drafts. Just imagine the blog “Worst Reviewers in ML”. And since the data would be publicly available, everyone could check it out to see if the blogger overreacted or if the reviewer was plain bad.

        In fact, it’s not something completely new. Several countries have (semi) “transparent governments” where anyone can go to a website and check where the money has gone in recent years. Brazil (my home country) is one example, and some parts of the government have these ‘transparency portals’ (e.g., http://www.portaltransparencia.gov.br). The amount of data is overwhelming, but still there are always journalists finding evidence of corruption right there.

        I guess it would be similar in the system I described for paper review; the difference is who would check the data to find “bad” reviewers/authors. Instead of journalists, it would be ourselves (researchers), because we saw someone doing an awful job and they need to be exposed.

        • jl says:

          I’m hesitant to go with public reviewer names in a short time frame due to the power asymmetry issue. I expect it would be quite difficult for a senior graduate student to give a negative review to a paper from a lab where they want a postdoc.

  9. Rahul says:

    Hey… I don’t completely agree with double-blind reviews… Why not make it triple blind, so that even the Editor who is selecting the reviewers does not know who the authors are?

    If the author is the Editor’s friend or enemy, the Editor will send the paper to reviewers who will be either very kind or very harsh…

    Only if it is triple blind will I really accept that this method is purely based on science…

    Otherwise, all is crap!!!!!

  10. Chris W says:

    Has anybody actually crunched the numbers to see if there is any sign of major change in conditional paper acceptance probability as a result of introducing double-blind reviewing to ML conferences? For example, is a greater fraction of papers with a female first author accepted than previously?

    I couldn’t immediately find this on Google — anyone know if it has been done?

    • jl says:

      Not as far as I know. It would be somewhat laborious, because author sex would need to be derived.
