Yesterday, there was a discussion about future publication models at NIPS. Yann and Zoubin have specific detailed proposals which I’ll add links to when I get them (Yann’s proposal and Zoubin’s proposal).
What struck me about the discussion is that there are many simultaneous concerns as well as many simultaneous proposals, which makes it difficult to keep all the distinctions straight in a verbal conversation. It also seemed like people were serious enough about this that we may see some real movement. Certainly, my personal experience motivates that as I’ve posted many times about the substantial flaws in our review process, including some very poor personal experiences.
Concerns include the following:
- (Several) Reviewers are overloaded, boosting the noise in decision making.
- (Yann) A new system should run with as little built-in delay and friction to the process of research as possible.
- (Hanna Wallach, updated) Double-blind review is particularly important for people who are unknown or from an unknown institution.
- (Several) But, it’s bad to take double blind so seriously as to disallow publishing on arxiv or personal webpages.
- (Yann) And double-blind is bad when it prevents publishing for substantial periods of time. Apparently, this comes up in CVPR.
- (Zoubin) Any new system should appear to outsiders as if it’s the old system, or a journal, because it’s already hard enough to justify CS tenure cases to other disciplines.
- (Fernando) There shouldn’t be a big change with a complex bureaucracy, but rather smaller changes which are obviously useful or at least worth experimenting with.
There were other concerns as well, but these are the ones that I remember.
Elements of proposals include:
- (Yann) Everything should go to Arxiv or an arxiv-like system first, as per physics or mathematics. This addresses (1), because it delinks dissemination from review, relieving some of the burden of reviewing. It also addresses (2), since authors can immediately begin building on each other’s work. It conflicts with (3), because Arxiv does not support double-blind submission. It does not conflict if we build our own system.
- (Fernando) Create a conference coincident journal in which people can publish at any time. VLDB has apparently done this. It can be done smoothly by allowing submission in either conference deadline mode or journal mode. This proposal addresses (1) by reducing peak demand on reviewing. It also addresses (6) above.
- (Daphne) Perhaps we should have a system which only reviews papers for correctness, which is not nearly as subjective as for novelty or interestingness. This addresses (1), by eliminating some concerns for the reviewer. It is orthogonal to the double blind debate. In biology, such a journal exists (pointer updated), because delays were becoming absurd and intolerable.
- (Yann) There should be multiple publishing entities (people or groups of people) that can bless a paper as interesting. This addresses (1).
There are many other proposal elements (too many for my memory), which hopefully we’ll see in particular proposals. If other people have concrete proposals, now is probably the right time to formalize them.
I believe the journal that Daphne was referring to is PLoS ONE.
Hanna Wallach was the attendee who brought up the point about double-blind review, and quite a few others agreed.
Since this debate I’ve heard a lot of buzz around NIPS about the idea of trying out a VLDB-like system. Personally I think it would be fantastic to spread out the deadlines, both in terms of reducing reviewer burden and reducing the pressure to submit research before it’s fully polished.
As a relatively junior researcher, I’m also concerned about moving too far from the established publication model. The VLDB-like system seems like a great way to fix problems with the current system without abandoning all of the benefits.
Hi Jenn,
The system I am proposing is an attempt to remove any bias (gender-related or otherwise). When I was a grad student in France at a relatively unknown lab, I felt the bias against non-native English speakers/writers from relatively unknown labs. You are a native English speaking graduate from a top-10 US school with a well-connected and famous advisor. You actually have it easy.
Today, the bias is particularly strong against unknown junior authors from outside North America and a few European countries, who are not native English speakers, who have not yet presented at a major conference, and who don’t have a well-connected advisor who can introduce them to the leaders of the field so as to attract attention to their work. The deck is heavily stacked against them.
And it gets worse: even when such authors actually manage to publish their work (often in venues that are not widely read in the US), they rarely get cited, particularly if another more prominent US-based author happens to have published a similar idea slightly later in a more prominent venue.
An open reviewing system with no limit on dissemination, as I am proposing, would considerably reduce such bias. Blatant biases in open reviews would quickly be identified by readers and other reviewers. Authors who are not properly cited could signal their existence in a comment on the offending paper. The author could not avoid making a revision with the proper citation without embarrassing him/herself.
Conference papers are so short and the review so quick that reviewers have to make decisions under high uncertainty. That is precisely why having the author names causes such a bias in favor of reputable authors. But suppressing the author names creates other biases: the writing style (non-native speakers are easily detected), the precise set of citations (reviewers favor papers that cite them), the mention of certain bad words (e.g. “neural”, “genetic”, “Bayesian”, “frequentist”, “fuzzy”).
In my opinion, an open review system is the only way to reduce these biases and really pick out the best new ideas, regardless of whether the author can write well in English, or cites all the right important people in his first version.
Sometimes, a paper becomes very famous and highly cited, not because the idea is particularly new, but because the author is well known, writes very clearly and simply, and publishes in a major venue. A good example is John Hopfield’s seminal paper on the connection between neural nets and spin glasses, which helped renew the interest in neural nets in the early 80’s (NIPS wouldn’t exist without it). The very same model had been studied years earlier by a number of people (Nakano, Amari, Little, Kohonen, …), but no one paid attention. It took a high-stature person like Hopfield, a very short and clear paper, and a publication in PNAS. Examples abound.
[Incorporated both references]
Another journal reviewing only for correctness is Frontiers (http://frontiersin.org/).
What about some radical change:
ALL papers—journal papers, conference papers—are submitted to a preprint server. People can leave comments, questions, and reviews on that server. Authors can comment on comments, questions, and reviews as well, and so on. Authors can tag preprints to be included in particular journals or conferences they would like to attend. However, journals and conferences themselves can also tag papers. Journals and conferences are then only index sets that point to papers on the preprint server that have been reviewed (on the preprint server) by trusted members and that belong to a certain topic. So journals are selections of preprint papers which guarantee certain quality standards. Similarly, conferences invite authors of interesting, refereed preprints to give talks or present posters. It is immediately obvious in which journals and conferences a work has been listed or presented. Everything is fully transparent (double non-blind). Authors are known to everybody, reviewers are known to everybody. Of course a single paper can be listed in several journals (e.g. from different communities) and presented at several conferences.
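As an illustration of the data model being described, here is a minimal sketch in which journals and conferences are just named index sets over a preprint server; all names and identifiers are made up for illustration:

```python
# Toy sketch: papers live on a preprint server; journals/conferences are
# just index sets pointing into it. Names below are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Preprint:
    paper_id: str
    authors: list
    reviews: list = field(default_factory=list)  # public, signed reviews/comments
    tags: set = field(default_factory=set)       # venues this paper is listed in

class Venue:
    """A journal or conference: an index set over the preprint server."""
    def __init__(self, name):
        self.name = name
        self.listed = set()                      # paper ids selected by this venue

    def list_paper(self, preprint):
        preprint.tags.add(self.name)
        self.listed.add(preprint.paper_id)

# A single paper can be listed by several venues from different communities.
p = Preprint("1234.5678", ["A. Author"])
Venue("Journal of X").list_paper(p)
Venue("Conference Y").list_paper(p)
print(p.tags)  # e.g. {'Journal of X', 'Conference Y'}
```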
That’s pretty close to what I’m proposing (text is linked from main posting).
I just wanted to point you to a similar discussion from 2001, which some might still remember: http://www.hutter1.net/jpubcon.txt
Somewhere in the discussion Geoff Hinton suggests something like a trust system, with people putting their publications on their home pages, so that you can get a view of what people you trust consider interesting. I wondered whether it would be much easier to implement something like this with RSS and similar tools today.
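To make that concrete, a minimal sketch of such an RSS-based trust feed might look like the following; the researcher names and feed URLs are made up, and this simply aggregates publication feeds from people you trust:

```python
# Sketch: aggregate publication feeds from researchers you trust.
# Requires the third-party 'feedparser' package; the URLs below are hypothetical.
import feedparser

TRUSTED_FEEDS = {
    "Researcher A": "http://example.org/researcher_a/publications.rss",
    "Researcher B": "http://example.org/researcher_b/publications.rss",
}

def interesting_papers(feeds):
    """Return (endorser, title, link) for every entry in the trusted feeds."""
    papers = []
    for name, url in feeds.items():
        for entry in feedparser.parse(url).entries:
            papers.append((name, entry.get("title", ""), entry.get("link", "")))
    return papers

if __name__ == "__main__":
    for endorser, title, link in interesting_papers(TRUSTED_FEEDS):
        print(f"{endorser} recommends: {title} ({link})")
```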
If I remember correctly, Hanna Wallach was referring to “studies” that show negative bias towards women and unknown universities/individuals when reviews are not double-blind. I am personally a bit skeptical about the significance and accuracy of these studies (it would be good to post links to a couple of them for further reference), especially when you take into account the fact that for well-known individuals, it is relatively easy to guess the identity of the authors based on the work being reviewed (a positive bias for well-known authors induces a negative bias towards others). It seems to me that double-blind is more of a theoretical concept that is deeply flawed in practice. Yet the claim that getting rid of it would cause substantial problems needs more elaborate discussion.
It would be great to look at some of these studies and see if they have any suggestions for getting around such bias (say, by balancing the number of male/female reviewers, etc.). In particular, it would be good to look at the math and physics communities and see if they observe such problems.
My proposal would actually eliminate this kind of bias: 1: since papers are disseminated regardless of the opinion of reviewers, there is less riding on the reviewer’s decision. 2: since reviews are readable by everyone (even if they are anonymous), it would be difficult to get away with blatantly biased reviews. 3: good papers will eventually get the recognition they deserve, unlike in the current system where you can easily get scooped because of publication delays. There are more details in the Q/A section of my proposal (at the bottom of the page).
I don’t know of any studies specifically showing gender bias in conference or journal peer review. But there are enough studies showing gender bias in related settings that at this point, the burden of proof should be on those who claim that conference/journal reviewing is immune to (often unconscious) bias. It’s conceivable, I suppose, that such reviewing is somehow different in some important way from, e.g., assessment of less specialized work [1], or reviewing of applications for postdoctoral fellowships [2]—but it doesn’t seem likely. There’s certainly enough evidence to make rampant gender bias the working assumption.
If taking gender bias seriously leads to publication models that help level the field for young or sub-“top tier” researchers, so much the better. (For those concerned about being non-native writers of English: if you’re not fluent, find a colleague who is, and have them edit your paper. That way, you’ll pass as native. And don’t assume your writing is perceived as fluent just because people seem to understand it.)
[1] “Goldberg revisited: What’s in an author’s name”. http://www.springerlink.com/content/n45426751g104902/
[2] “Nepotism and sexism in peer review.” http://www.nature.com/nature/journal/v387/n6631/full/387341a0.html
The point of having more representation of women on PCs would be to make sure that women aren’t being overlooked for such designations. It would not guarantee a more gender-blind outcome with respect to accepted papers. Women are just as harsh on other women as men are (if not more so). In other words, if the goal is a more gender-blind outcome with respect to the technical program, a more gender balanced PC would not necessarily be the way to achieve that. The only way to achieve that is to be as double-blind as possible or have both author and reviewer be non-anonymous–the idea of the latter being that if the review is not anonymous, a reviewer has more incentive to be fair, since he is also being judged on his review. It’s the one-sided, unbalanced anonymity that allows people to exercise their prejudices without being held accountable.
Also, contrary to what Mahdi says, it can often be difficult to “guess” who wrote what if it is not already public.
You certainly have good points here. But I merely included balancing male vs female reviewers as a naive example. My point was that as long as you can quantify such bias, you might be able to deal with it by various means. You might think of, say, an inverse bias imposed after the reviews to cancel out the reviewers’ bias. If the studies that Hanna was referring to are concrete, with strong statistical quantities in them, they should also include suggestions as to how to debias the outcome of a non-blind reviewing system.
As for the anonymity point, I wouldn’t expect the reviewers to be happy with that suggestion. One important drawback is of course the fact that well-known researchers are difficult to disagree with, even when a reviewer is certain about the point of disagreement. I’m not suggesting this is the case for everyone, but I’m sure it would be the case for me in such a hypothetical situation.
If the reviews were purely comments/questions about correctness, then wouldn’t it be a good thing for reviewers to be non-anonymous? It would give people an incentive to read things carefully, since they would get “credit” for making informed comments.
Sure, it would be great to get credit for the reviews that one writes, including ideas and suggestions that might lead to future work. This is particularly the case when the reviews themselves are citable documents (according to Yann’s proposal). However, I think the choice of being anonymous or not (potentially getting credit vs risking backfire on your own credentials) should be made by the reviewers themselves. On the other hand, the decision whether or not anonymous/non-anonymous reviews should be taken seriously is a collective decision made by the community. In Yann’s “market” point of view, we should see the system converge to an equilibrium on this particular choice. I personally would expect harsh reviews on the quality of the work to be made anonymously, while those on correctness and those in question form would be made along with the identity of the reviewer.
Added Yann’s proposal. It’s quite detailed so read carefully.
Added Zoubin’s proposal. (Zoubin emailed me on the 8th, but some email snafu on my end occurred.)
Zoubin adds:
At the risk of repeating myself, the “well-documented bias” concerns traditional single-blind review, where the reviews are kept private, each paper only receives a small number of reviews from designated reviewers, and dissemination is actually determined by the reviews.
With my proposal, bias is considerably less likely to occur: reviews are available for all to see, each paper may be reviewed by anyone (with negative bias, positive bias, or no bias at all), and the reviews do not determine whether the paper is disseminated. Deliberate biases in some reviews will be available for all to see, which will reduce their likelihood.
Double-blind review only partially reduces one kind of bias: the one due to the author’s name, affiliation, and perhaps gender. But there are many other sources of bias: whether the reviewer is cited in the references, whether the authors’ English sounds “native”, whether the author “knows how to write a paper for conference XXX”, and all kinds of things that have nothing to do with the quality of the contribution.
My proposal is an attempt to fix many more kinds of bias by opening up the process rather than by closing it even more, as double-blind review does. It will also reduce the consequences of bias, if there is bias.
Dear all,
interesting debate! Here are some points I like.
People submit to some kind of arxiv (cf. Yann). There are some entities that provide scientific value judgments (e.g. Zoubin’s college, but some conferences might prefer to do their own reviewing), numeric or textual. There are occasional submission deadlines in sync with the major conferences.
Journals, conferences, workshops feed on the papers/reviews in that system, probably with different tastes. Journals prefer complete work, conferences like NIPS may trade completeness for innovation, etc.
Putting together a conference, workshop, or journal issue consists of a PC looking through the system, and offering the authors of strong papers inclusion of their paper in the event. If the authors agree, the record of the work in the system is updated (“published in …”, “presented at …”).
The authors can determine at any time who has access to their paper, its reviews etc. (e.g. only the NIPS PC, or the whole world).
A rudimentary approximation of such a system would be a combination of arxiv with a few good conferences that decide to share reviews.
One caveat: currently, an author can ‘delete reviews’ with high probability by resubmitting to a different conference/journal. That’s an inefficient use of reviewer resources, but maybe we need to retain some kind of ‘second chance’ mechanism. If we rely on everything being resolved by public debates of papers and reviews, we may be too optimistic.
I also liked some less ambitious ideas that were discussed on and off, e.g., rewarding the best reviewers by prizes or travel grants, and publishing parts of the reviews (I’d love to see that in a conference poster session).
all the best,
bernhard
Yann’s proposal seems to generally be addressing the question of where we are going, while for the others it is relatively easy to understand how the implementation could occur. I broadly agree with the motivations in Yann’s proposal based on my own experience, although we might disagree in some details.
The difficult thing seems to be breaking any process of change down into steps, each of which is obviously helpful and which in conjunction get us there. Here is my best attempt.
(1) Require author submission to Arxiv as a first step. The second step is a submission of a blinded draft+arxiv pointer for review. Any blinded draft without an Arxiv pointer is rejected.
This is a clear win because:
(a) It will mean members of the community get their work out faster.
(b) It will reduce reviewing load as low-quality papers won’t be submitted to arxiv where they are permanently attached by name to the author.
(c) It provides a mechanism for revision which in any realistic view of science is absolutely essential.
(2) Switch to a coincident journal for the conference. Submissions are on a rolling basis with a final deadline for the year. Each submission (to which (1) applies) is reviewed a standard 3 times, with all accepted papers guaranteed a spot in the conference.
This is a clear win because:
(a) It reduces peak reviewing load, allowing more thoughtful consideration of papers.
(b) It provides a more prestigious publication process.
(3) Reviews become public first-class objects subject to citation, revision, and credit, with levels of anonymity controlled by the journal & reviewer.
This is a win because:
(a) It reduces reviewer load by making previous reviews available on the submit/reject/resubmit cycle. This is an obvious win for later reviewers, who can reference earlier reviews to see what the earlier concerns were and whether they were addressed.
(b) I believe this is also a win for authors, as personally I’d love to be able to reference previous reviews showing how we addressed them and where they are provably wrong. Note that initial reviews will be independent, addressing the problem of excess certainty due to correlated reviews.
(c) It makes the journal’s website a reference for any other conferences/journals, increasing awareness of the journal.
(4) The journal process becomes nonmonolithic. The website is altered to support approvals by various subcommunities and for various things, while retaining the original journal process.
This is a win because:
(a) It supports the construction of new research communities which is otherwise very difficult (think “have fun creating a new conference”). Done right, this will grow the overall journal and aid the development of science.
(b) It opens up the journal to other communities not presently at NIPS.
(c) It allows for the creation of a relatively unbiased baseline review like PLoS.
At a rate of change of one step per year, this would take 4 years. Naturally, this plan could be stalled or altered as experience dictates. The first step is easy—it’s just adding a small amount of information to the submission. The second is harder, but we have some experience with setting up journals. The third has to come after the second, engineering-wise. The third and fourth are freely swappable, but I expect the 4th to require a bit more work to set up. There are many important details I’ve left out, such as how reviewers are assigned or how long they have, as they seem like relatively orthogonal subjects.
I just wanted to reiterate a comment that someone made at the debate, namely that Step 1 is not a clear win for everyone involved. In particular, it could end up really damaging the careers of young grad students. It is important for students who are getting started with research for the first time to be able to safely obtain peer reviews for their work without the threat of having these reviews associated with them for the rest of their careers.
Submitting papers to Arxiv is already an available option for people who want quick dissemination with a time stamp.
I wasn’t at the meeting, so perhaps it is obvious in context, but I don’t really understand this concern.
Submitting work for publication at an academic conference requires several very busy people to invest their time in organization and reviewing. If the work is known not to be appropriate for, or at the level of, the venue, submitting it is a waste of everyone’s time and contributes to some of the problems with academia. The supervisors of these junior students should realize this, work with them on the publication, and help decide whether it is appropriate to submit. The supervisor should have avenues to get external constructive feedback for their students other than using the conference/journal review process.
Of course papers will be submitted that make mistakes: misuse of statistics, not realizing a “novel” idea has been well explored many times before, etc. And of course it can be a bit embarrassing when this happens to you (or your student). But it does happen, especially early in a student’s career. I think most people realize that. However, in my personal view, it would be good for there to be incentives that discourage these mistakes from being made over and over by the same people.
It is a pet peeve of mine, not shared by everyone, when people use conferences to “see what happens” with an obviously marginal or half-baked paper. I’ve heard people say “worst thing that can happen is we get some comments on the idea”. This really seems like the wrong attitude.
I’m not quite understanding the fear. Is it that a beginning graduate student might make a mistake which a fire-breathing reviewer belittles, inadvertently destroying the graduate student’s career?
There are two ways this is naturally dealt with. The first is because the community of researchers should and would frown on overly brutal reviews as eating the young is surely unhealthy for any community. The second is because there is a revision process and hence a means for the graduate student to correct errors.
It might be helpful to consider the nature of double blind reviewing.
(A) Hard double blind. Huge efforts are made to ensure that the author can not be detected by the reviewer.
(B) Soft double blind. The author isn’t front and center, but the reviewer can probably figure it out or at least develop a good guess using a web search if they care to try.
(C) Single blind. The author is front and center in the review process.
Many of the complaints about double blind (that literature review is impossible, that it slows progress, etc…) seem to relate entirely to hard double blind. Soft double blind should be understood as of some value to reviewers and authors that want to avoid a mechanism of undue influence. Many conferences and journals still use single blind.
Soft double blind is what NIPS uses now, and it would be preserved under (1), although a bit softer as a search would be guaranteed to work rather than often work. I haven’t observed a strong preference for hard double blind amongst beginning graduate students and I believe no one is seriously considering implementing it (I would certainly object). I believe NIPS is still single blind at the PC (but not reviewer) level.
Did I understand the fear correctly? And if so, does it still seem significant? And if so, why?
My concern is not about single blind versus double blind in the initial review period, but about the fact that (the way I understand the proposal from 1b in your list) the rejected papers would exist with authors’ names publicly on the internet forever once the review period is over. I think that implementing a system where people’s mistakes live on to haunt them forever is going to end up discouraging creativity and risk taking in the research process, and could also potentially harm younger students.
I agree with Brian that a *good* advisor would help younger students decide what is appropriate to submit and provide them with other methods of obtaining feedback, but not everyone is lucky enough to have a good advisor starting out. Would you like it if every time people searched for your name on Google they were presented with an idea you had before you knew what you were doing accompanied by a bunch of scathing reviews? That could be really harmful to people when they are being evaluated later on, especially if these evaluations are coming from people outside the NIPS/machine learning community who aren’t familiar with this new review system.
I don’t see why this should be of any concern. A simple aging mechanism would solve this. Reviews could be (optionally) discarded after, say, 4 or 5 years. Links could be dropped, and mistakes could be forgotten.
Another option would be to have a less serious semi-reviewing section where people can essentially post half-baked ideas for the sake of discussion and in the hope of useful feedback (maybe with a short public lifetime for each paper).
So the fear is that a young graduate student makes a mistake, which defines the graduate student forever. Step 1 doesn’t make reviews public, but step 3 does so the fear is still viable, if later.
My basic belief is that this is acceptable. The primary fear of any young graduate student interested in research positions should be not being noticed, as there are many more incoming graduate students than research positions. Having research made publicly available in a systematic way inherently helps here.
With respect to the fear, if all the graduate student does publicly is make a mistake, then it will define them for searches. On the other hand, if a young graduate student works on and publishes other papers, these will naturally dominate search results, as no one will link to or discuss a poor paper. In this case, the mistake will not define the student as any reasonable estimate of the value of a researcher is based upon the researcher’s best work rather than the researcher’s worst.
So, I believe this should be a win for a young graduate student interested in research, as the minimal fear is offset by a better system for being noticed. For a young graduate student not interested in research the fear may win, but I don’t believe we should optimize our system for this case.
I totally understand the fear (call it “virtual stage fright”), but I think it touches senior researchers at least as much as junior researchers. Young or old, we all make mistakes. We just have to live with it.
Fortunately, we are mostly judged by our contributions, not by our mistakes, unless we refuse to admit our mistakes when faced with evidence. We are a bit like concert pianists: occasional mistakes are easily forgiven if the performance is engaging (think “Glenn Gould”).
Mistakes that are properly acknowledged and corrected rarely damage a researcher’s career permanently, particularly if the author makes otherwise useful contributions.
In my opinion, a bigger problem with public reviews is that once a paper has gotten poor public reviews, it will become very difficult to publish _even if it has been significantly revised_. I can imagine that a reviewer who gets such a revised paper for review will be tempted to look up the reviews for previous versions of the paper, which will make their own review very prone to bias. This is in fact the reason that many reviewing systems such as EasyChair do not allow a reviewer to see others’ reviews before submitting their own.
It’s a good point, but note that EasyChair is used for conference reviews that are made under pressure. As a PC member, you don’t want to enter a wildly different review from everyone else unless you can really support your point of view. Since many PC members cannot support their point of view (not having read the submission carefully), the temptation would be to look and see what other people think.
If the reviews are not made under pressure and people are reviewing a paper by choice, then there is no reason to simply agree with a previously entered review. In fact, you can point out exactly where you disagree with it, if need be.
Also, since reviews are public, papers won’t be submitted over and over again to conferences for reviews. Thus the total number of reviews decreases, taking pressure off people who would otherwise (e.g. in the current system) write a review for a paper they are not really interested in because it requires a total of 10+ reviews throughout its submission lifetime.
As John pointed out, authors can explicitly address the comments made by previous reviewers, either by correcting the problem with a revision or by proving that the reviewer was wrong in some way. If a revision addresses the problems mentioned by previous reviewers, then I don’t see why one would get negative bias based on the reviews of a previous version. After all, one would expect that a substantial revision is an answer to bad reviews.
As for the bias of the reviews on the same version, I remember someone brought that up during the meeting at NIPS. The concern was that the first couple of reviews are going to bias the rest, and a suggestion to fix this was to hide the reviews until a minimum number of them have been submitted.
Very interesting discussion. Perhaps we should take a look at the experiments with reviewing systems in other areas of science. Here are the very interesting peer review policies of the open journal Biology Direct, incorporating many elements which have been suggested here. It would be useful to get some understanding of how well this model is working for them.
http://www.biology-direct.com/info/about/
1. The Editors-in-Chief will assemble, for each subject area, a panel of potential reviewers who have agreed in advance to serve the journal and will form the Editorial Board.
2. An author who wishes to submit a research article to the journal will consult the relevant subject panel and attempt to find three appropriate Editorial Board members to peer review the article. Editorial Board members can nominate a reviewer in their place. Only reviewers directly nominated by an Editorial Board member are eligible for review.
3. The journal will insist that the initially requested reviewers are drawn from the Editorial Board.
4. In essence, an article is rejected from the journal if three Editorial Board members do not agree to review it.
5. Any reviewer-author pair (both directions) will be allowed to appear in the journal no more than four times a year.
6. Any author will be allowed to publish no more than two articles per year with the same three reviewers.
7. Reviewers are asked to undertake a two-stage review, because once they agree formally to review an article they are essentially recommending eventual acceptance and publication. The first step for a reviewer is to skim-read the article so as to allow the reviewer to form an overall opinion of the article; if they feel they cannot have their name associated with the publication of this article, they can decline to provide a formal review. But if they agree to review, the second step is for the reviewer to prepare comments for the author but also, if they wish, to prepare ‘public’ comments, however critical, that will appear alongside the final version of the article when it is published. The reviewer comments to be published can take into account any revisions to the manuscript and therefore might differ substantially from the original comments to the authors, at the reviewer’s discretion. The reviewer can also choose to publish no comments with the manuscript in which case it will be indicated, under the reviewer’s name, that “This reviewer made no comments for publication”.
8. There will be a fairly tight time frame for the review process: if an Editorial Board member does not respond to a request for review within 72 hours, this will be considered to be a ‘decline to review’ and the author will seek another reviewer. However, once an Editorial Board member agrees to review a manuscript, s/he will have 3 weeks to deliver the review. If the reviewer does not deliver comments promptly, the author will be in a position to elect to publish the manuscript accompanied by the name of the reviewer but without comments.
9. The authors will be in a position to withdraw the manuscript if they do not wish to see it published alongside the reviews that have been received. The same article may not then be submitted through other Editorial Board members.
10. As a safeguard against pseudoscience, an Editorial Board member reviewing a manuscript will have the option, in addition to writing a negative review, to alert the Editors-in-Chief that, in his/her opinion, a particular manuscript is not a legitimate scientific work and therefore should not be published in any form. The Editors-in-Chief will make the final decision in such (rare) cases.
This is an interesting discussion.
One thing to keep in mind, though, is that there are thousands of conferences and journals facing the same problem. The physics community in particular has been at the forefront of changing publishing and reviewing models. It is quite wishful thinking for a small number of well-established people in the ML community to believe that radical untried proposals (such as “everything published immediately”, or non-blind reviewing) can change things for the better when the ML community is so large that its dynamics are hardly observable, let alone predictable.
“Blog-based peer review” was used for a recent book:
http://grandtextauto.org/2009/05/12/blog-based-peer-review-four-surprises/
This topic will come up again at NIPS this year. I want to critique one aspect of Yann’s argument. The proposal cites the case of SIFT as an example of the current publication model holding back research. But this is argument by anecdote. Any publication model must seek to balance false positives (erroneous research accepted and published) against false negatives (good research that is not published) while allocating a fixed resource (the time of competent reviewers). There will always be false negatives, so pointing out a handful of them is not compelling evidence. One could even more easily point out false-positive papers that should not have been accepted, particularly at conferences!
Under the arXiv-first model, the same false negatives could occur. They would take the form of papers that did not receive any attention from any of the Reviewing Entities. I’m not sure that there is any solution to this. Human attention is a limited, and biased, resource.
When I was a PhD student in Machine Learning a couple of years ago, I also got frustrated by the current system of publications.
I thought about a system close to the one I describe below, based on conference/journal ratings connected to paper ratings, and on having a professional, paid reviewing committee.
It would be absolutely great if some of the ideas below could provide new points of view on the problem, or even just make people think about it again.
—————
For each conference paper, from one to three reviewers do not understand what the paper is really about. And often not because it’s not well written (though this may be the case), but because the reviewer is not an expert in the scope of the paper.
=> So the reviewers are left looking at the paper in terms of what they already know (because learning something new is hard), and thus reject potentially new ideas. Of course, learning the background for every single paper you review would be very hard and time-consuming. Thus acceptance becomes a lottery.
=> This leads to another substantial problem for new researchers: the reviews they obtain give no clue as to what the authors did right or wrong and how to improve the quality of their research.
=> Another important problem to address is the number of publications per CV. Currently, “scientific” performance is often measured by the number of publications a person has. Of course, there is a somewhat ill-defined ranking of conferences, but it is clear that bad papers are often published in good conferences, and good papers in bad conferences (which few read). Good papers may be published in bad conferences for various reasons not connected to the quality of the ideas in them.
-> I suggest creating a club of scientists who review papers. This club is paid (preferably in money, but possibly some other form of good incentive) for the reviews. For bad reviews, there would be a system of fines. This would encourage good reviews, written only by people who are experts in the field.
So I suggest a hierarchical system of conferences and journals.
-> Each conference is assigned (say, once a year) a rating from 0 to 1, calculated from its previous rating and the quality of the conference during the most recent year (since there are already some rating systems, adjusting to this rating system would not be too hard).
-> Each paper is submitted to the club (committee) for a fee (the fee funds the committee and the reviewers). There, the papers are classified by keywords and then assigned by the committee members to reviewers. The reviewers get enough time (say, 3-6 months) for an accurate review and provide a rating from 0 to 1. The long reviewing time ensures that 1) the paper is of good quality when submitted (nobody wants to wait long and pay only to get a bad rating afterwards), and 2) the reviews are of good quality. This work is paid for.
-> After receiving the result, the authors have three possibilities: 1) submit the paper again, for an additional fee (since the reviews are of good quality and take a long time, this only makes sense if the paper has been substantially improved); 2) send the paper to any conference/journal whose rating is less than or equal to the rating of the paper, in which case the conference has to accept the paper; 3) send an answer to the review. This last step also has a price, but it also has the power to reduce the reviewer’s rating if the review was indeed bad. The rating of a reviewer influences the quality of the papers they are given for review and how much money they get for providing those reviews. The process of assessing the author’s answer and the review assumes that the committee is competent and fair in its decisions.
This system would allow scientists to save time on rewriting the same papers multiple times, obtain quality reviews, understand quality metrics of scientific work, and have every result published in a collection of the appropriate level (which, in turn, would lead to all results being published and thus accessible to the scientific community, and would also provide an appropriate quality level for the purpose of evaluating CVs, prioritizing reading order, etc.).
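To make the mechanics of this proposal concrete, here is a minimal sketch of the two rules it relies on (the yearly conference-rating update and the rule that a paper may go to any venue rated at or below the paper); the mixing weight, venue names, and numbers are purely hypothetical:

```python
# Toy sketch of the rating mechanics described above; the update formula and
# all numbers/venue names are hypothetical illustrations, not part of the proposal.

def update_conference_rating(previous, recent_quality, weight=0.5):
    # New yearly rating mixes last year's rating with this year's measured quality.
    return (1 - weight) * previous + weight * recent_quality

def eligible_venues(paper_rating, venue_ratings):
    # A paper may be sent to any venue whose rating is <= the paper's rating;
    # such a venue has to accept the paper.
    return sorted(name for name, r in venue_ratings.items() if r <= paper_rating)

if __name__ == "__main__":
    venues = {"ConfA": 0.9, "ConfB": 0.6, "WorkshopC": 0.3}
    print(eligible_venues(0.7, venues))        # ['ConfB', 'WorkshopC']
    print(update_conference_rating(0.6, 0.8))  # 0.7
```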
The pay-per-review approach has some real merit, because it would boost the quality of submissions and decrease the overall reviewing load.
But it’s not egalitarian enough for most of academia to embrace it. The range of academics submitting papers runs from students with little or no stipend to extremely successful people with money to burn. How would you set the price?
The other thing which is tricky here is determining the quality of the review. In practice, I expect that bad reviewers would essentially never be fined, because the judgment is never made.
Even if it might sound irrelevant for this blog, I would like to add the following:
Any submitted paper (to a conference/journal/arxiv) should have its associated code, along with a readme file explaining how to run the code, what parameter settings were used to generate the results, and how exactly the tables or figures reported in the paper can be reproduced. This not only prevents someone from fabricating results when under pressure, but also provides a nice interface for whoever decides not to go through the details of the paper but to use the proposed framework as a black box. We already spend a lot of time generating results and writing code, and this would possibly take at most one extra day. The code does not have to be perfect, well-commented, or scalable, but a layperson (or the reviewer) should at least be able to reproduce the results. There is absolutely no point in misleading the research community with fraudulent results and buggy code. A conference committee can be a bit lenient during the initial submission process, but when submitting the camera-ready version, one should submit the code in a proper, clean format — otherwise the conference committee might not select the paper for final publication.
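As a rough illustration of the kind of artifact being asked for, here is a hypothetical skeleton of a reproduction entry point; every parameter name and the experiment body are placeholders that a real submission would fill in with the actual settings from the paper:

```python
# reproduce.py -- hypothetical skeleton for regenerating a paper's reported numbers.
# All parameters and the experiment body are placeholders for illustration only.
import argparse
import json
import random

# The exact settings used to produce the results in the (hypothetical) paper.
PAPER_PARAMS = {"seed": 0, "learning_rate": 0.1, "epochs": 50}

def run_experiment(params):
    """Train/evaluate with the given parameters and return the reported metrics."""
    random.seed(params["seed"])
    # ... the actual training and evaluation code would go here ...
    return {"table_1_accuracy": None}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Reproduce the paper's tables and figures.")
    parser.add_argument("--params", help="optional JSON file overriding PAPER_PARAMS")
    args = parser.parse_args()
    params = dict(PAPER_PARAMS)
    if args.params:
        with open(args.params) as f:
            params.update(json.load(f))
    print(json.dumps(run_experiment(params), indent=2))
```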