Bob Williamson and I are the learning theory PC members at NIPS this year. This is some attempt to state the standards and tests I applied to the papers. I think it is a good idea to talk about this for two reasons:

- Making community standards a matter of public record seems healthy. It gives us a chance to debate what is and is not the right standard. It might even give us a bit more consistency across the years.
- It may save us all time. There are a number of papers submitted which just aren’t there yet. Avoiding submitting is the right decision in this case.

There are several criteria for judging a paper. All of these were active this year. Some criteria are uncontroversial while others may not be.

- The paper must have a theorem establishing something new for which it is possible to derive high confidence in the correctness of the results. A surprising number of papers fail this test. This criterion seems essential to the definition of “theory”. Common failure modes:
  - **Missing theorem statement**
  - **Missing proof** This isn’t an automatic fail, because sometimes reviewers can be expected to fill in the proof from the discussion. (Not all theorems are hard.) Similarly, sometimes a proof sketch is adequate. Providing the right amount of detail to give confidence in the results is tricky, but the general advice is: err on the side of being explicit.
  - **Imprecise theorem statement** A number of theorems are simply too imprecise to verify or imagine verifying. Typically they are written in English or mixed math/English and have words like “small”, “very small”, or “itsy bitsy”.
  - **Typos and thinkos** Often a theorem statement or proof is “right” when expressed correctly, but it isn’t expressed correctly: typos and thinkos (little correctable bugs in how you were thinking) confuse the reader.
  - **Not new** This may be controversial, because the test of “new” is stronger than some people might expect. A theorem of the form “algorithm A can do B” is not new when we already know “algorithm C can do B”.

Some of these problems are sometimes fixed by smart reviewers. Where that happens, it’s fine. Sometimes a paper has a reasonable chance of passing evaluation as an algorithms paper (which has experimental requirements). Where that happens, it’s fine.

- The paper should plausibly lead to algorithmic implications. This test was applied with varying strength. For an older mathematical model of learning, we tried to apply it at the level of “I see how an algorithm might be developed from this insight”. For a new model of learning, this test was applied only weakly.
- We did *not* require that the paper be about machine learning. For non-learning papers, we decided to defer to the judgement of referees on whether or not the results were relevant to NIPS. It seemed more natural that authors/reviewers be setting the agenda here.
- I had a preference for papers presenting new mathematical models. I liked Neil Lawrence’s comment: “If we started rejecting learning theory papers for having the wrong model, where would we stop?” There is a natural tendency to forget the drawbacks of the accepted models in machine learning when evaluating new models, so it seems appropriate to provide some encouragement towards exploration.
- Papers were not penalized for having experiments. Sometimes experiments helped (especially when the theory was weak), and sometimes they had no effect.

Reviewing is a difficult process—it’s very difficult to get 82 (the number this time) important decisions right. It’s my hope that more such decisions can be made right in the future, so I’d like to invite comments on what the right criteria are and why. This year’s decisions are made now (and will be released soon), so any suggestions will just influence the future.

This is not really a comment on the criteria (they appear fine to me), but on the review process (from someone who just had his LT paper rejected, apparently on the basis of criterion 1.4).

I hope it is not totally off topic (and that it brings something besides just me whining).

1) It is incredibly frustrating to see no reaction whatsoever to the author response in the final reviews, especially when it precisely rebuts or answers very specific technical points raised by the reviewer. Possibly some discussion occurs behind the scenes, but the author has no way of knowing this. If the author response is not deemed convincing (and assuming it concerns quite specific technical points), it would be very informative to know why not. I think the reviewers should be strongly encouraged to, at the very least, acknowledge that they have read the response, and possibly justify quickly why they did not find it convincing. Otherwise the authors are left with the impression of having answered into the void (and are left clueless as to what they should change next time in the paper).

2) Additional reviews that are apparently written by the PC themselves. If the intention is to sum up the basis for the final decision, I suggest signing it as “the PC” just to make this point plainly clear. Did I mention that it would be great to have the extra review/final decision also acknowledge the author response in some way?

3) There should be an official policy regarding the posting of additional material on the web, to be pointed to in the author response (at least I looked for such a guideline but did not find one), and whether this is considered acceptable or not. There are surely arguments in favor of it (having more space to refute a particularly technical objection) and against it (the response is not the place to fix the paper). This appears to be a common practice (last year, as a NIPS reviewer, I had two instances of responses consisting only of a web pointer to a larger document), so it would be good to have an official guideline about it.

It is correct to assume that there was discussion—Bob and I pushed for significant discussion on any of the papers which were borderline or higher.

I sympathize with (1). The problem is that reviewers are reluctant to invest the time. When you were a reviewer last year, did you update your reviews based on the author response? If so, great. If not, that’s common.

Point (2) is good, and it is correct to assume (for LT papers) that any extra reviews are an attempt on our part to summarize the thinking which occurred. You are right—we should have signed them to make this clear.

I believe the official policy on (3) is “no”.

To a mathematician or theoretical computer scientist, I can understand that “theory” is pretty much synonymous with “theorem-proving”; to a physicist, though, “theory” is more often understood as “calculation”, as opposed to computer simulation or actual experiment. It seems safe to say that most papers in theoretical physics do not contain formal theorem statements and/or proofs. Many researchers with backgrounds in theoretical physics are attracted to NIPS. I think NIPS should remain open to papers written in this style; some great work in LT (broadly defined) has emerged from this tradition.

I did not really understand this, and it seems reasonable to take this into account for later NIPS. What is a canonical example of a good physics learning theory paper? How do you evaluate such papers? Is it purely a matter of taste? I could see criteria 2, 3, 4, and 5 being active, but it seems like criterion 1 should be replaced with something else.

The system that SIGGRAPH has adopted is that the PC members write a “review summary” for many of the papers after the committee meeting. The name “review summary” is misleading: it’s really meant to be an explanation to the authors as to why the paper was or wasn’t accepted. Very often, the reviews give an incorrect impression as to how the decision was made for a paper, e.g., one review is bitterly negative and also wrong. The authors then incorrectly think that this bad review “killed” their paper, even though the PC might have discounted this review. Having the PC members write a few sentences saying “we discussed this paper at the meeting and felt it could not be accepted for the following reasons” works wonders to keep the authors informed as to the process, inform them that their paper didn’t get summarily rejected, and reduce the perception of “noise” in the process. As a result, people complain less that the process is broken (although people still do complain).

I think this is something that every serious conference should do. It just takes a few minutes at the meeting and can be extremely informative to the authors.

Putting this information in a separate review is a mistake: it is not a review, and it should be made clear to the authors that the message is coming from the committee, not an additional reviewer. NIPS insiders might know that it came from the meeting, but people who are not as familiar with the system won’t.

Of course, it’s not necessary to write such a message when the reviews make it clear how the decision was made. Sometimes the authors’ response doesn’t really deserve a response, e.g., if it’s just restating what the paper says. (I don’t think that authors should reasonably expect a response to their rebuttal; there just isn’t time for point-by-point discussion).

That response slightly worries me. To me the strength of NIPS has long been its embrace of the theoretical in the calculational sense as well as theorem proving, and I hate to think this has been replaced with a solely CS vision of theory.

I think the book that grew out of the NIPS workshop on mean field methods of calculation captures this idea:

http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=3847

Papers like

http://books.nips.cc/papers/files/nips14/LT25.pdf#search=%22learning%20curves%20spin%20glass%22

capture the flavor of the results. (“Exact, but not rigorous” being a key phrase.) It’s certainly true that much recent work (e.g. http://www.amazon.com/Spin-Glasses-Mathematicians-Grenzgebiete-Mathematics/dp/3540003568/sr=8-7/qid=1157636721/ref=sr_1_7/103-0399161-2079855?ie=UTF8&s=books) has gone into making some of these ideas fully rigorous, but learning theory is hopefully still more than theorem proving.

“How do you evaluate such papers? Is it purely a matter of taste?” This seems somewhat pejorative, and I don’t see why. What makes a theorem significant? Just taste? Yes, probably so: evaluating the novelty of a result and its potential impact on the community does involve a great deal of taste.

I certainly don’t intend this to be pejorative.

If we-the-community want to include such papers (which seems reasonable), we-the-community need to have some notion of what makes a good paper along these lines.

Rephrased, requirement (1) is not some arbitrary choice. It is my best attempt to encode the standards which a learning theory paper must obey to be recognized and built on by others. What are the standards which would apply to these papers?

a test