Bob Williamson and I are the learning theory PC members at NIPS this year. This is some attempt to state the standards and tests I applied to the papers. I think it is a good idea to talk about this for two reasons:
- Making community standards a matter of public record seems healthy. It gives us a chance to debate what is and is not the right standard. It might even give us a bit more consistency across the years.
- It may save us all time. A number of the submitted papers just aren’t there yet, and not submitting is the right decision in such cases.
There are several criteria for judging a paper. All of these were active this year. Some criteria are uncontroversial while others may not be.
- The paper must have a theorem establishing something new for which it is possible to derive high confidence in the correctness of the results. A surprising number of papers fail this test. This criterion seems essential to the definition of “theory”. Common failure modes are:
- Missing theorem statement
- Missing proof. This isn’t an automatic fail, because sometimes reviewers can be expected to fill in the proof from discussion. (Not all theorems are hard.) Similarly, sometimes a proof sketch is adequate. Providing the right amount of detail to give confidence in the results is tricky, but the general advice is: err on the side of being explicit.
- Imprecise theorem statement. A number of theorems are simply too imprecise to verify or even imagine verifying. Typically they are written in English or mixed math/English and have words like “small”, “very small”, or “itsy bitsy”. (For an illustrative contrast, see the example after this list.)
- Typos and thinkos. Often a theorem statement or proof would be right if expressed correctly, but it isn’t expressed correctly: typos and thinkos (little correctable bugs in how you were thinking) confuse the reader.
- Not new. This may be controversial, because the test of ‘new’ is stronger than some people might expect. A theorem of the form “algorithm A can do B” is not new when we already know “algorithm C can do B”.
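As a purely illustrative example (not taken from any submission), here is the kind of contrast meant by “imprecise”: a claim like “with enough data, the error of the learned hypothesis is very small” versus an explicit, verifiable statement. The sketch below uses a standard finite-hypothesis-class generalization bound only to show what a precise statement looks like; the symbols and constants are generic, not drawn from any particular paper.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb,amsthm}
\newtheorem{theorem}{Theorem}
\begin{document}

% Imprecise (hard to verify): "With enough data, the error of the learned
% hypothesis is very small."

% Precise (verifiable) version, stated via a standard finite-class bound:
\begin{theorem}
Let $H$ be a finite hypothesis class and let $S$ be a sample of $m$ examples
drawn i.i.d.\ from a distribution $D$. Then with probability at least
$1-\delta$ over the draw of $S$, for every $h \in H$,
\[
  \operatorname{err}_D(h) \;\le\; \widehat{\operatorname{err}}_S(h)
  + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}.
\]
\end{theorem}

\end{document}
```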
Some of these problems are sometimes fixed by smart reviewers; where that happens, it’s fine. Sometimes a paper instead has a reasonable chance of passing evaluation as an algorithms paper (which has its own experimental requirements); that’s fine too.
- The paper should plausibly lead to algorithmic implications. This test was applied with varying strength. For an older mathematical model of learning, we tried to apply it at the level of “I see how an algorithm might be developed from this insight”. For a new model of learning, this test was applied only weakly.
- We did not require that the paper be about machine learning. For non-learning papers, we decided to defer to the judgement of the referees on whether or not the results were relevant to NIPS. It seemed more natural for authors and reviewers to set the agenda here.
- I had a preference for papers presenting new mathematical models. I liked Neil Lawrence‘s comment: “If we started rejecting learning theory papers for having the wrong model, where would we stop?” There is a natural tendency to forget the drawbacks of the accepted models in machine learning when evaluating new models, so it seems appropriate to provide some encouragement towards exploration.
- Papers were not penalized for having experiments. Sometimes experiments helped (especially when the theory was weak), and sometimes they had no effect.
Reviewing is a difficult process; it’s very hard to get 82 (the number this time) important decisions right. It’s my hope that more such decisions can be made right in the future, so I’d like to invite comments on what the right criteria are and why. This year’s decisions have already been made (and will be released soon), so any suggestions will only influence the future.