New York’s ML Day

I’m not as naturally exuberant as Muthu or David about CS/Econ day, but I believe both it and ML Day were certainly successful.

At the CS/Econ day, I particularly enjoyed Tuomas Sandholm’s talk, which showed a commanding depth of understanding and application in automated auctions.

For the machine learning day, I enjoyed several talks and posters (I’d better, since I helped pick them). What stood out to me was the number of people attending: 158 registered, a level qualifying as “scramble to find seats”. My rule of thumb for workshops and conferences is that the number of attendees is often about the same as the number of submissions. That wasn’t the case here, where there were just 4 invited speakers and 30-or-so posters. Presumably, the difference is due to a critical mass of people interested in machine learning in the area and the ease of their attendance.

Are there other areas where a local Machine Learning day would fly? It’s easy to imagine something working out in the San Francisco Bay Area and possibly Germany or England.

The basic formula for ML Day is that a committee picks a few people to give talks and invites posters, with some poster presenters also giving short presentations. The CS/Econ day was similar, except they managed to let every submitter give a presentation. Are there tweaks to the format which would improve things?

NIPS 2008 workshop on Kernel Learning

We’d like to invite hunch.net readers to participate in the NIPS 2008 workshop on kernel learning. While the main focus is on automatically learning kernels from data, we are also looking at the broader questions of feature selection, multi-task learning, and multi-view learning. There are no restrictions on the learning problem being addressed (regression, classification, etc.), and both theoretical and applied work will be considered. The deadline for submissions is October 24.

More detail can be found here.

Corinna Cortes, Arthur Gretton, Gert Lanckriet, Mehryar Mohri, Afshin Rostamizadeh

Who is Responsible for a Bad Review?

Although I’m greatly interested in machine learning, I think it must be admitted that there is a large amount of low quality logic being used in reviews. The problem is bad enough that sometimes I wonder if the Byzantine generals limit has been exceeded. For example, I’ve seen recent reviews where the given reasons for rejecting are:

  1. [NIPS] Theorem A is uninteresting because Theorem B is uninteresting.
  2. [UAI] When you learn by memorization, the problem addressed is trivial.
  3. [NIPS] The proof is in the appendix.
  4. [NIPS] This has been done before. (… but not giving any relevant citations)

Just for the record I want to point out what’s wrong with these reviews. A future world in which such reasons never come up again would be great, but I’m sure these errors will be committed many times more in the future.

  1. This is nonsense. A theorem should be evaluated based on its own merits, rather than the merits of another theorem.
  2. Learning by memorization requires exponentially larger sample complexity than many other common approaches that often work well (see the sketch after this list). Consequently, what is possible under memorization has no substantial bearing on common practice or on what might be useful in the future.
  3. Huh? To other authors: thank you for putting the proof in the appendix so the paper reads better. It seems absurd to base a decision on the placement of the content rather than on the content itself.
  4. This is a red flag for a bogus review. Every time I’ve seen such a claim made without a concrete citation (as an author or a fellow reviewer), it has turned out to be false. Often the claims are false even when concrete citations are given.
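
To make the memorization point (2) concrete, here is a back-of-the-envelope comparison in standard PAC notation. The choice of conjunctions as the example hypothesis class and the uniform distribution over inputs are illustrative assumptions of mine, not anything from the reviews or papers in question.

% Sample complexity sketch (realizable setting, error \epsilon, confidence 1 - \delta).
% Learning a finite hypothesis class, e.g. conjunctions over d boolean features
% with |H| = 3^d, via any consistent learner:
\[
  m_{\mathcal{H}} \;=\; O\!\left(\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{\epsilon}\right)
  \;=\; O\!\left(\frac{d + \ln(1/\delta)}{\epsilon}\right).
\]
% Memorization only predicts correctly on points it has already seen, so under a
% uniform distribution on \{0,1\}^d it must observe a constant fraction of all
% 2^d inputs before its error drops below a fixed \epsilon:
\[
  m_{\mathrm{mem}} \;=\; \Omega\!\left(2^{d}\right).
\]

The exact constants don’t matter; the point is polynomial versus exponential dependence on d, which is why memorization is a misleading reference point for judging a sample complexity result.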

A softer version of (4) is when someone is cranky because their own paper wasn’t cited. This is understandable, but a more appropriate response is to point this out and review anyway. That avoids creating the extra work (for authors and reviewers) of yet another resubmission, and reasonable authors do take such suggestions into account.

NIPS figures fairly prominently here. While these are all instances in the last year, my experience after interacting with NIPS for almost a decade is that the average quality of reviews is particularly low there—in many instances reviewers clearly don’t read the papers before writing the review. Furthermore, such low quality reviews are often the deciding factor for the paper decision. Blaming the reviewer seems to be the easy solution for a bad review, but a bit more thought suggests other possibilities:

  1. Area Chair: In some conferences, an “area chair” or “senior PC” member makes or effectively makes the decision on a paper. In general, I’m not a fan of activist area chairs, but when a reviewer isn’t thinking well, I think it is appropriate to step in. This rarely happens, because the easy choice is simply to accept the negative review. In my experience, many area chairs are eager to avoid any substantial controversy, and there is a general tendency to believe that something must be wrong with a paper that has a negative review, even if it isn’t what was actually pointed out.
  2. Program Chair: In smaller conferences, program chairs play the same role as the area chair, so all of the above applies, except that now you know the name of the person explicitly making the decision, which makes them easier to blame. This is a little too tempting, I think. For example, I know David McAllester understands that learning by memorization is a bogus reference point, and he was probably just too busy to really digest the reviews. However, a program chair is responsible for finding appropriate reviewers for papers, and doing so (or not) has a huge impact on whether a paper is accepted. Not surprisingly, if a paper about the sample complexity of learning is routed to people who have never seen a proof involving sample complexity, the reviews tend to be spuriously negative (and the paper unread).
  3. Author: A reviewer might blame an author, if it turns out later that the reasons given in the review for rejection were bogus. This isn’t absurd: writing a paper well is hard, and it’s easy for small mistakes in the writing to drastically mislead readers about the technical content.
  4. Culture: A conference has a culture associated with it that is driven by the people who keep coming back. If, in this culture, it is considered ok to do all the reviews on the last day, it’s unsurprising to see reviews lacking critical thought that could have been written without reading the paper. Similarly, it’s unsurprising to see little critical thought at the area chair level, or in the routing of papers to reviewers. This answer is pretty convincing: it explains why low quality reviews keep happening year after year at a conference.

If you believe the Culture reason, then what’s needed is a change in the culture. The good news is that this is both possible and effective. There are other conferences where reviewers expect to spend several hours reviewing a paper. In my experience this year, it was true of COLT and for my corner of SODA. Effecting the change is simply a matter of community standards, and that is just a matter of leaders in the community leading.

NIPS 2008 workshop on ‘Learning over Empirical Hypothesis Spaces’

This workshop asks for insights into how far we can push the theoretical boundary of using data in the design of learning machines. Can we express our classification rule in terms of the sample, or do we have to stick to a core assumption of classical statistical learning theory, namely that the hypothesis space must be defined independently of the sample? The workshop is particularly interested in – but not restricted to – the ‘luckiness framework’ and the recently introduced notion of ‘compatibility functions’ in the semi-supervised learning context (more information can be found at http://www.kuleuven.be/wehys).