Machine Learning (Theory)

1/7/2015

The NIPS experiment

Tags: Conferences, Machine Learning jl@ 2:38 pm

Corinna Cortes and Neil Lawrence ran the NIPS experiment where 1/10th of papers submitted to NIPS went through the NIPS review process twice, and then the accept/reject decision was compared. This was a great experiment, so kudos to NIPS for being willing to do it and to Corinna & Neil for doing it.

The 26% disagreement rate presented at the conference understates the meaning in my opinion, given the 22% acceptance rate. The immediate implication is that between 1/2 and 2/3 of papers accepted at NIPS would have been rejected if reviewed a second time. For analysis details and discussion about that, see here.

Let’s give P(reject in 2nd review | accept 1st review) a name: arbitrariness. For NIPS 2014, arbitrariness was ~60%. Given such a stark number, the primary question is “what does it mean?”

Does it mean there is no signal in the accept/reject decision? Clearly not—a purely random decision would have arbitrariness of ~78%. It is however quite notable that 60% is much closer to 78% than 0%.
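To make the arithmetic concrete, here is a minimal sketch of where the ~60% and ~78% figures come from, assuming both committees accept at the same rate and that split decisions are symmetric between (accept, reject) and (reject, accept); the variable names are mine, not from the experiment's analysis.

```python
# Back-of-the-envelope arithmetic behind the ~60% and ~78% figures.
# Assumptions: both committees accept at the same rate, and disagreement
# splits evenly between (accept, reject) and (reject, accept).

accept_rate = 0.22       # NIPS 2014 acceptance rate
disagreement = 0.26      # fraction of twice-reviewed papers with split decisions

# Under the symmetry assumption, P(accept 1st, reject 2nd) is half the disagreement rate.
p_accept_then_reject = disagreement / 2

# Arbitrariness = P(reject in 2nd review | accept in 1st review)
arbitrariness = p_accept_then_reject / accept_rate
print(f"observed arbitrariness ~ {arbitrariness:.0%}")            # ~59%

# Baseline: a purely random committee decides independently of the 1st review,
# so its arbitrariness is just the rejection rate.
random_arbitrariness = 1 - accept_rate
print(f"random-committee arbitrariness ~ {random_arbitrariness:.0%}")  # ~78%
```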

Does it mean that the NIPS accept/reject decision is unfair? Not necessarily. If a pure random number generator made the accept/reject decision, it would be ‘fair’ in the same sense that a lottery is fair, and have an arbitrariness of ~78%.

Does it mean that the NIPS accept/reject decision could be unfair? The numbers give no judgement here. It is however a natural fallacy to imagine that random judgements derived from people imply unfairness, so I would encourage people to withhold judgement on this question for now.

Is an arbitrariness of 0% the goal? Achieving 0% arbitrariness is easy: just choose all papers with an md5sum that ends in 00 (in binary). Clearly, there is something more to be desired from a reviewing process.
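For illustration only, here is a toy sketch of such a rule; the function is hypothetical, and the point is precisely that a perfectly repeatable decision can be perfectly meaningless.

```python
import hashlib

def md5_accept(paper_pdf_bytes: bytes) -> bool:
    """Toy decision rule: accept iff the paper's md5sum ends in '00' in binary.

    Perfectly deterministic (0% arbitrariness on re-review) and accepts roughly
    1/4 of submissions, yet it obviously has nothing to do with quality.
    """
    digest = hashlib.md5(paper_pdf_bytes).digest()
    last_byte = digest[-1]
    return last_byte & 0b11 == 0  # last two bits are zero
```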

Perhaps this means we should decrease the acceptance rate? Maybe, but this makes sense only if you believe that arbitrariness is good, as it will almost surely increase the arbitrariness. In the extreme case where only one paper is accepted, the odds of it being rejected on re-review are near 100%.

Perhaps this means we should increase the acceptance rate? If all papers submitted were accepted, the arbitrariness would be 0, but as mentioned above arbitrariness 0 is not the goal.
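Under the purely random baseline used above, arbitrariness is simply one minus the acceptance rate, which makes the direction of both effects easy to see; the snippet below is a sketch of that baseline only, not a model of a real committee.

```python
# Purely-random baseline: the 2nd decision is independent of the 1st and uses
# the same acceptance rate, so arbitrariness = 1 - acceptance_rate.
# Real committees sit below this curve, but the direction is the same:
# lower acceptance rates push arbitrariness up; 100% acceptance drives it to 0.
for acceptance_rate in (0.01, 0.10, 0.22, 0.50, 1.00):
    print(f"acceptance {acceptance_rate:>4.0%} -> "
          f"random-baseline arbitrariness {1 - acceptance_rate:.0%}")
```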

Perhaps this means that NIPS is a very broad conference with substantial disagreement by reviewers (and attendees) about what is important? Maybe. This even seems plausible to me, given anecdotal personal experience. Perhaps small highly-focused conferences have a smaller arbitrariness?

Perhaps this means that researchers submit themselves to an arbitrary process for historical reasons? The arbitrariness is clear, but the reason less so. A mostly-arbitrary review process may be helpful in the sense that it gives authors a painful-but-useful opportunity to debug the easy ways to misinterpret their work. It may also be helpful in that it perfectly rejects the bottom 20% of papers which are actively wrong, and hence harmful to the process of developing knowledge. None of these reasons are confirmed of course.

Is it possible to do better? I believe the answer is “yes”, but it should be understood as a fundamentally difficult problem. Every program chair who cares tries to tweak the reviewing process to be better, and there have been many smart program chairs who tried hard. Why isn’t it better? There are strong nonvisible constraints on reviewers’ time and attention.

What does it mean? In the end, I think it means two things of real importance.

  1. The result of the process is mostly arbitrary. As an author, I found rejections of good papers very hard to swallow, especially when the reviews were nonsensical. Learning to accept that the process has a strong element of arbitrariness helped me deal with that. Now there is proof, so new authors need not be so discouraged.
  2. CMT now has a tool for measuring arbitrariness that can be widely used by other conferences. Joelle and I changed ICML 2012 in various ways. Many of these appeared beneficial and some stuck, but others did not. In the long run, it’s the things which stick that matter. Being able to measure the review process in a more powerful way might be beneficial in getting good review practices to stick.

Other commentary from Lance, Bert, and Yisong.

Edit: Cross-posted on CACM.

10/11/2014

Conference on Digital Experimentation

I just attended CODE. The set of people interested in digital experimentation is very diverse, with backgrounds encompassing theory, machine learning, social science, economics, and industry, so this seems like a good subject for a new conference. I hope it continues.

I found several talks interesting.

  • Eytan Bakshy talked about PlanOut, which is a language/platform for flexibly specifying experiments (a generic assignment sketch in this spirit appears after this list).
  • Ron Kohavi talked about EXP which is a heavily used A/B testing platform.
  • Susan Athey talked about long term vs short term metrics, which seems important to address, a constant problem, and not yet systematically solved.
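As a rough illustration of the core mechanic shared by experimentation platforms of this kind, here is a minimal hash-based assignment sketch; the function and names are mine and do not reflect PlanOut's or EXP's actual APIs.

```python
import hashlib

def assign_variant(experiment_name: str, user_id: str, variants: list[str]) -> str:
    """Deterministically assign a user to a variant via hashing.

    The assignment looks random across users but is stable for any given user,
    so no per-user state needs to be stored to keep treatment consistent.
    """
    key = f"{experiment_name}:{user_id}".encode()
    bucket = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    return variants[bucket % len(variants)]

# Example: the same user always lands in the same bucket for a given experiment.
print(assign_variant("homepage_banner", "user-12345", ["control", "treatment"]))
```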

There was a panel about the ongoing Facebook experimentation controversy. The issue here is complex. My understanding is that Facebook users have some expected ownership of the content they create, and hence aren’t comfortable with the content being used in unexpected ways. On the other hand, experimentation is so necessary to the functioning of all large modern internet sites that banning it or slowing down the process by a factor of a million (as some advocated) would badly degrade the future of these sites in practice.

My belief is that what’s lacking is education and trust. W.r.t. education, people need to understand that experimentation is unavoidable when trying to figure out how to optimize an enormously complex system, as there is just no other way to systematically make 1000 right decisions as is necessary for basic things like choosing the best homepage/search result/etc… W.r.t. trust, companies are not particularly good at creating trust in general, but finding the right mechanism for doing so seems critical. I would point out Vanguard as a company that managed to successfully create trust by design.

6/24/2014

Interesting papers at ICML 2014

This year’s ICML had several papers which I want to read through more carefully and understand better.

  1. Chun-Liang Li, Hsuan-Tien Lin, Condensed Filter Tree for Cost-Sensitive Multi-Label Classification. Several tricks accumulate to give a new approach for addressing cost sensitive multilabel classification.
  2. Nikos Karampatziakis and Paul Mineiro, Discriminative Features via Generalized Eigenvectors. An efficient, effective eigenvalue solution for supervised learning yields compelling nonlinear performance on several datasets.
  3. Nir Ailon, Zohar Karnin, Thorsten Joachims, Reducing Dueling Bandits to Cardinal Bandits. An effective method for reducing dueling bandits to normal bandits that extends to contextual situations.
  4. Pedro Pinheiro, Ronan Collobert, Recurrent Convolutional Neural Networks for Scene Labeling. Image parsing remains a challenge, and this is plausibly a step forward.
  5. Cicero Dos Santos, Bianca Zadrozny, Learning Character-level Representations for Part-of-Speech Tagging. Word morphology is clearly useful information, and yet almost all ML-for-NLP applications ignore it or hard-code it (by stemming).
  6. Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. Statistically efficient interactive learning is now computationally feasible. I wish this one had been done in time for the NIPS tutorial :-)
  7. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller, Deterministic Policy Gradient Algorithms. A reduction in variance from working out the deterministic limit of policy gradient makes policy gradient approaches look much more attractive.

Edit: added one that I forgot.

6/18/2014

An ICML proposal: yearly surveys

I’d like to propose that ICML conduct a yearly survey, similar to the ones from 2010 or 2012, with the results reported to all.

The key reason for this is information: I expect everyone participating in ICML has some baseline interest in how ICML is doing. Everyone involved has personal anecdotal information, but we all understand that a few examples can be highly misleading.

Aside from satisfying everyone’s joint curiosity, I believe this could improve ICML itself. Consider, for example, reviewing. Every program chair comes in with ideas for how to make reviewing better. Some succeed, but nearly all are forgotten by the next round of program chairs. Making survey information available will help quantify success and correlate it with design decisions.

The key question to ask for this is “who?” The reason why surveys don’t happen more often is that it has been the responsibility of program chairs who are typically badly overloaded. I believe we should address this by shifting the responsibility to a multiyear position, similar to or the same as a webmaster. This may imply a small cost to the community (<$1/participant) for someone’s time to do and record the survey, but I believe it’s a worthwhile cost. I plan to bring this up with the IMLS board in Beijing, but would like to invite any comments or thoughts.

7/24/2013

ICML 2012 videos lost

A big ouch—all the videos for ICML 2012 were lost in a shuffle. Rajnish sends the note below; if anyone can help, that would be greatly appreciated.

——————————————————————————

Sincere apologies to the ICML community for losing the 2012 archived videos

What happened: In order to publish the 2013 videos, we decided to move the 2012 videos to another server. We have a weekly backup service from the provider, but after we removed the videos from the current server and tried to retrieve the 2012 videos from the backup service, the backup did not work because of provider-specific requirements that we had ignored while removing the data from the previous server.

What are we doing about this: At this point, we are still looking into the raw footage to see if we can retrieve some of the videos, but the following are the steps we are taking to make sure this does not happen again in the future:
(1) We are going to create a channel on Vimeo (and potentially on YouTube) and publish the picture-in-picture or slide versions of the videos there. This will be available by the beginning of Oct 2013.
(2) We are going to provide download links from TechTalks so that the slide version (or picture-in-picture version if available) of the videos can be directly downloaded by viewers. This feature will be available by Aug 4th 2013.
(3) Of course we are now creating regular backups that do not depend on our service provider.

How can you help: If you have downloaded the ICML 2012 videos from TechTalks using external tools, we would really appreciate it if you could provide us with the videos; please email support@techtalks.tv .

Thank you,
Rajnish.
