Why ICML? and the summer conferences – Machine Learning (Theory)

Here’s a quick reference for summer ML-related conferences sorted by due date:

Conference	Due date	Dates and location	Reviewing
KDD	Feb 10	August 12-16, Beijing, China	Single Blind
COLT	Feb 14	June 25-June 27, Edinburgh, Scotland	Single Blind? (historically)
ICML	Feb 24	June 26-July 1, Edinburgh, Scotland	Double Blind, author response, zero SPOF
UAI	March 30	August 15-17, Catalina Islands, California	Double Blind, author response

Geographically, this is greatly dispersed and the UAI/KDD conflict is unfortunate.

Machine Learning conferences are triannual now, between NIPS, AIStat, and ICML. This has not always been the case: the academic default is annual summer conferences, then NIPS started with a December conference, and now AIStat has grown into an April conference.

However, the first claim is not quite correct. NIPS and AIStat have few competing venues while ICML implicitly competes with many other conferences accepting machine learning related papers. Since Joelle and I are taking a turn as program chairs this year, I want to make explicit the case for ICML.

COLT was historically a conference for learning-interested Computer Science theory people. Every COLT paper has a theorem, and few have experimental results. A significant subset of COLT papers could easily be published at ICML instead. ICML now has a significant theory community, including many pure theory papers and significant overlap with COLT attendees. Good candidates for an ICML submission are learning theory papers motivated by real machine learning problems (example: the agnostic active learning paper) or which propose and analyze new plausibly useful algorithms (example: the adaptive gradient papers). If you find yourself tempted to add empirical experiments to prove the point that your theory really works, ICML sounds like an excellent fit. Not everything is a good fit though—papers motivated by definitional aesthetics or tradition (Valiant style PAC learning comes to mind) may not be appreciated.
There are two significant advantages to ICML over COLT. One is that ICML provides a potentially much larger audience which appreciates and uses your work. That’s substantially less relevant this year, because ICML and COLT are colocating and we are carefully designing joint sessions for the overlap day.

The other is that ICML is committed to fair reviewing—papers are double blind so reviewers are not forced to take into account the author identity. Plenty of people will argue that author names don’t matter to them, but I’ve personally seen several cases as a reviewer where author identity affected the decision, typically towards favoring insiders or bigwigs at theory conferences as common sense would suggest. The double blind aspect of ICML reviewing is an open invitation to outsiders to submit to ICML.
Many UAI papers could easily go to ICML because they are explicitly about machine learning or connections with machine learning. For example, pure prediction markets are a stretch for ICML, but connections between machine learning and prediction markets, which seem to come up in multiple ways, are a good fit. Bernhard’s lab has done quite a bit of work on extracting causality from prediction complexity which could easily interest people at ICML. I’ve personally found some work on representations for learning algorithms, such as sum-product networks, of first class interest. UAI has a definite subcommunity of hardcore Bayesians which is less evident at ICML. ICML as a community seems more pragmatist w.r.t. Bayesian methods: if they work well, that’s good. Of the comparators here, UAI seems the most similar in orientation to ICML to me.
ICML provides a significantly larger potential audience and, due to its size, tends to be more diverse.
KDD is a large conference (a bit larger than ICML by attendance) which, as I understand it, initially started from the viewpoint of database people trying to do interesting things with the data they had. The conference is generally one step more commercial/industrial than ICML. Significant parts of the academic track are about machine learning technology and could have been submitted to ICML instead. I was impressed by the double robust sampling work and the out of core learning paper is cool. And, I often enjoy the differential privacy in learning work. KDD attendees tend to be very pragmatic about what works, which is reinforced by yearly prediction challenges. I appreciate this viewpoint quite a bit.
KDD doesn’t do double blind review, which was discussed above. To me, a more significant drawback of KDD is the ACM paywall. I was burned by this last summer. We decided to do a large scale learning survey based on the SUML compendium at KDD, but discovered too late that the video would be stuck behind the paywall, unlike our learning with exploration tutorial the year before. As I understand it, the year before ACM made them pay twice: once to videolectures and once to ACM, which was understandably judged unsustainable. The paywall is particularly rough for students who are not well-established, because it substantially limits their potential audience.

This is not a problem at ICML 2012. Every prepared presentation will be videotaped and we will have every paper easily and publicly accessible along with it. The effort you put into the presentation will pay off over hundreds or thousands of additional online views.
Area conferences. There are many other conferences which I think of as adjacent area conferences, including AAAI, ACL, SIGIR, CVPR and WWW which I have not attended enough or recently enough to make a real comparison with. Nevertheless, in each of these conferences, machine learning is a common technology. And sometimes new forms of machine learning technology are developed. Depending on many circumstances, ICML might be a good candidate for a place to send a paper on a new empirically useful piece of machine learning technology. Or not—the circumstances matter hugely.

Machine Learning has grown radically and gone industrial over the last decade, providing plenty of motivation for a conference on developing new core machine learning technology. Indeed, it is because of the power of ML that so much overlap exists. In most cases, the best place to send a paper is to the conference where it will be most appreciated. But, there is a real sense in which you create the community by participating in it. So, when the choice is unclear, sending the paper to a conference designed simultaneously for fair high quality reviewing and broad distribution of your work is a good call as it provides the most meaningful acceptance. For machine learning, that conference is ICML. Details of the ICML plan this year are here. We are on track.

As always, comments are welcome.

9 Replies to “Why ICML? and the summer conferences”

You forgot the best part — the ICML deadline is approximately two weeks after COLT and KDD 🙂 — so if your paper isn’t ready by the COLT or KDD deadlines, you can send it off to ICML.

“papers motivated by definitional aesthetics or tradition (Valiant style PAC learning comes to mind) may not be appreciated.”

Could you give more details as to what the above means? Is it something like “pure theory results do not have a place at ICML”?
Thanks

jl says:

1/5/2012 at 8:34 pm

In general, there are a number of pure theory papers at ICML. Examples I was involved in are Agnostic Active Learning and Delayed Q-learning. They are not the norm, but they are respected and not unusual.

The flavor tends to differ from COLT (or other theory conferences) though. This is partly a matter of self-selection, and partly a matter of what an ICML crowd most appreciates.

At ICML, people tend to care less about the method used to answer a theoretical question and more about the interestingness of the question from a “will you encounter it in the real world?” viewpoint. One example from my experience is the EXP4P algorithm which was rejected from COLT because the proof technique was similar to the EXP4 proof, but accepted at AIStats as one of a few notable papers because the question is of significant and obvious importance.

Not caring as much about the proof technique also means that you are freer to use new proof techniques, such as for the weighted all pairs paper. This is not a pure theory paper however: the experiments that we did implicitly validated the new proof technique and the algorithm developed to optimize it. In my experience, theory people can become set in their ways of thinking about things, which can make a new proof technique difficult to get across. Since the gestalt ICML attitude is more focused around what works, there is a natural acceptance of any new theory that works.

Like many conferences, ICML has some longstanding areas of investigation. One of these is reinforcement learning, so RL theory (and simplifications like contextual bandit theory) are a natural fit with a significant interested audience.

But the trend is towards more theory at ICML in general, and I think that’s accepted and viable. Based on past experience, perhaps half of COLT attendees will stay for ICML, plus there is significant interest from ICML-only attendees, which provides a substantial audience for theory sessions at ICML. We have a number of theory inclined and theory capable area chairs—Alina, Csaba, Daniel, David M., Elad, Frank, Geoff, Lihong, Mario, Sanjoy, Satyen, and Tong all have significant theory leanings with a diverse set of backgrounds {CS theory, statistics, others}. No good pure theory papers, as judged by these people, will be rejected.

Would ML for a specific NLP problem be of interest to ICML?

curious says:

1/10/2012 at 12:32 am

For example, Transfer Learning techniques for Multilingual Text Classification?
1. jl says:
  
  1/10/2012 at 7:40 am
  
  It depends.
  
  If it’s an application of an existing approach for transfer learning, it might be a hard sell. But if it’s a new approach to transfer learning, the NLP application might be quite appealing.

Hello! I apologize if you don’t want to turn the comments box into a Q&A session, but following the previous questions… Suppose you came up with an algorithm that lets SVMs work with hundreds of thousands of virtual samples without ever instantiating them, exploring a set of transformations that occurs often in Computer Vision and Audio Processing problems. Would this be a good fit for ICML, or would it be better for a pure CV conference? I see two arguments against it:
1) The whole issue of virtual samples and invariant kernels fell out of fashion years ago;
2) The proofs are self-contained, but don’t look like the ones in most ICML papers.

I’m not asking for a decision, I just need one more data point in order to feel a bit more confident! It’s very hard to get good feedback on these meta-subjects 🙂

jl says:

1/12/2012 at 5:04 pm

Invariant kernels fell out of fashion because they didn’t seem computationally tractable compared to simply instantiating the set of invariants as synthetic examples. If you address the computational tractability problem well, I could imagine that being of significant interest.
1. J says:
  
  1/15/2012 at 12:11 pm
  
  Thank you! I’ll take that into account.
