Berkeley Streaming Data Workshop

The From Data to Knowledge workshop, May 7-11 at Berkeley, should be of interest to the many people encountering streaming data in different disciplines. It’s run by a group of astronomers who encounter streaming data all the time. I met Josh Bloom recently, and he is broadly interested in a workshop covering all aspects of Machine Learning on streaming data. The hope here is that techniques developed in one area turn out to be useful in another, which seems quite plausible. Particularly if you are in the Bay Area, consider checking it out.

ICML Posters and Scope

Normally, I don’t indulge in posters for ICML, but this year is naturally an exception for me. If you want one, a small number are still available here if you sign up before February.

It also seems worthwhile to give some sense of the scope and reviewing criteria for ICML for authors considering submitting papers. At ICML, the (very large) program committee does the reviewing which informs final decisions by area chairs on most papers. Program chairs set up the process, deal with exceptions or disagreements, and provide advice for the reviewing process. Providing advice is tricky (and easily misleading) because a conference is a community, and in the end the aggregate interests of the community determine the conference. Nevertheless, as a program chair this year it seems worthwhile to state the overall philosophy I have and what I plan to encourage (and occasionally discourage).

At the highest level, I believe ICML exists to further research into machine learning, which I generally think of as turning observations into useful predictions. Research is greatly varied in general, but in all cases it involves answering an interesting question for which the answer was not previously known. Interesting questions are generally natural: they can be stated easily and other people plausibly encounter them. Interesting questions are generally also ones for which there are multiple plausible wrong answers. The definition of “interesting” is otherwise hard to pin down, because it does and must change over time.

ICML is a broad conference which incorporates the interests of many different groups of people with different tastes in the research they prefer. It’s broad enough that most people don’t appreciate all the papers. That’s ok as long as there is some higher level appreciation for which directions of research benefit the community. Some common flavors are:

  1. ML for X In general, Machine Learning is a core field of study with many applications. Often, it’s a good idea to publish within a conference focused on that area, but particularly when no such conference exists, ICML is a solid choice for a place to publish. One example of this kind of thing is Machine Learning for Sustainability, where the CCC will be giving a few travel grants. Here the core question is typically “How?” Exhibiting new things that you can do with ML provides good reference points for what is possible and a sense of what works, and compelling new ideas about what to work on can be valuable to the community.

    There are several ways that papers of this sort can bounce. Perhaps X is insufficiently interesting, the results are unconvincing, or the method of solution is considered too straightforward. I consider the first and second criteria sound, but am inclined toward leniency on the third, since there is often quite a bit of work in figuring out how to frame the problem so that the solution happens to be easy.

  2. New Algorithms Often, authors find that existing learning algorithms for solving some problem are lacking in some way, so they propose new, better algorithms. This is plausibly the most common category of paper at ICML, so there is quite a bit of variety. The most straightforward version proposes a new algorithm for a well-studied problem. For these papers it’s important to have an empirical comparison to existing baselines.

    It’s easy for an empirical comparison to go wrong. Some authors use synthetic datasets which do not seem significant to me, because good results on such datasets may not transfer well to real-world problems, as the real world tends to be quite a bit more complex than the synthetic processes which are natural to program. Instead, it’s important to show good results on real datasets. One problem with relying on real datasets is dataset selection—choosing the dataset for which your algorithm seems to perform best. You can avoid this by choosing datasets in some clearly unbiased manner and by evaluating on many standard datasets. Another way to fail is with a poor choice of baseline. This is tricky, because three reviewers might consider three different baselines the most natural one. Asking around a bit when developing the paper might help here, but in the end this can be a tough judgment call: Is the paper convincing enough that people interested in solving the problem should use this algorithm?

    Another class of new algorithms papers is new algorithms for new areas of machine learning, blending into the previous category. Here, there are typically few datasets available (perhaps just one) and there may be no (or only implausibly bad) baselines. For papers like this, one way I’ve seen difficulties is when authors are very invested in a particular approach to solving the problem. If you have defined the problem too narrowly, broadening the definition of the problem can help you see appropriate baselines. Another difficulty I’ve observed is reviewers used to well-studied problems rejecting an interesting paper because (essentially) they assume that the authors left out a good baseline which does not exist. To prevent the first, authors who ask around may get valuable early feedback. For the second, it’s a difficulty we are aware of, and we will consider asking reviewers to judge such papers on the merits described under ML for X.

  3. Algorithmic studies A relatively rare but potentially valuable form of paper is an algorithmic study. Here, the authors do not propose a new algorithm, but instead do a comprehensive empirical comparison of different algorithms. The standards here are quite high—the empirical comparison needs to be first-class to convince people, so the comments on empirical comparisons under New Algorithms apply strongly.
  4. New Theory Good theory can enlighten us about what is (or might be) possible. It can also help us build robust learning algorithms, where we design learning algorithms so that they provably solve some large class of problems. I am personally most interested in theory that helps us design new learning algorithms, but broadly interested in what is possible. I care most about the question answered; the means (and language) should be only as complex as necessary, so that the theory can be understood as widely as possible.

    In many areas of CS theory, double blind reviewing is rare, so theory-oriented people may be unfamiliar with it. An important consequence is that complete proofs must be included either in the paper or supplemental material so that proof checking is fully feasible.

    Another way that I’ve seen theory papers run into trouble is when the theory is a post-hoc justification for an algorithm. In essence, authors who choose to analyze an existing algorithm are sometimes forced to make many unnatural assumptions for the theory to be correct. There generally isn’t an easy fix if you arrive at this point.

  5. n of the above It is common for ICML papers to be multicategory. At the extreme, you might have a new algorithm which solves a new X well, empirically and theoretically. Reviewers can fall into a trap where they are most interested in 1 of the 4 questions answered above, and find the 1/4 of the paper devoted to their question relatively weak compared to a paper that devotes all its pages to that question.

    We are aware of this, and will encourage it to be taken into account.

  6. The exception The set of papers I expect to see at ICML is more diverse than the above—there are often exceptions of one sort or another. For these exceptions, it often becomes a judgment call: Does this paper significantly further research into machine learning? Papers with little potential audience probably don’t, while fun/interesting/useful things that we didn’t think of do.

Further comments or questions are welcome.

Why COLT?

By Shie and Nati

Following John’s advertisement for submitting to ICML, we thought it appropriate to highlight the advantages of COLT, and the reasons it is often the best place for theory papers. We would like to emphasize that we both respect ICML, and are active in ICML, both as authors and as area chairs, and certainly are not arguing that ICML is a bad place for your papers. For many papers, ICML is the best venue. But for many theory papers, COLT is a better and more appropriate place.

Why should you submit to COLT?

By and large, theory papers go to COLT: this is the tradition of the field. This is the place to present your ground-breaking theorems and new models that will shape the theory of machine learning. COLT is more focused than ICML, with a single track of sessions. Unlike ICML, the norm at COLT is for people to sit through most sessions and hear most of the talks presented. There is also often a lively discussion following paper presentations. If you want theory people to know of your work, you should submit to COLT.

Additionally, this year COLT and ICML are tightly co-located, with joint plenary sessions (i.e. some COLT papers will be presented in a plenary session to the entire combined COLT/ICML audience, as will some ICML papers), and many other opportunities for exposure to the wider ICML audience. And so, by submitting to COLT, you have the potential of reaching both the captive theory audience at COLT and the wider ML audience at ICML.

The advantages of sending to COLT:

  1. Rigorous review process.

    The COLT program committee is composed entirely of established, mostly fairly senior, researchers. Program committee members read and review papers themselves, or potentially use a sub-reviewer that they know personally and carefully select for the paper, but still check and maintain responsibility for the review. Your paper will get reviewed by at least three program committee members, who will likely be experts on the topics covered by the paper. This is in contrast to ICML (and most other ML conferences) where area chairs (of similar seniority to the COLT program committee) only manage the review process, reviewers are assigned based on load-balancing considerations, and the primary reviewing is done by a very wide set of reviewers, frequently students, who are often not the most relevant experts.

    COLT reviews are typically detailed and technical details are checked. The reviewing process is less rushed and program committee members (and sub-reviewers where appropriate) are expected to do a careful job on each and every paper.

    All papers are then discussed by the program committee, and there are generally significant and meaningful discussions on papers. This also means the COLT reviewing process is far from having a “single point of failure”, as the paper will be carefully considered and argued for by multiple (senior) program committee members. We believe this yields a more consistently high quality program, with much less randomness in the paper selection process, which in turn translates to high respect for accepted COLT papers.

  2. COLT is not double blind, but also not exactly single blind. Program committee members have access to the author identities (as do area chairs in ICML), as this is essential in order to select sub-reviewers. However, the author names do not appear on the papers, both in order to reduce the effect of first impressions, and to allow program committee members to utilize reviewers who are truly blind to the authors’ identities.

    It should be noted that the COLT anonymization guidelines are a bit more relaxed, which we hope makes it easier to create an anonymized version for conference submission (authors are still allowed, and even encouraged, to post their papers online, with their names on them of course).

  3. COLT does not have a dedicated rebuttal phase. Frankly, with the higher quality, less random, reviews, we feel it is not needed, and the hassle to authors and program committee members is not worth it. However, the tradition in COLT, which we plan to follow, is to contact authors as needed during the review and discussion process to ask for clarification on issues that came up during review. In particular, if a concern is raised on the soundness or other technical aspect of a paper, the authors will be contacted to give them a chance to set things straight. But no, there is no generic author response where authors can argue and plead for acceptance.

Why ICML? and the summer conferences

Here’s a quick reference for summer ML-related conferences sorted by due date:

Conference   Due date   Location                                     Reviewing
KDD          Feb 10     August 12-16, Beijing, China                 Single Blind
COLT         Feb 14     June 25-27, Edinburgh, Scotland              Single Blind? (historically)
ICML         Feb 24     June 26-July 1, Edinburgh, Scotland          Double Blind, author response, zero SPOF
UAI          March 30   August 15-17, Catalina Islands, California   Double Blind, author response

Geographically, this is greatly dispersed and the UAI/KDD conflict is unfortunate.

Machine Learning now has three major conferences a year: NIPS, AIStat, and ICML. This has not always been the case: the academic default is annual summer conferences, then NIPS started with a December conference, and now AIStat has grown into an April conference.

However, the first claim is not quite correct. NIPS and AIStat have few competing venues while ICML implicitly competes with many other conferences accepting machine learning related papers. Since Joelle and I are taking a turn as program chairs this year, I want to make explicit the case for ICML.

  1. COLT was historically a conference for learning-interested Computer Science theory people. Every COLT paper has a theorem, and few have experimental results. A significant subset of COLT papers could easily be published at ICML instead. ICML now has a significant theory community, including many pure theory papers and significant overlap with COLT attendees. Good candidates for an ICML submission are learning theory papers motivated by real machine learning problems (example: the agnostic active learning paper) or papers which propose and analyze new plausibly useful algorithms (example: the adaptive gradient papers). If you find yourself tempted to add empirical experiments to prove the point that your theory really works, ICML sounds like an excellent fit. Not everything is a good fit though—papers motivated by definitional aesthetics or tradition (Valiant-style PAC learning comes to mind) may not be appreciated.

    There are two significant advantages to ICML over COLT. One is that ICML provides a potentially much larger audience which appreciates and uses your work. That’s substantially less relevant this year, because ICML and COLT are colocating and we are carefully designing joint sessions for the overlap day.

    The other is that ICML is committed to fair reviewing—papers are double blind so reviewers are not forced to take into account the author identity. Plenty of people will argue that author names don’t matter to them, but I’ve personally seen several cases as a reviewer where author identity affected the decision, typically towards favoring insiders or bigwigs at theory conferences as common sense would suggest. The double blind aspect of ICML reviewing is an open invitation to outsiders to submit to ICML.

  2. Many UAI papers could easily go to ICML because they are explicitly about machine learning or connections with machine learning. For example, pure prediction markets are a stretch for ICML, but connections between machine learning and prediction markets, which seem to come up in multiple ways, are a good fit. Bernhard’s lab has done quite a bit of work on extracting causality from prediction complexity which could easily interest people at ICML. I’ve personally found some work on representations for learning algorithms, such as sum-product networks, of first-class interest. UAI has a definite subcommunity of hardcore Bayesians which is less evident at ICML. ICML as a community seems more pragmatic about Bayesian methods: if they work well, that’s good. Of the comparators here, UAI seems the most similar in orientation to ICML to me.

    ICML provides a significantly larger potential audience and, due to its size, tends to be more diverse.

  3. KDD is a large conference (a bit larger than ICML by attendance) which, as I understand it, initially started from the viewpoint of database people trying to do interesting things with the data they had. The conference is generally one step more commercial/industrial than ICML. Significant parts of the academic track are about machine learning technology and could have been submitted to ICML instead. I was impressed by the double robust sampling work and the out of core learning paper is cool. And, I often enjoy the differential privacy in learning work. KDD attendees tend to be very pragmatic about what works, which is reinforced by yearly prediction challenges. I appreciate this viewpoint quite a bit.

    KDD doesn’t do double blind review, which was discussed above. To me, a more significant drawback of KDD is the ACM paywall. I was burned by this last summer. We decided to do a large scale learning survey based on the SUML compendium at KDD, but discovered too late that the video would be stuck behind the paywall, unlike our learning with exploration tutorial the year before. As I understand it, the year before ACM made them pay twice: once to videolectures and once to ACM, which was understandably judged unsustainable. The paywall is particularly rough for students who are not well-established, because it substantially limits their potential audience.

    This is not a problem at ICML 2012. Every prepared presentation will be videotaped and we will have every paper easily and publicly accessible along with it. The effort you put into the presentation will pay off over hundreds or thousands of additional online views.

  4. Area conferences. There are many other conferences which I think of as adjacent area conferences, including AAAI, ACL, SIGIR, CVPR, and WWW, which I have not attended enough or recently enough to make a real comparison with. Nevertheless, in each of these conferences, machine learning is a common technology. And sometimes new forms of machine learning technology are developed. Depending on many circumstances, ICML might be a good candidate for a place to send a paper on a new empirically useful piece of machine learning technology. Or not—the circumstances matter hugely.

Machine Learning has grown radically and gone industrial over the last decade, providing plenty of motivation for a conference on developing new core machine learning technology. Indeed, it is because of the power of ML that so much overlap exists. In most cases, the best place to send a paper is to the conference where it will be most appreciated. But there is a real sense in which you create the community by participating in it. So, when the choice is unclear, sending the paper to a conference designed simultaneously for fair, high-quality reviewing and broad distribution of your work is a good call, as it provides the most meaningful acceptance. For machine learning, that conference is ICML. Details of the ICML plan this year are here. We are on track.

As always, comments are welcome.

Vowpal Wabbit version 6.1 & the NIPS tutorial

I just made version 6.1 of Vowpal Wabbit. Relative to 6.0, there are few new features, but many refinements.

  1. The cluster parallel learning code better supports multiple simultaneous runs, and other forms of parallelism have been mostly removed. This incidentally significantly simplifies the learning core.
  2. The online learning algorithms are more general, with support for l1 (via a truncated gradient variant) and l2 regularization, and a generalized form of variable metric learning. (A toy sketch of this style of update appears after this list.)
  3. There is a solid persistent server mode which can train online, as well as serve answers to many simultaneous queries, either in text or binary. (A hypothetical client sketch appears a little further below.)
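
To make the l1/l2 support above a bit more concrete, here is a minimal toy sketch of an online squared-loss update with l1 handled by truncation and l2 by weight shrinkage, in the spirit of the truncated gradient approach. This is only an illustration of the general technique, not VW’s actual implementation; the function and hyperparameter names (eta, l1, l2) are assumptions made for the example.

```python
# Toy online linear learner: l1 via truncation, l2 via weight shrinkage.
# Illustrative only; not VW's code. Hyperparameters (eta, l1, l2) are
# assumptions for this sketch.

def truncate(w, gravity):
    """Pull a weight toward zero by `gravity`, stopping at zero (l1 step)."""
    if w > gravity:
        return w - gravity
    if w < -gravity:
        return w + gravity
    return 0.0

def online_update(weights, example, label, eta=0.1, l1=1e-4, l2=1e-4):
    """One squared-loss SGD step on a sparse example given as {feature: value}."""
    prediction = sum(weights.get(f, 0.0) * v for f, v in example.items())
    gradient = prediction - label           # d/dp of 0.5 * (p - y)^2
    for f, v in example.items():
        w = weights.get(f, 0.0)
        w -= eta * (gradient * v + l2 * w)  # gradient step plus l2 shrinkage
        weights[f] = truncate(w, eta * l1)  # truncated-gradient l1 step
    return prediction

# Usage: stream (example, label) pairs through online_update.
weights = {}
print(online_update(weights, {"age": 0.5, "height": 1.2}, label=1.0))
```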

This should be a very good release if you are just getting started: we’ve made it compile more automatically out of the box, and there are several new examples and updated documentation.
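
For the persistent server mode mentioned above, here is a hypothetical client sketch. The assumptions: a VW daemon has already been started separately (e.g. with something along the lines of `vw --daemon --port 26542`), it accepts newline-terminated examples in VW’s text format, and it answers each with a single text line containing the prediction. The port and exact reply format are assumptions; check the documentation for your version before relying on any of this.

```python
# Hypothetical client for a VW daemon; port and protocol details assumed.
import socket

def query(example_line, host="localhost", port=26542):
    """Send one VW-format example to the daemon and return its reply line."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((example_line + "\n").encode("ascii"))
        reply = conn.makefile("r").readline()
    return reply.strip()

# A labeled example both trains and predicts; an unlabeled one only predicts.
print(query("1 | age:0.5 height:1.2"))
```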

As per tradition, we’re planning to do a tutorial at NIPS during the break at the parallel learning workshop at 2pm Spanish time Friday. I’ll cover the basics, leaving the fun stuff for others.

  1. Miro will cover the L-BFGS implementation, which he created from scratch. We have found this works quite well amongst batch learning algorithms.
  2. Alekh will cover how to do cluster parallel learning. If you have access to a large cluster, VW is orders of magnitude faster than any other public learning system for linear prediction. And if you are as impatient as I am, it is a real pleasure when the computers can keep up with you.

This will be recorded, so it will hopefully be available for viewing online before too long.

I hope to see you soon 🙂