The main program will feature invited talks from Peter Bartlett, William Freeman, and Vladimir Vapnik, along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly, Buzzfeed, Chartbeat, Sense Networks, and Visual Revenue). This should be of great interest to everyone considering working in machine learning.
- Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now directly supports:
  - Multiclass Classification --oaa or --ect.
  - Cost Sensitive Multiclass Classification --csoaa or --wap.
  - Contextual Bandit Classification --cb.
  - Sequential Structured Prediction --searn or --dagger.
In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning.
- Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over time.
- Windows port: The priority of a Windows port jumped way up once we moved to Microsoft. The only feature which we know doesn’t work at present is automatic backgrounding when in daemon mode.
- New update rule: Stephane visited us this summer, and we fixed the default online update rule so that it is unit invariant.
There are also many other small updates including some contributed utilities that aid the process of applying and using VW.
Plans for the near future involve improving the quality of various items above, and of course better documentation: several of the reductions are not yet well documented.
The New York Machine Learning Symposium is October 19, with a 2-page abstract due September 13 via email with subject “Machine Learning Poster Submission” sent to firstname.lastname@example.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year.
The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing.
Edit: Fixed second ICML deadline.
There are a handful of basic code patterns that I wish I had been more aware of when I started research in machine learning. Each on its own may seem pointless, but collectively they go a long way towards making the typical research workflow more efficient. Here they are:
- Separate code from data.
- Separate input data, working data and output data.
- Save everything to disk frequently.
- Separate options from parameters.
- Do not use global variables.
- Record the options used to generate each run of the algorithm.
- Make it easy to sweep options.
- Make it easy to execute only portions of the code.
- Use checkpointing.
- Write demos and tests.
My guess is that these patterns will not only be useful for machine learning, but also any other computational work that involves either a) processing large amounts of data, or b) algorithms that take a significant amount of time to execute. Share this list with your students and colleagues. Trust me, they’ll appreciate it.
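Several of the patterns above can be combined in a few lines. The following is only a minimal sketch under stated assumptions: the `Options` class, the toy objective, the directory naming, and the file names `options.json` and `checkpoint.pkl` are all hypothetical choices made for illustration, not something the list prescribes. It keeps options (chosen by the experimenter) apart from parameters (learned by the algorithm), records the options of each run next to its output, checkpoints to disk, avoids global variables, and makes a sweep a single function call.

```python
import json
import pickle
import tempfile
from dataclasses import asdict, dataclass
from itertools import product
from pathlib import Path


@dataclass(frozen=True)
class Options:
    """Settings chosen by the experimenter; training never modifies them."""
    learning_rate: float = 0.1
    num_steps: int = 100
    checkpoint_every: int = 25


def train(options: Options, run_dir: Path) -> dict:
    """Toy gradient descent on f(w) = (w - 3)^2, checkpointing into run_dir."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # Record the options used to generate this run.
    (run_dir / "options.json").write_text(json.dumps(asdict(options), indent=2))

    checkpoint = run_dir / "checkpoint.pkl"
    if checkpoint.exists():
        # Resume from the last saved state instead of starting over.
        state = pickle.loads(checkpoint.read_bytes())
    else:
        state = {"step": 0, "params": {"w": 0.0}}

    while state["step"] < options.num_steps:
        w = state["params"]["w"]
        gradient = 2.0 * (w - 3.0)
        state["params"]["w"] = w - options.learning_rate * gradient
        state["step"] += 1
        if state["step"] % options.checkpoint_every == 0:
            checkpoint.write_bytes(pickle.dumps(state))

    # Always save the final state so a later run can pick up from it.
    checkpoint.write_bytes(pickle.dumps(state))
    return state["params"]


def sweep(base_dir: Path, learning_rates, step_counts) -> dict:
    """Run every combination of options, one output directory per run."""
    results = {}
    for lr, steps in product(learning_rates, step_counts):
        options = Options(learning_rate=lr, num_steps=steps)
        run_dir = base_dir / f"lr={lr}_steps={steps}"
        results[(lr, steps)] = train(options, run_dir)
    return results


def run_demo():
    """Sweep two learning rates in a temporary directory and read back one record."""
    with tempfile.TemporaryDirectory() as tmp:
        results = sweep(Path(tmp), [0.1, 0.5], [50])
        recorded = json.loads(
            (Path(tmp) / "lr=0.1_steps=50" / "options.json").read_text()
        )
    return results, recorded


if __name__ == "__main__":
    results, recorded = run_demo()
    print(results)
    print(recorded)
```

Because every run writes to its own directory, the input data, working state, and outputs stay separate, and any single run can be re-executed or resumed without touching the others.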
The workshop on the Meaningful Use of Complex Medical Data is happening again, August 9-12 in LA, near UAI on Catalina Island August 15-17. I enjoyed my visit last year, and expect this year to be interesting also.
Yaser points out some nicely videotaped machine learning lectures at Caltech. Yaser taught me machine learning, and I always found the lectures clear and interesting, so I expect many people can benefit from watching. Relative to Andrew Ng’s ML class there are somewhat different areas of emphasis but the topic is the same, so picking and choosing the union may be helpful.
For those who are remote (like me) or after the conference (like everyone), Mark Reid has set up the ICML discussion site where you can comment on any paper or subscribe to papers. Authors are automatically subscribed to their own papers, so it should be possible to have a discussion significantly after the fact, as people desire.
We also conducted a survey before the conference and have the survey results now. This can be compared with the ICML 2010 survey results. Looking at the comparable questions, we can sometimes order the answers to have scores ranging from 0 to 3 or 0 to 4 with 3 or 4 being best and 0 worst, then compute the average difference between 2012 and 2010.
Glancing through them, I see:
- Most people found the papers they reviewed a good fit for their expertise (-.037 w.r.t. 2010). Achieving this was one of our subgoals in the pursuit of high quality decisions.
- Most people had sufficient time for doing reviews. This was something that we worried about significantly in shifting the paper deadline and otherwise massaging the schedule. Most people also thought the review period was sufficiently long and most reviews were high quality (+.023 w.r.t. 2010).
- About 1/4 of reviewers say that author response changed their mind on a paper and 2/3 of reviewers say discussion changed their mind on a paper. The expectation of decision impact from author response is reduced from 2010 (-.135). The existence of author response is overwhelmingly preferred.
- People generally found ICML reviewing the same or better than previous ICMLs (+.35 w.r.t. 2010) and other similar conferences (+.198 w.r.t. 2010) at the cost of being somewhat more work. A substantial bump in reviewing quality was a primary goal.
- The ACs spent substantially more time (43 hours on average) than PC members (28 hours on average). This agrees with our expectation—the set of ACs didn’t change even after we had a 50% increase in submissions. The AC load we had this year was probably too high and will need to be reduced somewhat for next year.
- 2/3 of authors prefer the option to revise a paper during author response.
- The choice of how to deal with increased submissions is deeply undecided, with a slight preference for short talk+poster as we did.
- Most people like having two workshop days or don’t care.
- There is a strong preference for COLT and UAI colocation with the next tier of preference for IJCAI, KDD, AAAI, and CVPR.
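The signed numbers quoted above (such as +.35 or -.135) are differences of average ordinal scores between the two surveys. As a rough sketch of that arithmetic, assuming each question's answers are tallied as counts per option ordered from worst (score 0) to best, the comparison could look like this. The function names and the response counts are invented for illustration and are not the actual survey data.

```python
def average_score(counts):
    """Mean ordinal score, where counts[i] respondents chose the answer scored i."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    return sum(score * n for score, n in enumerate(counts)) / total


def score_shift(counts_new, counts_old):
    """Difference in average score between two surveys (positive means improvement)."""
    return average_score(counts_new) - average_score(counts_old)


if __name__ == "__main__":
    # Hypothetical tallies for one question with answers scored 0 (worst) to 3 (best).
    counts_2010 = [10, 20, 40, 30]
    counts_2012 = [5, 20, 45, 30]
    print(round(score_shift(counts_2012, counts_2010), 3))
```

A difference computed this way is only meaningful for questions whose answer options were the same in both years, which is why only the comparable questions are scored.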
We had advance warning from Prabhakar through the simple act of leaving. Yahoo! Research was a world class organization that Prabhakar recruited much of personally, so it is deeply implausible that he would spontaneously decide to leave. My first thought when I saw the news was “Uhoh, Rob said that he knew it was serious when the head of ATnT Research left.” In this case it was even more significant, because Prabhakar recruited me on the premise that Y!R was an experiment in how research should be done: via a combination of high quality people and high engagement with the company. Prabhakar’s departure is a clear end to that experiment.
The result is ambiguous from a business perspective. Y!R clearly was not capable of saving the company from its illnesses. I’m not privy to the internal accounting of impact and this is the kind of subject where there can easily be great disagreement. Even so, there were several strong direct impacts coming from the machine learning, economics, and algorithms groups.
Y!R clearly was excellent from an academic research perspective. On a per person basis in relevant subjects, it was outstanding. One way to measure this is by noticing that both ICML and KDD had (co)program chairs from Y!R. It turns out that talking to the rest of the organization doing consulting, architecting, and prototyping on a minority basis helps research by sharpening the questions you ask more than it hinders by taking up time. The decision to participate in this experiment was a good one for me personally.
It has been clear in Silicon Valley, academia, and pretty much everywhere else that people at Yahoo! including Yahoo! Research have been looking around for new positions. Maintaining the excellence of Y!R in a company that has been under prolonged stress was challenging leadership-wise. Consequently, the abrupt departure of Prabhakar and an apparent lack of appreciation by the new CEO created a crisis of confidence. Many people who were sitting on strong offers quickly left, and everyone else started looking around.
In this situation, my first concern was for colleagues, both in Machine Learning across the company and the Yahoo! Research New York office.
Machine Learning turns out to be a very hot technology. Every company and government in the world is drowning in data, and Machine Learning is the prime tool for actually using it to do interesting things. More generally, the demand for high quality seasoned machine learning researchers across startups, mature companies, government labs, and academia has been astonishing, and I expect the outcome to reflect that. This is remarkably different from the cuts that hit ATnT research in late 2001 and early 2002 where the famous machine learning group there took many months to disperse to new positions.
In the New York office, we investigated many possibilities hard enough that it became a news story. While that article is wrong in specifics (we ended up not fired for example, although it is difficult to discern cause and effect), we certainly shook the job tree very hard to see what would fall out. To my surprise, amongst all the companies we investigated, Microsoft had a uniquely sufficient agility, breadth of interest, and technical culture, enabling them to make offers that I and a significant fraction of the Y!R-NY lab could not resist. My belief is that the new Microsoft Research New York City lab will become an even greater techhouse than Y!R-NY. At a personal level, it is deeply flattering that they have chosen to create a lab for us on short notice. I will certainly do my part chasing the greatest learning algorithms not yet invented.
In light of this, I would encourage people in academia to consider Yahoo! in as fair a light as possible in the current circumstances. There are and will be some serious hard feelings about the outcome as various top researchers elsewhere in the organization feel compelled to look for jobs and leave. However, Yahoo! took a real gamble supporting a research organization about 7 years ago, and many positive things have come of this gamble from all perspectives. I expect almost all of the people leaving to eventually do quite well, and often even better.
What about ICML? My second thought on hearing about Prabhakar’s departure was “I really need to finish up initial paper/reviewer assignments today before dealing with this”. During the reviewing period where the program chair load is relatively light, Joelle handled nearly everything. My great distraction ended neatly in time to help with decisions at ICML. I considered all possibilities in accepting the job and was prepared to simply put aside a job search for some time if necessary, but the timing was surreally perfect. All signs so far point towards this ICML being an exceptional ICML, and I plan to do everything that I can to make that happen. The early registration deadline is May 13.
What about Vowpal Wabbit? Amongst other things, VW is the ultrascale learning algorithm, not the kind of thing that you would want to put aside lightly. I negotiated to continue the project and succeeded. This surprised me greatly—Microsoft has made serious commitments to supporting open source in various ways and that commitment is what sealed the deal for me. In return, I would like to see Microsoft always at or beyond the cutting edge in machine learning technology.
This is a rather long post, detailing the ICML 2012 review process. The goal is to make the process more transparent, help authors understand how we came to a decision, and discuss the strengths and weaknesses of this process for future conference organizers.
Microsoft’s Conference Management Toolkit (CMT)
We chose to use CMT over other conference management software mainly because of its rich toolkit. The interface is sub-optimal (to say the least!) but it has extensive capabilities (to handle bids, author response, resubmissions, etc.), good import/export mechanisms (to process the data elsewhere), excellent technical support (to answer late night emails, add new functionalities). Overall, it was the right choice, although we hope a designer will look at that interface sometime soon!
Toronto Matching System (TMS)
TMS is now being used by many major conferences in our field (including NIPS and UAI). It is an automated system (developed by Laurent Charlin and Rich Zemel at U. Toronto) to match reviewers to papers, based on an analysis of each reviewer’s publications. TMS collects publications from reviewers, parses them into features and applies unsupervised or supervised learning techniques to predict the relevance of any target paper for any reviewer. We convinced TMS to integrate with CMT and funded Laurent’s work for that. Reviewers were asked to put in a publication list for TMS to parse. For those who failed to do so (after many reminders!), we manually added that information from public sources.
The Program Committee
Recruiting a program committee that is both large and highly qualified is difficult these days. We sent out 69 area chair invitations; 50 (highly qualified!) people accepted. Each of these area chairs was asked to nominate a list of potential reviewers. We sent out approximately 700 invitations for program committee members; 389 accepted. A number of additional PC members were recruited during the review process (most of them for 1-2 papers), for a total of 470 active PC members. In terms of seniority, the final PC contains about 15% students, 80% researchers, 5% other.
The Surge (ICML + 50%)
The first big challenge came on the submission deadline. In the past few years, ICML had consistently received ~550-600 submissions. This year, we had a 50% increase, to 890 submissions. We had recruited a PC that could comfortably handle 700 papers. Dealing with an extra 200 papers was not an easy task.
About 10 submissions were rejected without review for various reasons (severe formatting issues, extra pages, non-anonymization).
An unsupervised version of TMS was used to generate a list of candidate papers for each reviewer and area chair. This was done working closely with Laurent Charlin of TMS using validation on previous NIPS data. CMT did not have the functionality to show a good list of candidate papers to reviewers, so we crafted an interface to show this list and let reviewers use that in conjunction with CMT. Ideally, this will be better incorporated in CMT in the future.
When you ask a group of scientists to run a conference, you must expect a few experiments will take place…. And so we decided to assess the usefulness of TMS scoring for generating lists of papers to bid on. To do this, we (randomly) assigned PC members to 1 of 3 groups. One group saw a list purely based on TMS scores. Another group received a list based on the matching between their subject area and that of the paper (referred to as the “relevance” score in CMT). The third group received a list based on a mix of both TMS and relevance. Reviewers were allowed to bid on any paper (excluding those with which they had a conflict); the lists were provided to help them efficiently sort through the large number of papers. We then compared the set of bids for a reviewer, with the list of suggestions, and measured the correspondence.
The following is the Discounted Cumulative Gain (DCG) of each list with respect to the bidding scores, averaged separately for each group. Note that each group was only presented with their corresponding list and not the others.
| | Group: CMT | Group: TMS | Group: CMT+TMS |
|---|---|---|---|
| Sorting by CMT scores | 6.11 out of 12.64 (48%) | 4.98 out of 13.63 (36%) | 4.87 out of 13.55 (35%) |
| Sorting by TMS score | 4.06 out of 12.64 (32%) | 6.43 out of 13.63 (47%) | 5.72 out of 13.55 (42%) |
| Sorting by TMS+CMT | 4.77 out of 12.64 (37%) | 6.11 out of 13.63 (44%) | 6.71 out of 13.55 (49%) |
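Each cell in the table above reports a DCG value together with the maximum attainable for that group. The post does not state which gain and discount functions were used, so the following is only a sketch of one common DCG variant (gain equal to the bid value, logarithmic discount by rank); the function names, paper identifiers, and bid values are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the item at rank r contributes gain / log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def dcg_of_ranking(ranked_papers, bids):
    """DCG of a suggested list, scoring each paper by the reviewer's bid (0 if no bid)."""
    return dcg([bids.get(paper, 0) for paper in ranked_papers])


def ideal_dcg(bids, list_length):
    """Best DCG any list of this length could reach for the given bids."""
    best = sorted(bids.values(), reverse=True)[:list_length]
    return dcg(best)


if __name__ == "__main__":
    # Hypothetical reviewer: higher bid value means more eager to review.
    bids = {"A": 3, "B": 2, "C": 1}
    suggested = ["B", "A", "D", "C"]  # list order shown to the reviewer
    achieved = dcg_of_ranking(suggested, bids)
    maximum = ideal_dcg(bids, len(suggested))
    print(f"{achieved:.2f} out of {maximum:.2f} ({achieved / maximum:.0%})")
```

Dividing by the ideal value makes lists comparable across groups whose reviewers bid on different numbers of papers, which matches the "out of" presentation in the table.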
A micro-survey was also run to collect further information on how users liked their short list. 85% of the participants indicated that they have used the list interface provided to them. The following is the preference indicated by each group (~75 reviewers in each group, ~2% error):
| Preferred CMT over list | 15% | 12% | 8% |
| Preferred list over CMT | 4% | 5% | 9% |
It is obvious from the above that most participants found the list useful in conjunction with CMT (suggesting that the list should be integrated inside CMT). We can also see that those who were presented with a list based on TMS scores were more likely to find the list useful.
Note that all of the above was done in a long hectic but fun weekend.
Imputing Missing Bids
CMT assumes that the reviewers are not willing to review a paper unless stated otherwise. It does not differentiate between an unseen (but potentially relevant) paper and a paper that has been seen and ignored. This is a real shortcoming when it comes to matching papers to reviewers, especially for those reviewers that did not bid often. To mitigate this problem, we used the click information on the shortlist presented to the reviewers to find out which papers had been observed and ignored. We then imputed these cases as real non-willing bids.
Around 30 reviewers did not provide any bids (and many had only a few). This is problematic because the tools used to do the actual reviewer-paper matching tend to assign the papers without any bids to the reviewers who did not bid, regardless of the match in expertise.
Once the bidding information was in and imputation was done, we now had to fill in the rest of the paper-reviewer bidding matrix to mitigate the problem with sparse bidders. This was done, once again, through TMS, but this time using a supervised learning approach.
Using supervised learning was more delicate than expected. To deal with the wildly varying number of bids per person, we imputed zero bids, first from papers that were plausibly skipped over, and if necessary at random from papers not bid on such that each person had the same expected bid in the dataset. From this dataset, we held out a random bid per person, and then trained to predict well the heldout bid. Most optimization approaches performed poorly due to the number of features greatly exceeding the number of labels. The best approach we found used the online algorithms in Vowpal Wabbit with a mass personalized training method similar to the one discussed here. This trained predictor was used to predict bid values for the full paper-reviewer bid matrix.
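The data preparation described above can be sketched roughly as follows. This is an interpretation rather than the code actually used: it reads "the same expected bid" as giving every reviewer the same mean bid value, and the function names, the `target_mean` parameter, and the paper identifiers are hypothetical. The predictor itself (trained with Vowpal Wabbit in the original process) is not shown.

```python
import math
import random


def impute_zero_bids(bids, skipped, all_papers, target_mean, rng):
    """Add zero bids until the reviewer's mean bid is at most target_mean (> 0).

    Zeros are taken first from papers the reviewer plausibly saw and skipped,
    then at random from the remaining papers without a bid.
    """
    result = dict(bids)
    needed = max(0, math.ceil(sum(bids.values()) / target_mean) - len(bids))
    seen_and_ignored = [p for p in skipped if p not in result]
    others = [p for p in all_papers if p not in result and p not in seen_and_ignored]
    rng.shuffle(others)
    for paper in (seen_and_ignored + others)[:needed]:
        result[paper] = 0
    return result


def hold_out_one(bids, rng):
    """Split one reviewer's bids into a training set and one random held-out bid."""
    paper = rng.choice(sorted(bids))
    training = {p: b for p, b in bids.items() if p != paper}
    return training, (paper, bids[paper])


if __name__ == "__main__":
    rng = random.Random(0)
    papers = [f"p{i}" for i in range(1, 13)]
    # Hypothetical reviewer with two real bids and two papers seen but ignored.
    completed = impute_zero_bids({"p1": 3, "p2": 2}, ["p3", "p4"], papers, 0.5, rng)
    training, held_out = hold_out_one(completed, rng)
    print(len(completed), held_out)
```

Equalizing the mean bid this way keeps a predictor from learning that prolific bidders simply like everything, while the single held-out bid per person gives a validation signal that weights all reviewers equally.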
Automated Area Chair and First Reviewer Assignment
Once we had the imputed paper-reviewer bidding matrix, CMT was used to generate the actual match between papers and area chairs, and (separately) between papers and reviewers. Each paper had two area chairs (sometimes called “meta-reviewers” in CMT) assigned to it, one primary, one secondary, by running two rounds of assignments (so that the primary was usually the “better” match). One reviewer per paper was also assigned automatically by CMT in a similar fashion. CMT provides proper load balancing, so that all area chairs and reviewers had similar loads.
Manual Checks of the Automated Assignments
Before finalizing the automated assignment, we manually looked through the list of papers to fix any potential problems that were not handled by the automated process. The two major cases were papers that did not go through the TMS system (authors did not agree to do so), and cases of poor primary-secondary meta-reviewer pairs (when the two area chairs are judged to be too close to offer independent assessment, e.g. working at the same institution, previous supervisor-student relationship).
Second and Third Reviewer Assignment
Once the initial assignments were announced, we asked the two area chairs for a given paper to each manually assign another reviewer from the PC. To help area chairs with this, we generated a shortlist of 10 recommended reviewers for each paper (using the estimated bid matrix and TMS score, with the CMT matching algorithm for load balancing of reviewer suggestions). Area chairs were free to use this list, to select from the complete program committee, or to seek an outside reviewer who was then added to the PC, an option used 80 times. The load for each reviewer was restricted to at most 7 papers, with exceptions when they agreed explicitly to more.
Most papers received at least 3 full reviews in the first round. Reviewers could not see each other’s reviews until they submitted their own. ML-Journaled submissions (see double submission guide) were reviewed only by two area chairs. In a small number of regular submissions (less than 10), we received 2 very negative reviews and notified the third reviewer (who was usually late by this point!) that we would not need their review.
Authors were given a chance to respond to the reviews during a short feedback period. This is becoming a standard practice in machine learning conferences. Authors were also allowed to upload a new version of the paper. The motivation here is that in some cases, it is easier to show the changes directly in the paper, rather than discuss them separately.
Our analysis shows that authors’ responses and subsequent discussions by reviewers made significant changes to the scoring of papers. A total of ~35% of the papers had some change in their scores after the author feedback. The average score for ~50% of the papers went down, stayed the same for ~10%, and went up for the other ~40%. The variance on the scores decreased by ~20%, indicating some convergence in the decisions.
To help us better decide on the quality of the papers, we asked the primary area chairs to provide a meta-review for each of their papers. For papers without unanimous review decisions (i.e. some reviews wanted to accept and some wanted to reject), we asked the secondary area chair to (independently) fill-in a meta-review, recommending whether to accept or reject the paper. A total of 1214 meta-reviews were provided. There were also 20 papers for which a 4th review was added in this period.
In all cases where the primary and secondary area chairs disagreed on the decision, the program chairs were directly involved, reviewing all the evidence (reviews, rebuttal, discussion, often the paper itself), and entering in a discussion (usually via email) with the area chairs, until a unanimous decision was achieved.
A total of 243 papers (27% of submissions) were accepted. Author notifications were sent out on April 30.
May 16, in Cambridge, is the New England Machine Learning Day, a first regional workshop/symposium on machine learning. To present a poster, submit an abstract by May 5.
as of last night, late.
When the reviewing deadline passed Wednesday night 15% of reviews were still missing, much higher than I expected. Between late reviews coming in, ACs working overtime through the weekend, and people willing to help in the pinch another ~390 reviews came in, reducing the missing mass to 0.2%. Nailing that last bit and a similar quantity of papers with uniformly low confidence reviews is what remains to be done in terms of basic reviews. We are trying to make all of those happen this week so authors have some chance to respond.
I was surprised by the quantity of late reviews, and I think that’s an area where ICML needs to improve in future years. Good reviews are not done in a rush—they are done by setting aside time (like an afternoon), and carefully reading the paper while thinking about implications. Many reviewers do this well but a significant minority aren’t good at scheduling their personal time. In this situation there are several ways to fail:
- Give early warning and bail.
- Give no warning and finish not-too-late.
- Give no warning and don’t finish.
The worst failure mode by far is the last one for Program Chairs and Area Chairs, because they must catch and fix all the failures at the last minute. I expect the second failure mode also impacts the quality of reviews because high speed reviewing of a deep paper often doesn’t work. This issue is one of community norms which can only be adjusted slowly. To do this, we’re going to pass a flake list for failure mode 3 to future program chairs who will hopefully further encourage people to schedule time well and review carefully.
If my experience is any guide, plenty of authors will feel disappointed by the reviews. Part of this is simply because it’s the first time the authors have had contact with people not biased towards agreeing with them, as almost all friends are. Part of this is the significant hurdle of communicating technical new things well. Part may be too-hasty reviews, as discussed above. And part of it may be that the authors simply are far more expert in their subject than reviewers.
In author responses, my personal tendency is to be blunter than most people when reviewers make errors. Perhaps “kind but clear” is a good viewpoint. You should be sympathetic to reviewers who have voluntarily put significant time into reviewing your paper, but you should also use the channel to communicate real information. Remotivating your paper almost never works, so concentrate on getting across errors in understanding by reviewers or on answering their direct questions.
We did not include reviewer scores in author feedback, although we do plan to include them when the decision is made. Scores should not be regarded as final by any party, since author feedback and discussion can significantly alter a reviewer’s understanding of the paper. Encouraging reviewers to incorporate this additional information well before settling on a final score is one of my goals.
We did allow resubmission of the paper with the author response, similar to what Geoff Gordon did as program chair for AIStat. This solves two problems: It helps authors create a more polished draft, and it avoids forcing an overly constrained channel in the communication. If an equation has a bug, you can write it out bug free in mathematical notation rather than trying to describe by reference how to alter the equation in author response.
Please comment if you have further thoughts.
has died. He lived a full life. I know him personally as a founder of the Center for Computational Learning Systems and the New York Machine Learning Symposium, both of which have sheltered and promoted the advancement of machine learning. I expect much of the New York area machine learning community will miss him, as well as many others around the world.
Sasha is the open problems chair for both COLT and ICML. Open problems will be presented in a joint session in the evening of the COLT/ICML overlap day. COLT has a history of open sessions, but this is new for ICML. If you have a difficult theoretically definable problem in machine learning, consider submitting it for review, due March 16. You’ll benefit three ways:
- The effort of writing down a precise formulation of what you want often helps you understand the nature of the problem.
- Your problem will be officially published and citable.
- You might have it solved by some very intelligent bored people.
The general idea could easily be applied to any problem which can be crisply stated with an easily verifiable solution, and we may consider expanding this in later years, but for this year all problems need to be of a theoretical variety.
Joelle and I (and Mahdi, and Laurent) finished an initial assignment of Program Committee and Area Chairs to papers. We’ll be updating instructions for the PC and ACs as we field questions. Feel free to comment here on things of plausible general interest, but email us directly with specific concerns.
For graduate students, the Yahoo! Key Scientific Challenges program, which includes machine learning, is on again, with applications due March 9. The application is easy and the $5K award is high quality “no strings attached” funding. Consider submitting.
The ICML paper deadline has passed. Joelle and I were surprised to see the number of submissions jump from last year by about 50% to around 900 submissions. A tiny portion of these are immediate rejects(*), so this is a much larger set of papers than expected. The number of workshop submissions also doubled compared to last year, so ICML may grow significantly this year, if we can manage to handle the load well. The prospect of making 900 good decisions is fundamentally daunting, and success will rely heavily on the program committee and area chairs at this point.
For those who want to rubberneck a bit more, here’s a breakdown of submissions by primary topic of submitted papers:
66 Reinforcement Learning
52 Supervised Learning
51 Clustering
46 Kernel Methods
40 Optimization Algorithms
39 Feature Selection and Dimensionality Reduction
33 Learning Theory
33 Graphical Models
33 Applications
29 Probabilistic Models
29 NN & Deep Learning
26 Transfer and Multi-Task Learning
25 Online Learning
25 Active Learning
22 Semi-Supervised Learning
20 Statistical Methods
20 Sparsity and Compressed Sensing
19 Ensemble Methods
18 Structured Output Prediction
18 Recommendation and Matrix Factorization
18 Latent-Variable Models and Topic Models
17 Graph-Based Learning Methods
16 Nonparametric Bayesian Inference
15 Unsupervised Learning and Outlier Detection
12 Gaussian Processes
11 Ranking and Preference Learning
11 Large-Scale Learning
9 Vision
9 Social Network Analysis
9 Multi-agent & Cooperative Learning
9 Manifold Learning
8 Time-Series Analysis
8 Large-Margin Methods
8 Cost Sensitive Learning
7 Recommender Systems
7 Privacy, Anonymity, and Security
7 Neural Networks
7 Empirical Insights into ML
7 Bioinformatics
6 Information Retrieval
6 Evaluation Methodology
<5 each: Text Mining, Rule and Decision Tree Learning, Graph Mining, Planning & Control, Monte Carlo Methods, Inductive Logic Programming & Relational Learning, Causal Inference, Statistical and Relational Learning, NLP, Hidden Markov Models, Game Theory, Robotics, POMDPs, Geometric Approaches, Game Playing, Data Streams, Pattern Mining & Inductive Querying, Meta-Learning, Evolutionary Computation
(*) Deadlines are magical, because they galvanize groups of people to concentrated action. But, they have to be real deadlines to achieve this, which leads us to reject late submissions & format failures to keep the deadline real for future ICMLs. This is uncomfortably rough at times.
The From Data to Knowledge workshop May 7-11 at Berkeley should be of interest to the many people encountering streaming data in different disciplines. It’s run by a group of astronomers who encounter streaming data all the time. I met Josh Bloom recently and he is broadly interested in a workshop covering all aspects of Machine Learning on streaming data. The hope here is that techniques developed in one area turn out useful in another which seems quite plausible. Particularly if you are in the bay area, consider checking it out.
It also seems worthwhile to give some sense of the scope and reviewing criteria for ICML for authors considering submitting papers. At ICML, the (very large) program committee does the reviewing which informs final decisions by area chairs on most papers. Program chairs setup the process, deal with exceptions or disagreements, and provide advice for the reviewing process. Providing advice is tricky (and easily misleading) because a conference is a community, and in the end the aggregate interests of the community determine the conference. Nevertheless, as a program chair this year it seems worthwhile to state the overall philosophy I have and what I plan to encourage (and occasionally discourage).
At the highest level, I believe ICML exists to further research into machine learning, which I generally think of as turning observations into useful predictions. Research is greatly varied in general, but in all cases it involves answering an interesting question for which the answer was not previously known. Interesting questions are generally natural: they can be stated easily and other people plausibly encounter them. Interesting questions are generally also ones for which there are multiple plausible wrong answers. The definition of “interesting” is otherwise hard to pin down, because it does and must change over time.
ICML is a broad conference which incorporates the interests of many different groups of people with different tastes in the research they prefer. It’s broad enough that most people don’t appreciate all the papers. That’s ok as long as there is some higher level appreciation for which directions of research benefit the community. Some common flavors are:
- ML for X: In general, Machine Learning is a core field of study with many applications. Often, it’s a good idea to publish within a conference focused on that area, but particularly when no such conference exists, ICML is a solid choice for a place to publish. One example of this kind of thing is Machine Learning for Sustainability, where the CCC will be giving a few travel grants. Here the core question is typically “How?” Exhibiting new things that you can do with ML provides good reference points for what is possible and a sense of what works, and compelling new ideas about what to work on can be valuable to the community.
There are several ways that papers of this sort can bounce. Perhaps X is insufficiently interesting, the results are unconvincing, or the method of solution is considered too straight-forward. I consider the first and second criteria sound, but am inclined toward leniency on the third, since there is often quite a bit of work in figuring out how to frame the problem so that the solution happens to be easy.
- New Algorithms: Often, authors find that existing learning algorithms for solving some problem are lacking in some way, so they propose new, better algorithms. This is plausibly the most common category of paper at ICML, so there is quite a bit of variety. The most straight-forward version proposes a new algorithm for a well-studied problem. For these papers it’s important to have an empirical comparison to existing baselines.
It’s easy for an empirical comparison to go wrong. Some authors use synthetic datasets, which do not seem significant to me because good results on such datasets may not transfer well to real-world problems: the real world tends to be quite a bit more complex than the synthetic processes which are natural to program. Instead, it’s important to show good results on real datasets. One problem with relying on real datasets is dataset selection: choosing the datasets for which your algorithm seems to perform best. You can avoid this by choosing datasets in some clearly unbiased manner and by evaluating on many standard datasets. Another way to fail is with a poor choice of baseline. This is tricky, because three reviewers might consider three different baselines the most natural one. Asking around a bit when developing the paper might help here, but in the end this can be a tough judgement call: Is the paper convincing enough that people interested in solving the problem should use this algorithm?
Another class of new algorithms papers is new algorithms for new areas of machine learning, blending into the previous category. Here, there are typically relatively few datasets available (perhaps just one) and there may be no (or only implausibly bad) baselines. For papers like this, one source of difficulty I’ve seen is authors who are very invested in a particular approach to solving the problem. If you have defined the problem too narrowly, broadening the definition of the problem can help you see appropriate baselines. Another difficulty I’ve observed is that reviewers used to well-studied problems reject an interesting paper because (essentially) they assume that the authors left out a good baseline, when no such baseline exists. To prevent the first, authors who ask around might get some valuable early feedback. For the second, it’s a difficulty we are aware of, and we will consider asking reviewers to judge such papers on the merits of ML for X.
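For the well-studied case, one simple way to summarize a comparison across many datasets is to count per-dataset wins and losses against a baseline and apply a sign test. The following is only a minimal sketch under assumptions of my own: the function name `sign_test` is hypothetical, scores are taken to be "higher is better" with one paired entry per dataset, and the sign test is one reasonable choice among several rather than a required procedure.

```python
from math import comb


def sign_test(new_scores, baseline_scores):
    """Two-sided sign test over paired per-dataset scores (higher is better).

    Returns (wins, losses, ties, p_value). Ties are dropped from the test.
    """
    if len(new_scores) != len(baseline_scores):
        raise ValueError("scores must be paired, one entry per dataset")
    pairs = list(zip(new_scores, baseline_scores))
    wins = sum(n > b for n, b in pairs)
    losses = sum(n < b for n, b in pairs)
    ties = len(pairs) - wins - losses
    trials = wins + losses
    if trials == 0:
        return wins, losses, ties, 1.0
    # Probability of a split at least this lopsided if wins and losses
    # were equally likely on every dataset (a fair coin).
    k = max(wins, losses)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return wins, losses, ties, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Made-up accuracies for illustration only, one entry per dataset.
    new = [0.81, 0.77, 0.90, 0.65, 0.72]
    base = [0.79, 0.78, 0.85, 0.65, 0.70]
    wins, losses, ties, p = sign_test(new, base)
    print(f"{wins} wins, {losses} losses, {ties} ties, p = {p:.3f}")
```

A summary like this says nothing about how the datasets were chosen or whether the baseline is the right one, so it complements rather than replaces the judgement calls described above.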
- Algorithmic studies: A relatively rare but potentially valuable form of paper is an algorithmic study. Here, the authors do not propose a new algorithm, but instead do a comprehensive empirical comparison of different algorithms. The standards here are quite high: the empirical comparison needs to be first-class to convince people, so the comments on empirical comparison under New Algorithms apply strongly.
- New Theory: Good theory can enlighten us about what is (or might be) possible. It can also help us build robust learning algorithms, where we design learning algorithms so that they provably solve some large class of problems. I am personally most interested in theory that helps us design new learning algorithms, but broadly interested in what is possible. I’m most interested in the question answered, while the means (and language) should only be as complex as necessary so the theory can be understood as widely as possible.
In many areas of CS theory, double blind reviewing is rare, so theory-oriented people may be unfamiliar with it. An important consequence is that complete proofs must be included either in the paper or supplemental material so that proof checking is fully feasible.
Another way that I’ve seen theory papers run into trouble is when the theory is a post-hoc justification for an algorithm. In essence, authors who choose to analyze an existing algorithm are sometimes forced to make many unnatural assumptions for the theory to be correct. There generally isn’t an easy fix if you arrive at this point.
- n of the above: It is common for ICML papers to be multicategory. At the extreme, you might have a new algorithm which solves a new X well, empirically and theoretically. Reviewers can fall into a trap where they are most interested in 1 of the 4 questions answered above, and find the 1/4 of the paper devoted to their question relatively weak compared to a paper that devotes all of its pages to the same question.
We are aware of this, and will encourage it to be taken into account.
- The exception: The set of papers I expect to see at ICML is more diverse than the above; there are often exceptions of one sort or another. For these exceptions, it often becomes a judgment call: Does this paper significantly further research into machine learning? Papers with little potential audience probably don’t, while fun/interesting/useful things that we didn’t think of do.
Further comments or questions are welcome.