2011 Summer Conference Deadline Season

Machine learning always welcomes the new year with paper deadlines for summer conferences. This year, we have:

| Conference | Paper Deadline | When/Where | Double blind? | Author Feedback? | Notes |
|---|---|---|---|---|---|
| ICML | February 1 | June 28-July 2, Bellevue, Washington, USA | Y | Y | Weak colocation with ACL |
| COLT | February 11 | July 9-July 11, Budapest, Hungary | N | N | Colocated with FOCM |
| KDD | February 11/18 | August 21-24, San Diego, California, USA | N | N | |
| UAI | March 18 | July 14-17, Barcelona, Spain | Y | N | |

The larger conferences are on the west coast in the United States, while the smaller ones are in Europe.

NIPS 2010

I enjoyed attending NIPS this year, and several papers caught my interest. For the conference itself:

  1. Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona, The Multidimensional Wisdom of Crowds. This paper is about using Mechanical Turk to gather label information, with results superior to a majority-vote approach.
  2. David McAllester, Tamir Hazan, and Joseph Keshet, Direct Loss Minimization for Structured Prediction. This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition.
  3. Mohammad Saberian and Nuno Vasconcelos, Boosting Classifier Cascades. This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades worth looking at if you're interested.
  4. Alan Fern and Prasad Tadepalli, A Computational Decision Theory for Interactive Assistants. This paper carves out some forms of natural not-MDP problems and shows their RL-style solution is tractable. It’s good to see people moving beyond MDPs, which at this point are both well understood and limited.
  5. Oliver Williams and Frank McSherry, Probabilistic Inference and Differential Privacy. This paper is about a natural, relatively unexplored, and potentially dominant approach to achieving differential privacy while learning.

I also attended two workshops, Coarse-To-Fine and LCCC, which were a fine combination. The first was about more efficient (and sometimes more effective) methods for learning which start with coarse information and refine, while the second was about parallelization and distribution of learning algorithms. Together, they were about how to learn fast and effective solutions.

The CtF workshop could have been named “Integrating breadth-first search and learning”. I was somewhat (I hope not too) pesky, discussing Searn repeatedly during questions, since it seems quite plausible that a good application of Searn would compete with, and plausibly improve on, results from several of the talks. Eventually, I hope the conventional wisdom shifts to a belief that search and learning must be integrated for efficiency and robustness reasons. The talks in this workshop were uniformly strong in making that case. I was particularly interested in Drew's talk on a plausible improvement on Searn.

The level of agreement in approaches at the LCCC workshop was much lower, with people discussing many radically different approaches.

  1. Should data be organized by feature partition or example partition? Fernando points out that the number of features often scales sublinearly in the number of examples, implying that an example partition addresses scale better. However, basic learning theory tells us that if the number of parameters scales sublinearly in the number of examples, then the value of additional samples asymptotes, implying a mismatched solution design. My experience is that a ‘not enough features’ problem can be dealt with by throwing in all the features you previously couldn’t use properly, for example personalization features.
  2. How can we best leverage existing robust distributed filesystem/MapReduce frameworks? There was near unanimity on the belief that MapReduce itself is of limited value for machine learning, but the step forward is unclear. I liked what Markus said: that no one wants to abandon the ideas of robustly storing data and moving small amounts of code to large amounts of data. The best way to leverage this capability to build great algorithms remains unclear to me.
  3. Every speaker was in agreement that their approach was faster, but there was great disagreement about what “fast” meant in an absolute sense. This forced me to think about an absolute measure of (input complexity)/(time), where results between 100 features/s and 10^7 features/s are considered “fast”, depending on who is speaking. This scale disparity is remarkably extreme. A related detail is that the strength of baseline algorithms varies greatly.

I hope we’ll discover convincing answers to these questions in the near future.

To Videolecture or Not

(update: cross-posted on CACM)

For the first time in several years, ICML 2010 did not have videolectures.net in attendance. Luckily, the tutorial on exploration and learning which Alina and I put together can be viewed, since we also presented it at KDD 2010, which did include videolecture support.

ICML didn’t cover the cost of a videolecture because PASCAL didn’t provide a grant for it this year; KDD, on the other hand, covered it out of registration costs. Videolectures aren’t cheap: for a workshop, the baseline quote we have is 270 euros per hour, plus a similar cost for the cameraman’s travel and accommodation. This can be reduced substantially by having a volunteer with a camera handle the cameraman duties, then uploading the video and slides to be processed for a quoted 216 euros per hour.

YouTube is the predominant free video site, with a cost of $0, but it turns out to be a poor alternative: its 15-minute upload limit does not match typical talk lengths, while videolectures.net offers side-by-side synchronized slides and video, which allows quick navigation of the video stream and acceptable resolution of typical talk slides. These benefits are substantial enough that YouTube is not presently a serious alternative.

So, if we can’t avoid paying the cost, is it worthwhile? One way to judge is to compare how much authors currently spend traveling to a conference and presenting research with the size of the audience. Costs vary wildly, but for a typical international academic conference, airfare, hotel, and registration commonly total at least $1000 even after scrimping. Audience sizes also vary substantially, but something in the 30-100 range is a typical average. For KDD 2010, the average number of views per presentation is 14.6, but this is misleadingly low, as the KDD presentations were only just put up. A better number comes from KDD 2009, where the average is presently 74.2 views; this is representative, with ICML 2009 presently averaging 115.8. We can argue about the relative merits of online vs. in-person viewing, but the ordering of their value is at least unclear: in an online system people specifically seek out lectures to view, while at the conference itself people are often opportunistic viewers. Valuing them equally, videolectures increase the size of the audience, and hence the value to authors, by perhaps a factor of 2, for a cost around 1/3 of current presentation costs.
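The back-of-envelope comparison above can be sketched out explicitly. All figures are the rough estimates from the text; the $300 per-talk recording cost is an assumed figure chosen to be consistent with the quoted 216-270 euro/hour rates plus overhead, not a number from the source.

```python
# Back-of-the-envelope estimate of videolecture value.
# Figures are rough estimates from the text; recording_cost is an
# assumed per-talk figure, not a quoted one.

travel_cost = 1000.0      # typical per-author cost to attend and present ($)
in_person_audience = 65   # midpoint of the typical 30-100 audience range
online_views = 74.2       # KDD 2009 average views per presentation
recording_cost = 300.0    # assumed per-talk videolecture cost ($)

# Valuing an online view the same as an in-person viewer:
audience_factor = (in_person_audience + online_views) / in_person_audience
cost_factor = recording_cost / travel_cost

print(f"audience grows by roughly {audience_factor:.1f}x")   # ~2.1x
print(f"for about {cost_factor:.0%} of the presentation cost")  # ~30%
```

With these assumptions the audience roughly doubles for about a third of the per-author presentation cost, matching the factor-of-2 and 1/3 figures in the text.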

This conclusion is conservative, because a videolecture is almost surely viewed for more than a year, the costs of conference attendance are often higher, and the cost of a presenter’s time is not accounted for. Overall, videolecture coverage seems quite worthwhile. Since authors are also typically attendees of a conference, increasing registration fees to cover the cost of videolectures seems reasonable. A videolecture is simply a new publishing format.

We can hope that the price will drop over time, as it’s not clear to me that 216 euros/hour reflects the real costs of videolectures.net; competition of similar quality would be the surest way to drive it down. But in the near future, there are two categories of conferences: those that judge the value of their content above 216 euros/hour, and those that do not. Whether or not a conference has videolecture support substantially impacts its desirability as a place to send papers.

New York Area Machine Learning Events

On Sept 21, there is another machine learning meetup where I’ll be speaking. Although the topic is contextual bandits, I think of it as “the future of machine learning”. In particular, it’s all about how to learn in an interactive environment, such as ad display, trading, news recommendation, etc.

On Sept 24, abstracts for the New York Machine Learning Symposium are due. This is the largest Machine Learning event in the area, so it’s a great way to have a conversation with other people.

On Oct 22, the NY ML Symposium actually happens. This year, we are expanding the spotlights and trying to have more time for posters. In addition, we have a strong set of invited speakers: David Blei, Sanjoy Dasgupta, Tommi Jaakkola, and Yann LeCun. After the meeting, a late hackNY-related event is planned where students and startups can meet.

I’d also like to point out the related CS/Econ symposium as I have interests there as well.