On Sept 21, there is another machine learning meetup where I’ll be speaking. Although the topic is contextual bandits, I think of it as “the future of machine learning”. In particular, it’s all about how to learn in an interactive environment, such as for ad display, trading, news recommendation, etc…
On Sept 24, abstracts for the New York Machine Learning Symposium are due. This is the largest Machine Learning event in the area, so it’s a great way to have a conversation with other people.
On Oct 22, the NY ML Symposium actually happens. This year, we are expanding the spotlights, and trying to have more time for posters. In addition, we have a strong set of invited speakers: David Blei, Sanjoy Dasgupta, Tommi Jaakkola, and Yann LeCun. After the meeting, a late hackNY related event is planned where students and startups can meet.
I’d also like to point out the related CS/Econ symposium as I have interests there as well.
Alekh, John, Ofer, and I are organizing a workshop at NIPS this year on learning in parallel and distributed environments. The general interest level in parallel learning seems to be growing rapidly, so I expect quite a bit of attendance. Please join us if you are parallel-interested.
And, if you are working in the area of parallel learning, please consider submitting an abstract due Oct. 17 for presentation at the workshop.
The papers which interested me most at ICML and COLT 2010 were:
- Thomas Walsh, Kaushik Subramanian, Michael Littman, and Carlos Diuk, Generalizing Apprenticeship Learning across Hypothesis Classes. This paper formalizes and provides algorithms with guarantees for mixed-mode apprenticeship and traditional reinforcement learning algorithms, allowing RL algorithms that perform better than in either setting alone.
- István Szita and Csaba Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds. This paper and another represent the frontier of best-known algorithms for Reinforcement Learning in a Markov Decision Process.
- James Martens Deep learning via Hessian-free optimization. About a new not-quite-online second order gradient algorithm for learning deep functional structures. Potentially this is very powerful because while people have often talked about end-to-end learning, it has rarely worked in practice.
- Christoph Sawade, Niels Landwehr, Steffen Bickel, and Tobias Scheffer, Active Risk Estimation. When a test set is not known in advance, the model can be used to safely aid test set evaluation using importance weighting techniques. Relative to the paper, placing a lower bound on p(y|x) is probably important in practice.
- H. Brendan McMahan and Matthew Streeter, Adaptive Bound Optimization for Online Convex Optimization, and the almost-same paper by John Duchi, Elad Hazan, and Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. These papers provide tractable online algorithms with regret guarantees over a family of metrics rather than just Euclidean metrics. They look pretty useful in practice.
- Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, and Giovanni Zappella, Active Learning on Trees and Graphs. Various subsets of these authors have other papers about actively learning graph-obeying functions which in total provide a good basis for understanding what’s possible and how to learn.
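To give a feel for the adaptive online algorithms mentioned in the list above, here is a minimal sketch of a diagonal, per-coordinate adaptive subgradient step. It is a simplification under stated assumptions, not the exact algorithm of either paper: the function names, the learning rate `lr`, the `eps` constant, and the toy quadratic objective are all made up for illustration.

```python
import math


def adagrad_step(w, grad, sum_sq, lr=0.1, eps=1e-8):
    """One diagonal adaptive-subgradient update (illustrative sketch).

    w      -- current weights (list of floats)
    grad   -- subgradient of the current loss at w
    sum_sq -- running sum of squared gradients, one entry per coordinate
    Returns the new weights and the updated accumulators.
    """
    new_sum_sq = [s + g * g for s, g in zip(sum_sq, grad)]
    # Each coordinate gets its own step size, shrinking with its gradient history.
    new_w = [
        wi - lr * g / (math.sqrt(s) + eps)
        for wi, g, s in zip(w, grad, new_sum_sq)
    ]
    return new_w, new_sum_sq


def minimize_toy_quadratic(steps=500):
    """Run the update on f(w) = 0.5 * (100 * w0^2 + w1^2), a badly scaled toy problem."""
    w, sum_sq = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        grad = [100.0 * w[0], 1.0 * w[1]]  # exact gradient of the toy objective
        w, sum_sq = adagrad_step(w, grad, sum_sq, lr=0.5)
    return w
```

On this toy problem both coordinates follow nearly the same trajectory even though their gradients differ by a factor of 100, which is the per-coordinate rescaling at work.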
The program chairs for ICML did a wide-ranging survey over participants. The results seem to suggest that participants generally agree with the current ICML process. I expect there is some amount of anchoring effect going on where participants have an apparent preference for the known status quo, although it’s difficult to judge the degree of that. Some survey results which aren’t of that sort are:
- 7.7% of reviewers say author feedback changed their mind. It would be interesting to know for which fraction of accepted papers reviewers had their mind changed, but that isn’t there.
- 85.4% of authors don’t know if the reviewers read their response, believe they read and ignored it, or believe they didn’t read it. Authors clearly don’t feel like they are communicating with reviewers.
- 58.6% support growing the conference with the largest fraction suggesting poster-only papers.
- Other conferences attended by the ICML community in order are NIPS, ECML/PKDD, AAAI, IJCAI, AIStats, UAI, KDD, ICDM, COLT, SIGIR, ECAI, EMNLP, CoNLL. This is pretty different from the standard colocation list for ICML. Many possibilities are precluded by scheduling, but AAAI, IJCAI, UAI, KDD, COLT, SIGIR are all serious possibilities some of which haven’t been used much in the past.
My experience with Mark’s new paper discussion site is generally positive: having comments emailed to interested parties really helps the discussion. There are a few comments that authors haven’t responded to, so if you are an author you might want to sign up to receive comments.
In addition, I was the workshop chair for ICML&COLT this year. My overall impression was that things went reasonably well, with the exception of internet connectivity at Dan Panorama which was a minidisaster courtesy of a broken per-machine authentication system. One of the things I’m particularly happy about was the Learning to Rank Challenge workshop. I think it would be great if ICML can continue to attract new challenge workshops in the future. If anyone else has comments about the workshops, I’d love to hear them.
I’m the workshops chair for ICML this year. As such, I would like to personally encourage people to consider running a workshop.
My general view of workshops is that they are excellent as opportunities to discuss and develop research directions—some of my best work has come from collaborations at workshops and several workshops have substantially altered my thinking about various problems. My experience running workshops is that setting them up and making them fly often appears much harder than it actually is, and the workshops often come off much better than expected in the end. Submissions are due January 18, two weeks before papers.
Similarly, Ben Taskar is looking for good tutorials, which is complementary. Workshops are about exploring a subject, while a tutorial is about distilling it down into an easily taught essence, a vital part of the research process. Tutorials are due February 13, two weeks after papers.
The NYAS ML symposium grew again this year to 170 participants, despite the need to outsmart or otherwise tunnel through a crowd.
Perhaps the most distinct talk was by Bob Bell on various aspects of the Netflix prize competition. I also enjoyed several student posters including Matt Hoffman’s cool examples of blind source separation for music.
I’m somewhat surprised how much the workshop has grown, as it is now comparable in size to a small conference, although in style more similar to a workshop. At some point as an event grows, it becomes owned by the community rather than the organizers, so if anyone has suggestions on improving it, speak up and be heard.
There are at least 3 summer schools related to machine learning this summer.
- The first is at University of Chicago June 1-11 organized by Misha Belkin, Partha Niyogi, and Steve Smale. Registration is closed for this one, meaning they met their capacity limit. The format is essentially an extended Tutorial/Workshop. I was particularly interested to see Valiant amongst the speakers. I’m also presenting Saturday June 6, on logarithmic time prediction.
- Praveen Srinivasan points out the second at Peking University in Beijing, China, July 20-27. This one differs substantially, as it is about vision, machine learning, and their intersection. The deadline for applications is June 10 or 15. This is also another example of the growth of research in China, with active support from NSF.
- The third one is at Cambridge, England, August 29-September 10. It’s in the MLSS series. Compared to the Chicago one, this one is more about the Bayesian side of ML, although effort has been made to create a good cross section of topics. It’s also more focused on tutorials over workshop-style talks.
Here are a few of the presentations that interested me at the snowbird learning workshop (which, amusingly, was in Florida with AIStats).
- Thomas Breuel described machine learning problems within OCR and an open source OCR software/research platform with modular learning components, as well as a 60-million-size dataset derived from Google’s scanned books.
- Kristen Grauman and Fei-Fei Li discussed using active learning with different cost labels and large datasets for image ontology. Both of them used Mechanical Turk as a labeling system, which looks to become routine, at least for vision problems.
- Russ Tedrake discussed using machine learning for control, with a basic claim that it was the way to go for problems involving a medium Reynolds number such as in bird flight, where simulation is extremely intense.
- Yann LeCun presented a poster on an FPGA for convolutional neural networks yielding a factor of 100 speedup in processing. In addition to the graphics processor approach Rajat has worked on, this seems like an effective approach to deal with the need to compute many dot products.
I’m not as naturally exuberant as Muthu or David about CS/Econ day, but I believe it and ML day were certainly successful.
At the CS/Econ day, I particularly enjoyed Tuomas Sandholm’s talk which showed a commanding depth of understanding and application in automated auctions.
For the machine learning day, I enjoyed several talks and posters (I’d better, since I helped pick them). What stood out to me was the number of people attending: 158 registered, a level qualifying as “scramble to find seats”. My rule of thumb for workshops/conferences is that the number of attendees is often something like the number of submissions. That isn’t the case here, where there were just 4 invited speakers and 30-or-so posters. Presumably, the difference is due to a critical mass of Machine Learning interested people in the area and the ease of their attendance.
Are there other areas where a local Machine Learning day would fly? It’s easy to imagine something working out in the San Francisco bay area and possibly Germany or England.
The basic formula for the ML day is that a committee picks a few people to give talks, and posters are invited, with some of them providing short presentations. The CS/Econ day was similar, except they managed to let every submitter do a presentation. Are there tweaks to the format which would improve things?
This workshop asks for insights into how far we may/can push the theoretical boundary of using data in the design of learning machines. Can we express our classification rule in terms of the sample, or do we have to stick to a core assumption of classical statistical learning theory, namely that the hypothesis space is to be defined independently of the sample? This workshop is particularly interested in – but not restricted to – the ‘luckiness framework’ and the recently introduced notion of ‘compatibility functions’ in a semi-supervised learning context (more information can be found at http://www.kuleuven.be/wehys).
This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year.
The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human.
The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel.
Here are some answers to “where do the labels come from?”:
- Simulation Use a simulator (which need not be that good) to predict the cost of various choices and turn that into label information. Ashutosh had some cool demos showing the power of this approach. Gregory also presented a poster which might be viewed this way.
- Agreement A label is a point of agreement. Luis often used an agreement mechanism to induce labels with games. Sham discussed the power of agreement to constrain learning algorithms. Huzefa’s work on bioprediction can be thought of as partly using agreement with previous structures to simulate the label of a new structure.
- Compilation Labels can be found by compiling one learning problem into another. Mark and I both talked about reductions a bit, which come with some nice formal guarantees.
- Backprop Labels are the signals in generalized backpropagation (David Bradley’s talk).
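As a concrete picture of the compilation idea in the list above, here is a minimal sketch of the familiar one-against-all reduction, which compiles a multiclass problem into several binary ones and so manufactures binary labels from multiclass labels. It is used only as a well-known example of a reduction and is not necessarily what was presented at the workshop; the function names and the scorer interface are hypothetical.

```python
def one_against_all(examples, num_classes):
    """Compile a multiclass dataset into num_classes binary datasets.

    examples is a list of (features, label) pairs with labels in 0..num_classes-1.
    Binary problem k asks "is the label k?" and uses labels +1 / -1.
    """
    binary_problems = []
    for k in range(num_classes):
        binary_problems.append(
            [(x, 1 if y == k else -1) for x, y in examples]
        )
    return binary_problems


def predict_multiclass(binary_scorers, x):
    """Predict the class whose binary scorer gives the highest score.

    binary_scorers is a list of callables, one per class (hypothetical
    interface), each mapping features to a real-valued score.
    """
    scores = [score(x) for score in binary_scorers]
    return max(range(len(scores)), key=scores.__getitem__)
```

Any binary learner can then be trained on each compiled dataset, which is what makes formal guarantees for reductions of this kind possible to state in terms of the binary learner's performance.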
Some answers to “where do the data come from” are:
- Everywhere The essential idea is to integrate as many data sources as possible. Rakesh had several algorithms which (in combination) allowed him to use a large number of diverse data sources in a text domain.
- Sparsity A representation is formed by finding a sparse set of basis functions on otherwise totally unlabeled data. Rajat discussed self-taught learning algorithms which achieve this.
- Self-prediction A representation is formed by learning to self-predict a set of raw features. Hal’s talk covered this idea.
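One simple reading of the self-prediction idea in the list above can be sketched as follows: each raw feature in turn is treated as the label, and the remaining features become the input. This is only an illustrative assumption about how such problems might be built, not a description of the method from the talk; the function name and data layout are hypothetical.

```python
def self_prediction_problems(rows):
    """Turn unlabeled rows into one supervised problem per feature.

    rows is a list of equal-length feature lists. Problem j is a list of
    (other_features, target) pairs where target is the j-th raw feature.
    """
    if not rows:
        return []
    num_features = len(rows[0])
    problems = []
    for j in range(num_features):
        # Hold out feature j as the label; everything else is the input.
        problems.append([(row[:j] + row[j + 1:], row[j]) for row in rows])
    return problems
```

Predictors learned on these problems could then supply features for a downstream task, which is the sense in which a representation is formed without any hired labeler.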
A workshop like this is successful if it informs the questions we ask (and answer) in the future. Some natural questions (some of which were discussed) are:
- What is a natural, sufficient language for adding prior information into a learning system? Which languages are insufficient? Shai described a sense in which kernels are insufficient as a language for prior information. Bayesian analysis emphasizes reasoning about the parameters of the model, but the language of examples or maybe label expectations may be more natural.
- What is missing from the above lists? And are the elements of the lists actually distinct?
- How do we modularize? Many of the approaches use problem-specific tricks. That’s to be expected for a direction of research which is just starting, but it’s important to modularize these techniques so they can be repeatedly and easily applied. Achieving modularity in a manner which supports prior information properly seems tricky.
- How do we formalize and analyze? Of the items listed above, I feel like we only have some reasonable understanding of the compilation approach. The other approaches and questions are essentially unexplored territory where some serious thinking may be helpful.
The results have been posted, with CMU first, Stanford second, and Virginia Tech third.
Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, for example). Some details should become public at the NIPS workshops.
Slashdot has a post with many comments.
(Unofficially, at least.) The Deep Learning Workshop is being held the afternoon before the rest of the workshops in Vancouver, BC. Separate registration is needed, and open.
What’s happening fundamentally here is that there are too many interesting workshops to fit into 2 days. Perhaps we can get it officially expanded to 3 days next year.
Alina and I are organizing a workshop on Learning Problem Design at NIPS.
What is learning problem design? It’s about being clever in creating learning problems from otherwise unlabeled data. Read the webpage above for examples.
I want to participate! Email us before Nov. 1 with a description of what you want to talk about.