- Thomas Walsh, Kaushik Subramanian, Michael Littman and Carlos Diuk Generalizing Apprenticeship Learning across Hypothesis Classes. This paper formalizes and provides algorithms with guarantees for mixed-mode apprenticeship and traditional reinforcement learning algorithms, allowing RL algorithms that perform better than for either setting alone.
- István Szita and Csaba Szepesvári Model-based reinforcement learning with nearly tight exploration complexity bounds. This paper and anotherrepresent the frontier of best-known algorithm for Reinforcement Learning in a Markov Decision Process.
- James Martens Deep learning via Hessian-free optimization. About a new not-quite-online second order gradient algorithm for learning deep functional structures. Potentially this is very powerful because while people have often talked about end-to-end learning, it has rarely worked in practice.
- Chrisoph Sawade, Niels Landwehr, Steffen Bickel. and Tobias Scheffer Active Risk Estimation. When a test set is not known in advance, the model can be used to safely aid test set evaluation using importance weighting techniques. Relative to the paper, placing a lower bound on p(y|x) is probably important in practice.
- H. Brendan McMahan and Matthew Streeter Adaptive Bound Optimization for Online Convex Optimization and the almost-same paper John Duchi, Elad Hazan, and Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. These papers provide tractable online algorithms with regret guarantees over a family of metrics rather than just euclidean metrics. They look pretty useful in practice.
- Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella, Active Learning on Trees and Graphs Various subsets of these authors have other papers about actively learning graph-obeying functions which in total provide a good basis for understanding what’s possible and how to learn.
The program chairs for ICML did a wide-ranging survey over participants. The results seem to suggest that participants generally agree with the current ICML process. I expect there is some amount of anchoring effect going on where participants have an apparent preference for the known status quo, although it’s difficult to judge the degree of that. Some survey results which aren’t of that sort are:
- 7.7% of reviewers say author feedback changed their mind. It would be interesting to know for which fraction of accepted papers reviewers had their mind changed, but that isn’t there.
- 85.4% of authors don’t know if the reviewers read their response, believe they read and ignored it, or believe they didn’t read it. Authors clearly don’t feel like they are communicating with reviewers.
- 58.6% support growing the conference with the largest fraction suggesting poster-only papers.
- Other conferences attended by the ICML community in order are NIPS, ECML/PKDD, AAAI, IJCAI, AIStats, UAI, KDD, ICDM, COLT, SIGIR, ECAI, EMNLP, CoNLL. This is pretty different from the standard colocation list for ICML. Many possibilities are precluded by scheduling, but AAAI, IJCAI, UAI, KDD, COLT, SIGIR are all serious possibilities some of which haven’t been used much in the past.
My experience with Mark‘s new paper discussion site is generally positive—having comments emailed to interested parties really helps the discussion. There are a few comments that authors haven’t responded to, so if you are an author you might want to sign up to receive comments.
In addition, I was the workshop chair for ICML&COLT this year. My overall impression was that things went reasonably well, with the exception of internet connectivity at Dan Panorama which was a minidisaster courtesy of a broken per-machine authentication system. One of the things I’m particularly happy about was the Learning to Rank Challenge workshop. I think it would be great if ICML can continue to attract new challenge workshops in the future. If anyone else has comments about the workshops, I’d love to hear them.
The second Netflix prize is canceled due to privacy problems. I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release.
I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.
Yahoo! is sponsoring two machine learning events that might interest people.
- The Key Scientific Challenges program (due March 5) for Machine Learning and Statistics offers $5K (plus bonuses) for graduate students working on a core problem of interest to Y! If you are already working on one of these problems, there is no reason not to submit, and if you aren’t you might want to think about it for next year, as I am confident they all press the boundary of the possible in Machine Learning. There are 7 days left.
- The Learning to Rank challenge (due May 31) offers an $8K first prize for the best ranking algorithm on a real (and really used) dataset for search ranking, with presentations at an ICML workshop. Unlike the Netflix competition, there are prizes for 2nd, 3rd, and 4th place, perhaps avoiding the heartbreak the ensemble encountered. If you think you know how to rank, you should give it a try, and we might all learn something. There are 3 months left.
I attended the Netflix prize ceremony this morning. The press conference part is covered fine elsewhere, with the basic outcome being that BellKor’s Pragmatic Chaos won over The Ensemble by 15-20 minutes, because they were tied in performance on the ultimate holdout set. I’m sure the individual participants will have many chances to speak about the solution. One of these is Bell at the NYAS ML symposium on Nov. 6.
Several additional details may interest ML people.
- The degree of overfitting exhibited by the difference in performance on the leaderboard test set and the ultimate hold out set was small, but determining at .02 to .03%.
- A tie was possible, because the rules cut off measurements below the fourth digit based on significance concerns. In actuality, of course, the scores do differ before rounding, but everyone I spoke to claimed not to know how. The complete dataset has been released on UCI, so each team could compute their own score to whatever accuracy desired.
- I was impressed by the slick systematic uses of SVD mentioned in the technical presentation, as implied by the first comment here.
- The amount of programming and time which went into this contest was pretty shocking. I was particularly impressed with the amount of effort that went into various techniques for blending results from different systems. In this respect, the lack of release of the source code is a little bit disappointing.
- I forgot to ask explicitly, but no one mentioned using any joins of the data to external databases. That’s somewhat surprising if you think about it given how much other information is available about movies.
- I hadn’t previously convexity functioning as a tool for social engineering so explicitly. Because squared loss is convex, any two different solutions of similar performance can be linearly blended to yield a mixed solution of superior performance. The implications of this observation were on display.
Netflix also announced a plan for a new contest, which will focus on using features of users, and predicting well for the (presumably large number of) users who rate very few movies. I hope they get the anonymization on this data right, as it’s obviously important.
This brings up a basic issue: How should a contest be designed? In the main, the finished Netflix contest seems to have been well designed. For example, the double holdout set approach nicely prevents overfitting, which has been a bugaboo of some previous contests. One improvement they are already implementing is asymptopia removal—the contest will award $0.5M in 6 months, and $0.5M more in 18 months. Nevertheless, we might imagine better contests, and perhaps it’s worthwhile to do so given the amount of attention devoted.
- Metric One criticism is that squared loss does not very directly reflect the actual value to Netflix of a particular set of recommendations. This seems like a fair criticism, although if you believe ranking according to the optimal expected conditional ratings is the best possible, it is at least consistent. The degree to which suboptimal squared loss prediction controls suboptimality of a recommendation loss is weak, but it should kick in when squared loss is deeply optimized as happened in this contest.
What you really want is something like “Did the user pick the recommended movie?” This would provide a qualitative leap in the fidelity of the metric to the true underlying problem. Unfortunately, doing this properly is difficult, as you need to cope with exploration issues, which must be done at the time of data collection. So my basic take is that the squared loss metric seems “ok”, with the proviso that it could be done better if you start the data collection with some amount of random exploration.
- Prize distribution In a race as tight as this one, it must feel pretty rough for the members of The Ensemble to put so much effort in and then win nothing. A good case can be made that this isn’t optimal design for a contest where we are trying to learn new things. For example, it seems quite plausible that there was some interesting technique used in The Ensemble yet not used by the winner. A case can also be made based on online learning with experts theory, which generally says that the right way to reward a stable of experts is via an exponential weighting scheme. This essentially corresponds to having a “softmax” prize distribution where the distribution to a participant p is according to e-C(winner – p) where C is a problem dependent constant. This introduces the possibility of a sybil attack, but that appears acceptably controllable, especially if the prize distribution is limited to the top few participants.
- Source Code After the Netflix prize was going for some time, the programming-time complexity of entering the contest became very formidable. The use of convex loss function and requiring participants to publish helped some with this, but it remained quite difficult. If the contest required the release of source code as well, I could imagine both lowering the barrier to late entry, and helping advance the field a bit more. Of course, it’s hard to go halfway with this—if you really want to guarantee that the source code works, you need to make the information exchange interface be the source code itself (which is then compiled and run in a sandbox), rather than labels.
One last question to consider is: Is it good for the research community to have contests? My general belief on this is a definite “yes”, as it gives people who know how to do things a chance to distinguish themselves. For the Netflix contest in particular, the contest has educated me a bit about ensemble and SVD-style techniques, and I’m sure it’s generally helped crystallize out a set of applicable ML technologies for many people, which I expect to see widely used elsewhere in the future.