Since John did not attend COLT this year, I have been volunteered to report back on the hot stuff at this year’s meeting. The conference seemed to have pretty high quality stuff this year, and I found plenty of interesting papers on all the three days. I’m gonna pick some of my favorites going through the program in a chronological order.

The first session on matrices seemed interesting for two reasons. First, the papers were quite nice. But more interestingly, this is a topic that has had a lot of presence in Statistics and Compressed sensing literature recently. So it was good to see high-dimensional matrices finally make their entry at COLT. The paper of Ohad and Shai on Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing provides non-trivial guarantees on trace norm regularization in an agnostic setup, while Rina and Nati show how Rademacher averages can be used to get sharper results for matrix completion problems in their paper Concentration-Based Guarantees for Low-Rank Matrix Reconstruction. Both the papers seemed to share the flavor of a learning theorists’ take at compressed sensing that I enjoyed seeing.

The best student paper by Amit, Sivan and Shai^{2} on Multiclass Learnability and the ERM principle showed a crucial distinction between binary and multiclass classification. Every ERM procedure is not *equally good* for multiclass classification. In particular, there are multiclass problems where some ERM learners succeed while others are inconsistent, in sharp contrast to the binary case. They also present some intuition on what characterizes a good ERM procedure for the multiclass setting.

I enjoyed all the three papers in the online learning session quite a bit. Jake, Elad and Peter show the equivalence of Blackwell approachability and low regret in their paper Blackwell Approachability and No-Regret Learning are Equivalent, with applications to efficient algorithms for calibration. Sasha, Karthik and Ambuj won the best paper award for their paper Online Learning: Beyond Regret which shows how the tools like sequential Rademacher averages and sequential covering numbers can be used to capture the minimax value of a large class of games, beyond just external regret settings. Their paper with Dean on Complexity-Based Approach to Calibration with Checking Rules showed a nice application of these techniques to the calibration problem.

The impromptu session was quite a hit this year with ~15 talks. I was quite disappointed to see none of them turn up on my NIPS review stack ðŸ™‚

Rob managed to save some money by solving his open problem from last COLT, together with Indraneel and Cynthia. Their paper The Rate of Convergence of AdaBoost was interesting as it made me realize how much difference the boundedness of rates can make to the theoretical properties of an algorithm. Adaboost is greedy coordinate descent, for which convergence over a compact domain is well-studied, but what makes this challenging here is that Adaboost doesn’t impose any bound on the weights. The way this paper gets around and pays a penalty for these issues seemed quite interesting.

I also liked the paper by Gabor, David and Csaba on Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments. This paper shows a tight characterization of partial regret games that have an optimal regret of 0, T^{1/2}, T^{2/3} or T. While some sufficient conditions one way or the other were known before, theirs is a first complete characterization in my knowledge.

Overall, this was a thoroughly enjoyable COLT, both in the technical content and in the choice of venue.

Nice to see some high-dimensional chest making its way to this blog. It’s pretty clear that COLT is making up for the lack of innovative new ideas in Annals of Statistics and the like.

It strikes me as funny that you see COLT papers borrowing the problem setting from and building on ideas from the statistics literature as a lack of innovation in statistics. Let me also remark that the fast 1/n style convergence rates that are shown under the frowned-upon linear model assumption have so far remained elusive without the assumption.

Alekh

Whether the 1/n convergence propertah is worth dropping down to linear models is trade-off that needs further investigation. It’s alsa beneficial to the community to approach things from multiple perspectives, so as to ward off the inevitable groupthink-based fallacies.