Deadline Season, 2010

Many conference deadlines are coming soon.

Conference | Deadlines | Double Blind / Author Feedback | Time/Place
ICML | January 18 (Workshops) / February 1 (Papers) / February 13 (Tutorials) | Y/Y | Haifa, Israel, June 21-25
KDD | February 1 (Workshops) / February 2 & 5 (Papers) / February 26 (Tutorials & Panels) / April 17 (Demos) | N/S | Washington DC, July 25-28
COLT | January 18 (Workshops) / February 19 (Papers) | N/S | Haifa, Israel, June 25-29
UAI | March 11 (Papers) | N?/Y | Catalina Island, California, July 8-11

ICML continues to experiment with the reviewing process, although perhaps less so than last year.

The S (“sort-of”) in COLT’s author feedback column means that author feedback occurs only after decisions are made.

KDD is notable for being the most comprehensive in terms of {Tutorials, Workshops, Challenges, Panels, Papers (two tracks), Demos}. The S for KDD means that author feedback sometimes occurs, at the discretion of the SPC.

The (past) January 18 deadline for workshops at ICML is nominal, as I (as workshop chair) almost missed it myself and we have space for a few more workshops. If anyone is thinking “oops, I missed the deadline”, send in your proposal by Friday the 22nd.

This year, I’m an area chair for ICML and on the SPC for KDD. I hope to see interesting papers on plausibly useful learning theory (broadly interpreted) at each conference, as I did last year.

Sam Roweis died

and I can’t help but remember him.

I first met Sam when I was an undergraduate at Caltech, where he was a TA for Hopfield’s class, and again when I visited Gatsby, when he invited me to visit Toronto, and at too many conferences to recount. His personality was a combination of enthusiasm and thoughtfulness, with a great ability to phrase a problem so that its solution must be understood. With respect to my own work, Sam was the one who advised me to make my first tutorial, leading to others, and to other things, all of which I’m grateful to him for. In fact, my every interaction with Sam was positive, and that was his way.

His death is being called a suicide, which is so incompatible with my understanding of Sam that it strains credulity. But we know that his many responsibilities were great, and it is well understood that basically all sane researchers have legions of inner doubts. Having been depressed now and then myself, I find it helpful to understand, at least intellectually, that the true darkness of the now is overestimated, and that you have more friends than you think. Sam was one of mine, and I’ll miss him.

My last interaction with Sam, last week, was discussing a new research direction that interested him, optimizing the cost of acquiring feature information in the learning algorithm. This problem is endemic to real-world applications, and has been studied to some extent elsewhere, but I expect that in our unwritten future history, we’ll discover that further study of this problem is more helpful than almost anyone realizes. The reply that I owed him feels heavy, and an incompleteness is hanging. For his wife and children it is surely so incomparably greater that I lack words.

(Added) Others: Fernando, Kevin McCurley, Danny Tarlow, David Hogg, Yisong Yue, Lance Fortnow on Sam, a Memorial site, and a Memorial Fund

Edit: removed a news article link by request

Interesting things at NIPS 2009

Several papers at NIPS caught my attention.

  1. Elad Hazan and Satyen Kale, Online Submodular Optimization. They define an algorithm for online optimization of submodular functions with regret guarantees. This places submodular optimization roughly on par with online convex optimization as a tractable setting for online learning.
  2. Elad Hazan and Satyen Kale, On Stochastic and Worst-Case Models of Investing. At its core, this is yet another example of modifying worst-case online learning to deal with variance, but the application to financial models is particularly cool, and it seems plausibly superior to other common approaches for financial modeling.
  3. Mark Palatucci, Dean Pomerleau, Tom Mitchell, and Geoff Hinton, Zero Shot Learning with Semantic Output Codes. The goal here is predicting a label in a multiclass supervised setting where the label never occurs in the training data. They have some basic analysis and also a nice application to fMRI brain reading.
  4. Shobha Venkataraman, Avrim Blum, Dawn Song, Subhabrata Sen, and Oliver Spatscheck, Tracking Dynamic Sources of Malicious Activity at Internet Scales. This is a plausible combination of worst-case learning algorithms in a tree-like structure over IP space to track and predict bad IPs. Their empirical results look quite good to me, and there are many applications where this prediction problem needs to be solved.
  5. Kamalika Chaudhuri, Daniel Hsu, and Yoav Freund, A Parameter Free Hedging Algorithm. This paper is about eliminating the learning rate parameter from online learning algorithms (a sketch of the standard learning-rate-based update appears just after this list). While that’s certainly useful, the approach taken involves a double-exponential rather than a single-exponential potential, which is strange and potentially useful in many other places.
  6. Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, and Corinna Cortes, Polynomial Semantic Indexing. This is about an empirically improved algorithm for learning ranking functions based on (query, document) content. The sexy Semantic name is justified because the method is not based on syntactic matching of the query to the document.
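
To make the terms in items 1 and 5 concrete, here is a minimal sketch of the textbook exponential weights (Hedge) update; it is not any of the papers’ methods. The learning rate eta below is exactly the parameter Chaudhuri, Hsu, and Freund aim to eliminate (their potential-based update replaces it), and the notion of regret the sketch tracks (cumulative loss minus the best fixed choice in hindsight) is the same quantity the guarantees in item 1 bound. Variable names and toy data are mine.

    import numpy as np

    def hedge(loss_rounds, eta):
        """Textbook Hedge / exponential weights over K experts.

        loss_rounds: iterable of length-K loss vectors with entries in [0, 1].
        eta: the learning rate parameter that parameter-free methods remove.
        Returns the learner's cumulative loss and its regret to the best expert.
        """
        loss_rounds = [np.asarray(l, dtype=float) for l in loss_rounds]
        K = len(loss_rounds[0])
        log_w = np.zeros(K)                 # log-weights, initially uniform
        learner_loss = 0.0
        expert_loss = np.zeros(K)
        for losses in loss_rounds:
            p = np.exp(log_w - log_w.max())
            p /= p.sum()                    # current distribution over experts
            learner_loss += p @ losses      # expected loss suffered this round
            expert_loss += losses
            log_w -= eta * losses           # single-exponential weight update
        return learner_loss, learner_loss - expert_loss.min()

    # toy usage: 3 experts, 100 rounds of random losses
    rng = np.random.default_rng(0)
    print(hedge(rng.random((100, 3)), eta=0.1))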

I also found the future publication models discussion interesting. The follow-up post here has details and further discussion.

At the workshops, I was deeply confronted with the problem of too many interesting workshops to attend in the given amount of time. Two talks stood out for me:

  1. Carlos Guestrin gave a talk in the interactive machine learning workshop on Turning Down the Noise in the Blogosphere by Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin, which I missed at KDD this year. The paper discusses the use of exponential weight online learning algorithms to rerank blog posts based on user-specific interests. It comes with a demonstration website where you can test it out.
  2. Leslie Valiant gave a talk on representations and operations on concepts in a brain-like fashion. The style of representation and algorithm involves distributed representations on sparse graphs, an approach which is relatively unfamiliar. Bloom filters and, within machine learning, experience with learning through hashing functions have sharpened my intuition a bit (a small illustration of the hashing approach follows this list). The talk seemed to cover Memorization and Association on a Realistic Neural Model in Neural Computation as well as A First Experimental Demonstration of Massive Knowledge Infusion at KR.
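
The following is not from Valiant’s talk; it is just a minimal illustration of the hashing trick mentioned above, mapping a bag of named features into a fixed-size sparse vector. The feature names, hash function, and dimension are arbitrary choices of mine.

    from hashlib import md5

    def hashed_features(tokens, num_bins=2 ** 10):
        """Map a bag of named features into a fixed-size sparse vector via
        hashing (the 'hashing trick'); collisions are simply tolerated."""
        x = {}
        for tok in tokens:
            h = int(md5(tok.encode("utf8")).hexdigest(), 16)
            idx = h % num_bins                             # bucket index
            sign = 1 if (h // num_bins) % 2 == 0 else -1   # sign bit reduces collision bias
            x[idx] = x.get(idx, 0) + sign
        return x  # sparse dict: bucket index -> value

    # toy usage: two token bags share buckets only where they share tokens (up to collisions)
    print(hashed_features("the quick brown fox".split()))
    print(hashed_features("the lazy dog".split()))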

Top graduates this season

I would like to point out three graduates this season who have my confidence that they are capable of doing great things.

  1. Daniel Hsu has diverse papers with diverse coauthors on {active learning, multilabeling, temporal learning, …}, each covering new algorithms and methods of analysis. He is also a capable programmer, having helped me with some nitty-gritty details of cluster-parallel Vowpal Wabbit this summer. He has an excellent tendency to just get things done.
  2. Nicolas Lambert doesn’t nominally work in machine learning, but I’ve found his work in elicitation relevant nevertheless. In essence, elicitable properties are closely related to learnable properties, and the elicitation complexity is related to a notion of learning complexity. See the Surrogate regret bounds paper for some related discussion. Few people successfully work at such a general level that it crosses fields, but he’s one of them.
  3. Yisong Yue is deeply focused on interactive learning, which he has attacked at all levels: theory, algorithm adaptation, programming, and popular description. I’ve seen a relentless multidimensional focus on a new real-world problem be an excellent strategy for research and expect he’ll succeed.

The obvious caveat applies: I don’t know or haven’t fully appreciated everyone’s work, so I’m sure I missed people. I’d like to particularly point out Percy Liang and David Sontag as plausibly belonging in this category, and I’m sure others appreciate their work a great deal.

Inherent Uncertainty

I’d like to point out Inherent Uncertainty, which I’ve added to the ML blog post scanner on the right. My understanding from Jake is that the intention is to have a multiauthor blog which is more specialized towards learning theory/game theory than this one. Nevertheless, several of the posts seem to be of wider interest.