Sam Roweis died

and I can’t help but remember him.

I first met Sam when I was an undergraduate at Caltech, where he was a TA for Hopfield‘s class, and again when I visited Gatsby, when he invited me to visit Toronto, and at too many conferences to recount. His personality was a combination of enthusiastic and thoughtful, with a great ability to phrase a problem so that its solution must be understood. With respect to my own work, Sam was the one who advised me to make my first tutorial, which led to others and to other things, all of which I’m grateful to him for. In fact, my every interaction with Sam was positive, and that was his way.

His death is being called a suicide, which is so incompatible with my understanding of Sam that it strains credulity. But we know that his many responsibilities were great, and it is well understood that basically all sane researchers have legions of inner doubts. Having been depressed now and then myself, I know it helps to understand, at least intellectually, that the darkness of the present moment is overestimated, and that you have more friends than you think. Sam was one of mine, and I’ll miss him.

My last interaction with Sam, last week, was discussing a new research direction that interested him, optimizing the cost of acquiring feature information in the learning algorithm. This problem is endemic to real-world applications, and has been studied to some extent elsewhere, but I expect that in our unwritten future history, we’ll discover that further study of this problem is more helpful than almost anyone realizes. The reply that I owed him feels heavy, and an incompleteness is hanging. For his wife and children it is surely so incomparably greater that I lack words.

(Added) Others: Fernando, Kevin McCurley, Danny Tarlow, David Hogg, Yisong Yue, Lance Fortnow on Sam, a Memorial site, and a Memorial Fund

Edit: removed a news article link by request

Interesting things at NIPS 2009

Several papers at NIPS caught my attention.

  1. Elad Hazan and Satyen Kale, Online Submodular Optimization. They define an algorithm for online optimization of submodular functions with regret guarantees. This places submodular optimization roughly on par with online convex optimization as a tractable setting for online learning.
  2. Elad Hazan and Satyen Kale, On Stochastic and Worst-Case Models of Investing. At its core, this is yet another example of modifying worst-case online learning to deal with variance, but the application to financial models is particularly cool, and it seems plausibly superior to other common approaches to financial modeling.
  3. Mark Palatucci, Dean Pomerleau, Tom Mitchell, and Geoff Hinton, Zero Shot Learning with Semantic Output Codes. The goal here is predicting a label in a multiclass supervised setting where the label never occurs in the training data. They have some basic analysis and also a nice application to fMRI brain reading.
  4. Shobha Venkataraman, Avrim Blum, Dawn Song, Subhabrata Sen, and Oliver Spatscheck, Tracking Dynamic Sources of Malicious Activity at Internet Scales. This is a plausible combination of worst-case learning algorithms in a tree-like structure over IP space to track and predict bad IPs. Their empirical results look quite good to me and there are many applications where this prediction problem needs to be solved.
  5. Kamalika Chaudhuri, Daniel Hsu, and Yoav Freund, A Parameter Free Hedging Algorithm. This paper is about eliminating the learning rate parameter from online learning algorithms (a sketch of the standard parameterized update appears after this list). While that’s certainly useful, the approach taken involves a double-exponential rather than a single exponential potential, which is strange and potentially useful in many other places.
  6. Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, and Corinna Cortes, Polynomial Semantic Indexing. This is about an empirically improved algorithm for learning ranking functions based on (query, document) content. The sexy Semantic name is justified because it is not based on syntactic matching of query to document.
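
For reference on item 5, here is a minimal sketch of the standard exponential-weights (Hedge) update, showing the learning rate parameter eta that it requires. This is the textbook algorithm, not the parameter-free method of the paper, and the losses below are made up purely for illustration.

```python
import numpy as np

def hedge_weights(loss_matrix, eta=0.5):
    """Standard exponential-weights (Hedge) update over T rounds.

    loss_matrix: T x K array of per-expert losses in [0, 1].
    eta: the learning rate parameter that a parameter-free method
         aims to eliminate.
    Returns the list of weight vectors played at each round.
    """
    T, K = loss_matrix.shape
    cumulative_loss = np.zeros(K)
    weights_per_round = []
    for t in range(T):
        w = np.exp(-eta * cumulative_loss)
        w /= w.sum()                       # normalize to a distribution over experts
        weights_per_round.append(w)
        cumulative_loss += loss_matrix[t]  # observe this round's losses
    return weights_per_round

# Toy usage: made-up losses for 3 experts over 4 rounds.
losses = np.array([[0.1, 0.5, 0.9],
                   [0.2, 0.4, 0.8],
                   [0.3, 0.5, 0.7],
                   [0.0, 0.6, 1.0]])
for t, w in enumerate(hedge_weights(losses)):
    print(t, np.round(w, 3))
```

The point of the sketch is only to make the role of eta concrete: picking it well requires knowledge of the time horizon or the losses, which is exactly the tuning burden a parameter-free algorithm removes.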

I also found the future publication models discussion interesting. The follow-up post here has details and further discussion.

At the workshops, I was confronted with the problem of too many interesting workshops to attend in the available time. Two talks stood out for me:

  1. Carlos Guestrin gave a talk in the interactive machine learning workshop on Turning Down the Noise in the Blogosphere by Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin, which I missed at KDD this year. The paper discusses the use of exponential weight online learning algorithms to rerank blog posts based on user-specific interests. It comes with a demonstration website where you can test it out.
  2. Leslie Valiant gave a talk on representations and operations on concepts in a brain-like fashion. The style of representation and algorithm involves distributed representations on sparse graphs, an approach which is relatively unfamiliar; Bloom filters and the machine learning experience with learning via hashing functions have sharpened my intuition a bit (a small Bloom filter sketch follows this list). The talk seemed to cover Memorization and Association on a Realistic Neural Model in Neural Computation as well as A First Experimental Demonstration of Massive Knowledge Infusion at KR.
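
For readers unfamiliar with the flavor of hashed, distributed representations mentioned in item 2, here is a minimal Bloom filter sketch. This is the standard data structure, not Valiant’s neural model; the hash choice and sizes are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a set stored as a sparse, distributed bit pattern."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different salts.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # No false negatives; false positives possible when hashed bits collide.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("neural")
print("neural" in bf)    # True
print("symbolic" in bf)  # almost certainly False
```

Each stored item is represented by a small set of active bits scattered across the array, which is the basic intuition behind sparse, distributed set representations.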

Top graduates this season

I would like to point out 3 graduates this season who have my confidence that they are capable of doing great things.

  1. Daniel Hsu has diverse papers with diverse coauthors on {active learning, multilabeling, temporal learning, …}, each covering new algorithms and methods of analysis. He is also a capable programmer, having helped me with some nitty-gritty details of cluster-parallel Vowpal Wabbit this summer. He has an excellent tendency to just get things done.
  2. Nicolas Lambert doesn’t nominally work in machine learning, but I’ve found his work in elicitation relevant nevertheless. In essence, elicitable properties are closely related to learnable properties, and the elicitation complexity is related to a notion of learning complexity. See the Surrogate regret bounds paper for some related discussion. Few people successfully work at such a general level that it crosses fields, but he’s one of them.
  3. Yisong Yue is deeply focused on interactive learning, which he has attacked at all levels: theory, algorithm adaptation, programming, and popular description. I’ve seen a relentless multidimensional focus on a new real-world problem be an excellent strategy for research and expect he’ll succeed.

The obvious caveat applies: I don’t know or haven’t fully appreciated everyone’s work, so I’m sure I missed people. I’d like to particularly point out Percy Liang and David Sontag as plausibly belonging in this category, and I’m sure others appreciate their work a great deal.

Inherent Uncertainty

I’d like to point out Inherent Uncertainty, which I’ve added to the ML blog post scanner on the right. My understanding from Jake is that the intention is to have a multiauthor blog which is more specialized towards learning theory/game theory than this one. Nevertheless, several of the posts seem to be of wider interest.

Future Publication Models @ NIPS

Yesterday, there was a discussion about future publication models at NIPS. Yann and Zoubin have specific detailed proposals which I’ll add links to when I get them (Yann’s proposal and Zoubin’s proposal).

What struck me about the discussion is that there are many simultaneous concerns as well as many simultaneous proposals, which makes it difficult to keep all the distinctions straight in a verbal conversation. It also seemed like people were serious enough about this that we may see some real movement. Certainly my personal experience motivates change, as I’ve posted many times about the substantial flaws in our review process, including some very poor personal experiences.

Concerns include the following:

  1. (Several) Reviewers are overloaded, boosting the noise in decision making.
  2. (Yann) A new system should run with as little built-in delay and friction to the process of research as possible.
  3. (Hanna Wallach, updated) Double-blind review is particularly important for people who are unknown or from an unknown institution.
  4. (Several) But it’s bad to take double-blind review so seriously as to disallow publishing on arXiv or personal webpages.
  5. (Yann) And double-blind is bad when it prevents publishing for substantial periods of time. Apparently, this comes up in CVPR.
  6. (Zoubin) Any new system should appear to outsiders as if it’s the old system, or a journal, because it’s already hard enough to justify CS tenure cases to other disciplines.
  7. (Fernando) There shouldn’t be a big change with a complex bureaucracy, but rather smaller changes which are obviously useful or at least worth experimenting with.

There were other concerns as well, but these are the ones that I remember.

Elements of proposals include:

  1. (Yann) Everything should go to arXiv or an arXiv-like system first, as per physics or mathematics. This addresses (1), because it delinks dissemination from review, relieving some of the burden of reviewing. It also addresses (2), since authors can immediately begin building on each other’s work. It conflicts with (3), because arXiv does not support double-blind submission; it does not conflict if we build our own system.
  2. (Fernando) Create a conference-coincident journal in which people can publish at any time. VLDB has apparently done this. It can be done smoothly by allowing submission in either conference-deadline mode or journal mode. This proposal addresses (1) by reducing peak demand on reviewing. It also addresses (6) above.
  3. (Daphne) Perhaps we should have a system which only reviews papers for correctness, which is not nearly as subjective as for novelty or interestingness. This addresses (1), by eliminating some concerns for the reviewer. It is orthogonal to the double blind debate. In biology, such a journal exists (pointer updated), because delays were becoming absurd and intolerable.
  4. (Yann) There should be multiple publishing entities (people or groups of people) that can bless a paper as interesting. This addresses (1).

There are many other proposal elements (too many for my memory), which hopefully we’ll see in particular proposals. If other people have concrete proposals, now is probably the right time to formalize them.