October 2010 – Machine Learning (Theory)

10/29/201011/5/2010

To Vidoelecture or not

For the first time in several years, ICML 2010 did not have videolectures attending. Luckily, the tutorial on exploration and learning which Alina and I put together can be viewed, since we also presented at KDD 2010, which included videolecture support.

ICML didn’t cover the cost of a videolecture, because PASCAL didn’t provide a grant for it this year. On the other hand, KDD covered it out of registration costs. The cost of videolectures isn’t cheap. For a workshop the baseline quote we have is 270 euro per hour, plus a similar cost for the cameraman’s travel and accomodation. This can be reduced substantially by having a volunteer with a camera handle the cameraman duties, uploading the video and slides to be processed for a quoted 216 euro per hour.

Youtube is the most predominant free video site with a cost of $0, but it turns out to be a poor alternative. 15 minute upload limits do not match typical talk lengths. Videolectures also have side-by-side synchronized slides & video which allows quick navigation of the videostream and acceptable resolution of typical talk slides. Overall, these benefits are substantial enough that youtube is not presently a serious alternative.

So, if we can’t avoid paying the cost, is it worthwhile? One way to judge this is by comparing how much authors currently spend traveling to a conference and presenting research vs. the size of the audience. In general, costs vary wildly, but for a typical academic international conference, airfare, hotel, and registration are commonly at least $1000 even after scrimping. The sizes of audiences also varies substantially, but something in the 30-100 range is a typical average. For KDD 2010, the average number of views per presentation is 14.6, but this is misleadingly low, as KDD presentations were just put up. A better number is for KDD 2009, where the average view number is presently 74.2. This number is representative with ICML 2009 presently averaging 115.8. We can argue about the relative merits of online vs. in-person viewing, but the order of their value is at least unclear, since in an online system people specifically seek out lectures to view while at the conference itself people are often opportunistic viewers. Valuing these equally, we see that videolectures increases the size of the audience, and (hence) the value to authors by perhaps a factor of 2 for a cost around 1/3 of current presentation costs.

This conclusion is conservative, because a videolecture is almost surely viewed over more than a year, cost of conference attendance are often higher, and the cost in terms of a presenter’s time is not accounted for. Overall, videolecture coverage seems quite worthwhile. Since authors also typically are the attendees of a conference, increasing the registration fees to cover the cost of videolectures seems reasonable. A videolecture is simply a new publishing format.

We can hope that the price will drop over time, as it’s not clear to me that the 216 euros/hour reflects the real costs of videolectures.net. Some competition of a similar quality would be the surest way to do that. But in the near future, there are two categories of conferences—those that judge the value of their content above 216 euros/hour, and those that do not. Whether or not a conference has videolecture support substantially impacts its desirability as a place to send papers.

10/28/201010/28/2010

NY ML Symposium 2010

About 200 people attended the 2010 NYAS ML Symposium this year. (It was about 170 last year.) I particularly enjoyed several talks.

Yann has a new live demo of (limited) real-time object recognition learning.
Sanjoy gave a fairly convincing and comprehensible explanation of why a modified form of single-linkage clustering is consistent in higher dimensions, and why consistency is a critical feature for clustering algorithms. I’m curious how well this algorithm works in practice.
Matt Hoffman‘s poster covering online LDA seemed pretty convincing to me as an algorithmic improvement.

This year, we allocated more time towards posters & poster spotlights.

For next year, we are considering some further changes. The format has traditionally been 4 invited Professor speakers, with posters and poster spotlight for students. Demand from other parties to participate is growing, for example from postdocs and startups in the area. Another growing concern is the facility—the location is exceptional, but fitting more people is challenging. Does anyone else have suggestions to improve the event?

New York is a particularly good location for a regional symposium, but it’s quite plausible that other places could have one as well. Looking at Meetup groups for interest, obvious choices are Southern California (San Diego & Los Angeles both have large R meetup groups), Northern California (which has , 2, 3 AI-related Meetup groups), and Sydney, Australia, which has a large datamining meetup group. Relative to meetups, a regional symposium offers a more intense affair with higher participation, and new kinds of participation (for example, via posters). Relative to international conferences, a regional meetup is much easier, and has a high chance of producing future collaborations.

10/17/2010

Partha Niyogi has died

from brain cancer. I asked Misha who worked with him to write about it.

Partha Niyogi, Louis Block Professor in Computer Science and Statistics at the University of Chicago passed away on October 1, 2010, aged 43.

I first met Partha Niyogi almost exactly ten years ago when I was a graduate student in math and he had just started as a faculty in Computer Science and Statistics at the University of Chicago. Strangely, we first talked at length due to a somewhat convoluted mathematical argument in a paper on pattern recognition. I asked him some questions about the paper, and, even though the topic was new to him, he had put serious thought into it and we started regular meetings. We made significant progress and developed a line of research stemming initially just from trying to understand that one paper and to simplify one derivation. I think this was typical of Partha, showing both his intellectual curiosity and his intuition for the serendipitous; having a sense and focus for inquiries worth pursuing, no matter how remote or challenging, and bringing his unique vision to new areas. We had been working together continuously from that first meeting until he became too sick to continue. Partha had been a great adviser and a close friend for me; I am very much thankful to him for his guidance, intellectual inspiration and friendship.

Partha had a broad range of interests in research centered around the problem of learning, which had been his interest since he was an undergraduate at the Indian Institute of Technology. His research had three general themes: geometric methods in machine learning, particularly manifold methods; language evolution and language learning (he recently published a 500-page monograph on it) and speech analysis and recognition. I will not talk about his individual works, a more in-depth summary of his research is in the University of Chicago Computer Science department obituary. It is enough to say that his work has been quite influential and widely followed up. In every one of these areas he had his own approach, distinct, clear, and not afraid to challenge unexamined conventional wisdom. To lose this intellectually rigorous but open-minded vision is not just a blow to those of us who knew him and worked with him, but to the field of machine learning itself.

I owe a lot to Partha; to his insight and thoughtful attitude to research and every aspect of life. It had been a great privilege to be Partha’s student, collaborator and friend; his passing away leaves deep sadness and emptiness. It is hard to believe Partha is no longer with us, but his friendship and what I learned from him will stay with me for the rest of my life.

An easy proof of the Chernoff-Hoeffding bound

Textbooks invariably seem to carry the proof that uses Markov’s inequality, moment-generating functions, and Taylor approximations. Here’s an easier way.

For $latex p,q \in (0,1)$, let $latex K(p,q)$ be the KL divergence between a coin of bias $latex p$ and one of bias $latex q$: $latex K(p,q) = p \ln \frac{p}{q} + (1-p) \ln \frac{1-p}{1-q}.$

Theorem: Suppose you do $latex n$ independent tosses of a coin of bias $latex p$. The probability of seeing $latex qn$ heads or more, for $latex q > p$, is at most $latex \exp(-nK(q,p))$. So is the probability of seeing $latex qn$ heads or less, for $latex q < p$.

Remark: By Pinsker’s inequality, $latex K(q,p) \geq 2(p-q)^2$.

Proof Let’s do the $latex q > p$ case; the other is identical.

Let $latex \theta_p$ be the distribution over $latex \{0,1\}^n$ induced by a coin of bias $latex p$, and likewise $latex \theta_q$ for a coin of bias $latex q$. Let $latex S$ be the set of all sequences of $latex n$ tosses which contain $latex qn$ heads or more. We’d like to show that $latex S$ is unlikely under $latex \theta_p$.

Pick any $latex \bar{x} \in S$, with say $latex k \geq qn$ heads. Then:
[latex size=”2″] \frac{\theta_q(\bar{x})}{\theta_p(\bar{x})} = \frac{q^k(1-q)^{n-k}}{p^k(1-p)^{n-k}} \geq \frac{q^{qn}(1-q)^{n-qn}}{p^{qn}(1-p)^{n-qn}} = \left( \frac{q}{p} \right)^{qn} \left( \frac{1-q}{1-p}\right)^{(1-q)n} = e^{n K(q,p)}.[/latex]

Since $latex \theta_p(\bar{x}) \leq \exp(-nK(q,p)) \theta_q(\bar{x})$ for every $latex \bar{x} \in S$, we have [latex]\theta_p(S) \leq \exp(-nK(q,p)) \theta_q(S) \leq \exp(-nK(q,p))[/latex] and we’re done.