Machine Learning (Theory)


2011 Summer Conference Deadline Season

Tags: Announcements,Conferences jl@ 9:20 pm

Machine learning always welcomes the new year with paper deadlines for summer conferences. This year, we have:

Conference | Paper Deadline | When/Where | Double blind? | Author Feedback? | Notes
---|---|---|---|---|---
ICML | February 1 | June 28-July 2, Bellevue, Washington, USA | Y | Y | Weak colocation with ACL
COLT | February 11 | July 9-July 11, Budapest, Hungary | N | N | colocated with FOCM
KDD | February 11/18 | August 21-24, San Diego, California, USA | N | N |
UAI | March 18 | July 14-17, Barcelona, Spain | Y | N |

The larger conferences are on the west coast in the United States, while the smaller ones are in Europe.


Herman Goldstine 2011

Vikas points out the Herman Goldstine Fellowship at IBM. I was a Herman Goldstine Fellow, and benefited from the experience a great deal—that’s where work on learning reductions started. If you can do research independently, it’s recommended. Applications are due January 6.


Vowpal Wabbit, version 5.0, and the second heresy

I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm.

The biggest changes are new algorithms:

  1. Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous versions due to the normalization changes.
  2. Nikos also implemented the per-feature learning rates as per these two papers. Often, this works better than the default algorithm. It isn’t the default because it isn’t (yet) as adaptable in terms of learning rate decay. This is enabled with --adaptive, and learned regressors are compatible with the default. Computationally, you might see a factor of 4 slowdown if using ‘-q’. Nikos noticed that the phenomenal Quake inverse square root hack applies, making this substantially faster than a naive implementation.
  3. Nikos and Daniel also implemented active learning derived from this paper, usable via --active_simulation (to test parameters on an existing supervised dataset) or --active_learning (to do the real thing). This runs at full speed, which is much faster than is reasonable in any active learning scenario. We see this approach dominating supervised learning on all classification datasets so far, often with far fewer labeled examples required, as the theory predicts. The learned predictor is compatible with the default.
  4. Olivier helped me implement preconditioned conjugate gradient based on Jonathan Shewchuk’s tutorial. This is a batch algorithm and hence requires multiple passes over any dataset to do something useful. Each step of conjugate gradient requires 2 passes. The advantage of conjugate gradient is that it converges relatively quickly via the use of second derivative information. This can be particularly helpful if your features are of widely differing scales. The use of --regularization 0.001 (or smaller) is almost required with --conjugate_gradient, as it will otherwise overfit hard. This implementation has two advantages over the basic approach: it implicitly computes a Hessian in O(n) time, where n is the number of features, and it operates out of core, hence making it applicable to datasets that don’t conveniently fit in RAM. The learned predictor is compatible with the default, although you’ll notice that a factor of 8 more RAM is required when learning.
  5. Matt Hoffman and I implemented Online Latent Dirichlet Allocation. This code is still experimental and likely to change over the next week. It really does a minibatch update under the hood. The code appears to be substantially faster than Matt’s earlier Python implementation, making this probably the most efficient LDA anywhere. LDA is still much slower than online linear learning, as it is quite computationally heavy in comparison. It is perhaps a good candidate for GPU optimization.
  6. Nikos, Daniel, and I have been experimenting with more online cluster parallel learning algorithms (--corrective, --backprop, --delayed_global). We aren’t yet satisfied with these, although they are improving. Details are at the LCCC workshop.
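
As a rough illustration of the first two items in the list, here is a minimal Python sketch of (a) a squared-loss gradient step whose size is capped so the prediction cannot pass the label, and (b) an AdaGrad-style per-feature learning rate. This is not the Vowpal Wabbit implementation: the function names, the dense-list representation, and the simple cap of the step at 1/(x·x) are all assumptions made for the example.

```python
import math


def predict(w, x):
    """Linear prediction for dense lists of equal length."""
    return sum(wi * xi for wi, xi in zip(w, x))


def safe_squared_loss_update(w, x, y, eta):
    """One gradient step on squared loss, capped so that the new
    prediction never passes the label y (hypothetical helper)."""
    xx = sum(xi * xi for xi in x)
    if xx == 0.0:
        return w  # no active features, nothing to update
    p = predict(w, x)
    # A plain step moves the prediction by eta * xx * (y - p).
    # Capping eta at 1 / xx keeps the prediction on the near side of y.
    scale = min(eta, 1.0 / xx)
    return [wi + scale * (y - p) * xi for wi, xi in zip(w, x)]


def adaptive_update(w, g2, x, y, eta):
    """Per-feature learning rates (AdaGrad-style assumption): each
    weight's step is divided by the root of its accumulated squared
    gradients, which are tracked in g2."""
    p = predict(w, x)
    new_w, new_g2 = [], []
    for wi, gi2, xi in zip(w, g2, x):
        g = (p - y) * xi          # squared-loss gradient for this weight
        gi2 = gi2 + g * g         # running sum of squared gradients
        step = eta * g / math.sqrt(gi2) if gi2 > 0.0 else 0.0
        new_w.append(wi - step)
        new_g2.append(gi2)
    return new_w, new_g2
```

With x = [3, 4], y = 1, and eta = 1, a plain gradient step from zero weights would move the prediction from 0 to 25, while the capped step stops at the label. The division by a square root in `adaptive_update` is the kind of operation where an inverse square root approximation could plausibly help.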

In addition, Ariel added a test suite, Shravan helped with ngrams, and there are several other minor new features and bug fixes including a very subtle one caught by Vaclav.

The documentation on the website hasn’t kept up with the code. I’m planning to rectify that over the next week, and have a new tutorial starting at 2pm in the LCCC room for those interested. Yes, I’ll not be skiing :)


Traffic Prediction Problem

Slashdot points out the Traffic Prediction Challenge which looks pretty fun. The temporal aspect seems to be very common in many real-world problems and somewhat understudied.


Partha Niyogi has died

from brain cancer. I asked Misha who worked with him to write about it.

Partha Niyogi, Louis Block Professor in Computer Science and Statistics at the University of Chicago, passed away on October 1, 2010, aged 43.

I first met Partha Niyogi almost exactly ten years ago when I was a graduate student in math and he had just started as a faculty in Computer Science and Statistics at the University of Chicago. Strangely, we first talked at length due to a somewhat convoluted mathematical argument in a paper on pattern recognition. I asked him some questions about the paper, and, even though the topic was new to him, he had put serious thought into it and we started regular meetings. We made significant progress and developed a line of research stemming initially just from trying to understand that one paper and to simplify one derivation. I think this was typical of Partha, showing both his intellectual curiosity and his intuition for the serendipitous; having a sense and focus for inquiries worth pursuing, no matter how remote or challenging, and bringing his unique vision to new areas. We had been working together continuously from that first meeting until he became too sick to continue. Partha had been a great adviser and a close friend for me; I am very much thankful to him for his guidance, intellectual inspiration and friendship.

Partha had a broad range of interests in research centered around the problem of learning, which had been his interest since he was an undergraduate at the Indian Institute of Technology. His research had three general themes: geometric methods in machine learning, particularly manifold methods; language evolution and language learning (he recently published a 500-page monograph on it); and speech analysis and recognition. I will not talk about his individual works; a more in-depth summary of his research is in the University of Chicago Computer Science department obituary. It is enough to say that his work has been quite influential and widely followed up. In every one of these areas he had his own approach, distinct, clear, and not afraid to challenge unexamined conventional wisdom. To lose this intellectually rigorous but open-minded vision is not just a blow to those of us who knew him and worked with him, but to the field of machine learning itself.

I owe a lot to Partha; to his insight and thoughtful attitude to research and every aspect of life. It had been a great privilege to be Partha’s student, collaborator and friend; his passing away leaves deep sadness and emptiness. It is hard to believe Partha is no longer with us, but his friendship and what I learned from him will stay with me for the rest of my life.

More from Jake, Suresh, and Lance.


Machined Learnings

Paul Mineiro has started Machined Learnings where he’s seriously attempting to do ML research in public. I personally need to read through in greater detail, as much of it is learning reduction related, trying to deal with the sorts of complex source problems that come up in practice.


New York Area Machine Learning Events

On Sept 21, there is another machine learning meetup where I’ll be speaking. Although the topic is contextual bandits, I think of it as “the future of machine learning”. In particular, it’s all about how to learn in an interactive environment, such as for ad display, trading, news recommendation, etc…

On Sept 24, abstracts for the New York Machine Learning Symposium are due. This is the largest Machine Learning event in the area, so it’s a great way to have a conversation with other people.

On Oct 22, the NY ML Symposium actually happens. This year, we are expanding the spotlights, and trying to have more time for posters. In addition, we have a strong set of invited speakers: David Blei, Sanjoy Dasgupta, Tommi Jaakkola, and Yann LeCun. After the meeting, a late hackNY related event is planned where students and startups can meet.

I’d also like to point out the related CS/Econ symposium as I have interests there as well.



Tags: Announcements,Conferences jl@ 5:35 pm

Geoff Gordon points out AIStats 2011 in Ft. Lauderdale, Florida. The call for papers is now out, due Nov. 1. The plan is to experiment with the review process to encourage quality in several ways. I expect to submit a paper and would encourage others with good research to do likewise.


Alex Smola starts a blog

Adventures in Data Land.


Rob Schapire at NYC ML Meetup

I’ve been wanting to attend the NYC ML Meetup for some time and hope to make it next week on the 25th. Rob Schapire is talking about “Playing Repeated Games”, which in my experience is far more relevant to machine learning than the title might indicate.


The Workshop on Cores, Clusters, and Clouds

Tags: Announcements,Workshop jl@ 8:47 am

Alekh, John, Ofer, and I are organizing a workshop at NIPS this year on learning in parallel and distributed environments. The general interest level in parallel learning seems to be growing rapidly, so I expect quite a bit of attendance. Please join us if you are parallel-interested.

And, if you are working in the area of parallel learning, please consider submitting an abstract due Oct. 17 for presentation at the workshop.



Tags: Announcements,Machine Learning jl@ 12:39 am

Joseph Turian creates MetaOptimize for discussion of NLP and ML on big datasets. This includes a blog, but perhaps more importantly a question and answer section. I’m hopeful it will take off.


Netflix Challenge 2 Canceled

Tags: Announcements,Competitions jl@ 6:33 pm

The second Netflix prize is canceled due to privacy problems. I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release.

I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear, let alone how to achieve these limits in a manner friendly to a prediction competition.


Yahoo! ML events

Yahoo! is sponsoring two machine learning events that might interest people.

  1. The Key Scientific Challenges program (due March 5) for Machine Learning and Statistics offers $5K (plus bonuses) for graduate students working on a core problem of interest to Y! If you are already working on one of these problems, there is no reason not to submit, and if you aren’t you might want to think about it for next year, as I am confident they all press the boundary of the possible in Machine Learning. There are 7 days left.
  2. The Learning to Rank challenge (due May 31) offers an $8K first prize for the best ranking algorithm on a real (and really used) dataset for search ranking, with presentations at an ICML workshop. Unlike the Netflix competition, there are prizes for 2nd, 3rd, and 4th place, perhaps avoiding the heartbreak the ensemble encountered. If you think you know how to rank, you should give it a try, and we might all learn something. There are 3 months left.


Sam Roweis died

and I can’t help but remember him.

I first met Sam as an undergraduate at Caltech where he was TA for Hopfield’s class, and again when I visited Gatsby, when he invited me to visit Toronto, and at too many conferences to recount. His personality was a combination of enthusiastic and thoughtful, with a great ability to phrase a problem so its solution must be understood. With respect to my own work, Sam was the one who advised me to make my first tutorial, leading to others, and to other things, all of which I’m grateful to him for. In fact, my every interaction with Sam was positive, and that was his way.

His death is being called a suicide which is so incompatible with my understanding of Sam that it strains my credulity. But we know that his many responsibilities were great, and it is well understood that basically all sane researchers have legions of inner doubts. Having been depressed now and then myself, it’s helpful to understand at least intellectually that the true darkness of the now is overestimated, and that you have more friends than you think. Sam was one of mine, and I’ll miss him.

My last interaction with Sam, last week, was discussing a new research direction that interested him, optimizing the cost of acquiring feature information in the learning algorithm. This problem is endemic to real-world applications, and has been studied to some extent elsewhere, but I expect that in our unwritten future history, we’ll discover that further study of this problem is more helpful than almost anyone realizes. The reply that I owed him feels heavy, and an incompleteness is hanging. For his wife and children it is surely so incomparably greater that I lack words.

(Added) Others: Fernando, Kevin McCurley, Danny Tarlow, David Hogg, Yisong Yue, Lance Fortnow on Sam, a Memorial site, and a Memorial Fund


Inherent Uncertainty

Tags: Announcements,Machine Learning jl@ 12:01 pm

I’d like to point out Inherent Uncertainty, which I’ve added to the ML blog post scanner on the right. My understanding from Jake is that the intention is to have a multiauthor blog which is more specialized towards learning theory/game theory than this one. Nevertheless, several of the posts seem to be of wider interest.


NIPS workshops

Many of the NIPS workshops have a deadline about now, and the NIPS early registration deadline is Nov. 6. Several interest me:

  1. Adaptive Sensing, Active Learning, and Experimental Design due 10/27.
  2. Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra, due Nov. 6.
  3. Large-Scale Machine Learning: Parallelism and Massive Datasets, due 10/23 (i.e. past)
  4. Analysis and Design of Algorithms for Interactive Machine Learning, due 10/30.

And I’m sure many of the others interest others. Workshops are great as a mechanism for research, so take a look if there is any chance you might be interested.


New York Area Machine Learning Events

Several events are happening in the NY area.

  1. Barriers in Computational Learning Theory Workshop, Aug 28. That’s tomorrow near Princeton. I’m looking forward to speaking at this one on “Getting around Barriers in Learning Theory”, but several other talks are of interest, particularly to the CS theory inclined.
  2. Claudia Perlich is running the INFORMS Data Mining Contest with a deadline of Sept. 25. This is a contest using real health record data (they partnered with HealthCare Intelligence) to predict transfers and mortality. In the current US health care reform debate, the case studies of high costs we hear strongly suggest machine learning & statistics can save many billions.
  3. The Singularity Summit October 3&4. This is for the AIists out there. Several of the talks look interesting, although unfortunately I’ll miss it for ALT.
  4. Predictive Analytics World, Oct 20-21. This is stretching the definition of “New York Area” a bit, but the train to DC is reasonable. This is a conference of case studies of applications of ML to real-world problems.
  5. Machine Learning Symposium, Friday Nov. 6. I’m on the committee again this year. The abstract deadline is Sept. 30, and we already have several speakers lined up.


The Machine Learning Forum

Dear Fellow Machine Learners,

For the past year or so I have become increasingly frustrated with the peer review system in our field. I constantly get asked to review papers in which I have no interest. At the same time, as an action editor in JMLR, I constantly have to harass people to review papers. When I send papers to conferences and to journals I often get rejected with reviews that, at least in my mind, make no sense. Finally, I have a very hard time keeping up with the best new work, because I don’t know where to look for it…

I decided to try and do something to improve the situation. I started a new web site, which I decided to call “The machine learning forum”; the URL is

The main idea behind this web site is to remove anonymity from the review process. In this site, all opinions are attributed to the actual person that expressed them. I expect that this will improve the quality of the reviews. An obvious other effect is that there will be fewer negative reviews: weak papers will tend not to get reviewed at all. But then again, is that such a bad thing?

If you have any interest in this endeavor, please register to the web site and please submit a photo of yourself. Based on the information on your web site I will decide whether to grant you “author” privileges that would allow you to write reviews and overviews. Anybody can submit pointers to publications that they would like somebody to review. Anybody can participate in the discussion forum that is a fancy message board with threads etc.

Right now the main contributions I am looking for are “overviews”.

Overviews are pages written by somebody who is an authority in some area (for example, Kamalika Chaudhuri is an authority on mixture models) in which they list the main papers in the area and give a high-level description of how the papers relate. These overviews are intended to serve as an entry point for somebody that wants to learn about that subfield. Overviews *can* reference the work of the author of the overview. This is unlike reviews, in which the reviewer cannot be the author of the reviewed paper.

I hope you are interested enough to give this a try!

Comments are very welcome.


Yoav Freund (


Many ways to Learn this summer

There are at least 3 summer schools related to machine learning this summer.

  1. The first is at University of Chicago June 1-11 organized by Misha Belkin, Partha Niyogi, and Steve Smale. Registration is closed for this one, meaning they met their capacity limit. The format is essentially an extended Tutorial/Workshop. I was particularly interested to see Valiant amongst the speakers. I’m also presenting Saturday June 6, on logarithmic time prediction.
  2. Praveen Srinivasan points out the second at Peking University in Beijing, China, July 20-27. This one differs substantially, as it is about vision, machine learning, and their intersection. The deadline for applications is June 10 or 15. This is also another example of the growth of research in China, with active support from NSF.
  3. The third one is at Cambridge, England, August 29-September 10. It’s in the MLSS series. Compared to the Chicago one, this one is more about the Bayesian side of ML, although effort has been made to create a good cross section of topics. It’s also more focused on tutorials over workshop-style talks.