Simons Institute Big Data Program

Michael Jordan sends the below:

The new Simons Institute for the Theory of Computing
will begin organizing semester-long programs starting in 2013.

One of our first programs, set for Fall 2013, will be on the “Theoretical Foundations
of Big Data Analysis”. The organizers of this program are Michael Jordan (chair),
Stephen Boyd, Peter Buehlmann, Ravi Kannan, Michael Mahoney, and Muthu Muthukrishnan.

See http://simons.berkeley.edu/program_bigdata2013.html for more information on
the program.

The Simons Institute has created a number of “Research Fellowships” for young
researchers (within at most six years of the award of their PhD) who wish to
participate in Institute programs, including the Big Data program. Individuals
who already hold postdoctoral positions or who are junior faculty are welcome
to apply, as are finishing PhDs.

Please note that the application deadline is January 15, 2013. Further details
are available at http://simons.berkeley.edu/fellows.html .

Mike Jordan

ML Symposium and Strata/Hadoop World

The New York ML symposium was last Friday. There were 303 registrations, up a bit from last year. I particularly enjoyed talks by Bill Freeman on vision and ML, Jon Lenchner on strategy in Jeopardy, and Tara N. Sainath and Brian Kingsbury on deep learning for speech recognition. If anyone has suggestions or thoughts for next year, please speak up.

I also attended Strata + Hadoop World for the first time. This is primarily a trade conference rather than an academic conference, but I found it pretty interesting as a first time attendee. This is ground zero for the Big data buzzword, and I see now why. It’s about data, and the word “big” is so ambiguous that everyone can lay claim to it. There were essentially zero academic talks. Instead, the focus was on war stories, product announcements, and education. The general level of education is much lower—explaining Machine Learning to the SQL educated is the primary operating point. Nevertheless that’s happening, and the fact that machine learning is considered a necessary technology for industry is a giant step for the field. Over time, I expect the industrial side of Machine Learning to grow, and perhaps surpass the academic side, in the same sense as has already occurred for chip design. Amongst the talks I could catch, I particularly liked the Github, Zillow, and Pandas talks. Ted Dunning also gave a particularly masterful talk, although I have doubts about the core Bayesian Bandit approach(*). The streaming k-means algorithm they implemented does look quite handy.

(*) The doubt is the following: prior elicitation is generally hard, and Bayesian techniques are not robust to misspecification. This matters in standard supervised settings, but it may matter more in exploration settings where misspecification can imply data starvation.

Vowpal Wabbit, version 7.0

A new version of VW is out. The primary changes are:

  1. Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct:
    1. Multiclass Classification –oaa or –ect.
    2. Cost Sensitive Multiclass Classification –csoaa or –wap.
    3. Contextual Bandit Classification –cb.
    4. Sequential Structured Prediction –searn or –dagger

    In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning.

  2. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over time.
  3. Windows port: The priority of a windows port jumped way up once we moved to Microsoft. The only feature which we know doesn’t work at present is automatic backgrounding when in daemon mode.
  4. New update rule: Stephane visited us this summer, and we fixed the default online update rule so that it is unit invariant.

There are also many other small updates including some contributed utilities that aid the process of applying and using VW.

Plans for the near future involve improving the quality of various items above, and of course better documentation: several of the reductions are not yet well documented.

NYAS ML 2012 and ICML 2013

The New York Machine Learning Symposium is October 19 with a 2 page abstract deadline due September 13 via email with subject “Machine Learning Poster Submission” sent to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year.

The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing.

Edit: Fixed second ICML deadline.