At NIPS I’m giving a tutorial on Learning to Interact. In essence this is about dealing with causality in a contextual bandit framework. Relative to previous tutorials, I’ll be covering several new results that changed my understanding of the nature of the problem. Note that Judea Pearl and Elias Bareinboim have a tutorial on causality. This might appear similar, but is quite different in practice. Pearl and Bareinboim’s tutorial will be about the general concepts while mine will be about total mastery of the simplest nontrivial case, including code. Luckily, they have the right order. I recommend going to both
I also just released version 7.4 of Vowpal Wabbit. When I was a frustrated learning theorist, I did not understand why people were not using learning reductions to solve problems. I’ve been slowly discovering why with VW, and addressing the issues. One of the issues is that machine learning itself was not automatic enough, while another is that creating a very low overhead process for doing learning reductions is vitally important. These have been addressed well enough that we are starting to see compelling results. Various changes:
- The internal learning reduction interface has been substantially improved. It’s now pretty easy to write new learning reduction. binary.cc provides a good example. This is a very simple reduction which just binarizes the prediction. More improvements are coming, but this is good enough that other people have started contributing reductions.
- Zhen Qin had a very productive internship with Vaclav Petricek at eharmony resulting in several systemic modifications and some new reductions, including:
- A direct hash inversion implementation for use in debugging.
- A holdout system which takes over for progressive validation when multiple passes over data are used. This keeps the printouts ‘honest’.
- An online bootstrap mechanism system which efficiently provides some understanding of prediction variations and which can sometimes effectively trade computational time for increased accuracy via ensembling. This will be discussed at the biglearn workshop at NIPS.
- A top-k reduction which chooses the top-k of any set of base instances.
- Hal Daume has a new implementation of Searn (and Dagger, the codes are unified) which makes structured prediction solutions far more natural. He has optimized this quite thoroughly (exercising the reduction stack in the process), resulting in this pretty graph.
Here, CRF++ is commonly used conditional random field code, SVMstruct is an SVM-style approach to classification, and CRF SGD is an online learning CRF approach. All of these methods use the same features. Fully optimized code is typically rough, but this one is less than 100 lines.
I’m trying to put together a tutorial on these things at NIPS during the workshop break on the 9th and will add details as that resolves for those interested enough to skip out on skiing
Edit: The VW tutorial will take place during the break at the big learning workshop from 1:30pm – 3pm at Harveys Emerald Bay B.
I was not as personally close to Ben as Sam, but the level of tragedy is similar and I can’t help but be greatly saddened by the loss.
Various news stories have coverage, but the synopsis is that he had a heart attack on Sunday and is survived by his wife Anat and daughter Aviv. There is discussion of creating a memorial fund for them, which I hope comes to fruition, and plan to contribute to.
I will remember Ben as someone who thought carefully and comprehensively about new ways to do things, then fought hard and successfully for what he believed in. It is an ideal we strive for, that Ben accomplished.
Edit: donations go here, and more information is here.
Several strong graduates are on the job market this year.
- Alekh Agarwal made the most scalable public learning algorithm as an intern two years ago. He has a deep and broad understanding of optimization and learning as well as the ability and will to make things happen programming-wise. I’ve been privileged to have Alekh visiting me in NY where he will be sorely missed.
- John Duchi created Adagrad which is a commonly helpful improvement over online gradient descent that is seeing wide adoption, including in Vowpal Wabbit. He has a similarly deep and broad understanding of optimization and learning with significant industry experience at Google. Alekh and John have often coauthored together.
- Stephane Ross visited me a year ago over the summer, implementing many new algorithms and working out the first scale free online update rule which is now the default in Vowpal Wabbit. Stephane is not on the market—Google robbed the cradle successfully I’m sure that he will do great things.
- Anna Choromanska visited me this summer, where we worked on extreme multiclass classification. She is very good at focusing on a problem and grinding it into submission both in theory and in practice—I can see why she wins awards for her work. Anna’s future in research is quite promising.
I also wanted to mention some postdoc openings in machine learning.
There will be no New York ML Symposium this year. The core issue is that NYAS is disorganized by people leaving, pushing back the date, with the current candidate a spring symposium on March 28. Gunnar and I were outvoted here—we were gung ho on organizing a fall symposium, but the rest of the committee wants to wait.
In some good news, most of the ICML 2012 videos have been restored from a deep backup.
A big ouch—all the videos for ICML 2012 were lost in a shuffle. Rajnish sends the below, but if anyone can help that would be greatly appreciated.
Sincere apologies to ICML community for loosing 2012 archived videos
What happened: In order to publish 2013 videos, we decided to move 2012 videos to another server. We have a weekly backup service from the provider but after removing the videos from the current server, when we tried to retrieve the 2012 videos from backup service, the backup did not work because of provider-specific requirements that we had ignored while removing the data from previous server.
What are we doing about this: At this point, we are still looking into raw footage to find if we can retrieve some of the videos, but following are the steps we are taking to make sure this does not happen again in future:
(1) We are going to create a channel on Vimeo (and potentially on YouTube) and we will publish there the p-in-p- or slide-versions of the videos. This will be available by the beginning of Oct 2013.
(2) We are going to provide download links from TechTalks so that the slide-version (of p-in-p- version if availbale) of the videos can be directly downloaded by viewers.This feature will be available by Aug 4th 2013.
(3) Of course we are now creating regular backups that do not depend on our service provider.
How can you help: If you have downloaded from TechTalks the ICML 2012 videos using external tools, we will really appreciate if you can provide us the videos, please email at email@example.com .
The large scale machine learning class I taught with Yann LeCun has finished. As I expected, it took quite a bit of time . We had about 25 people attending in person on average and 400 regularly watching the recorded lectures which is substantially more sustained interest than I expected for an advanced ML class. We also had some fun with class projects—I’m hopeful that several will eventually turn into papers.
I expect there are a number of professors interested in lecturing on this and related topics. Everyone will have their personal taste in subjects of course, but hopefully there will be some convergence to common course materials as well. To help with this, I am making the sources to my presentations available. Feel free to use/improve/embelish/ridicule/etc… in the pursuit of the perfect course.
Sebastien Bubeck points out COLT registration with a May 13 early registration deadline. The local organizers have done an admirable job of containing costs with a $300 registration fee.
ICML registration is also available, at about an x3 higher cost. My understanding is that this is partly due to the costs of a larger conference being harder to contain, partly due to ICML lasting twice as long with tutorials and workshops, and partly because the conference organizers were a bit over-conservative in various ways.
Adam Kalai points out the New England Machine Learning Day May 1 at MSR New England. There is a poster session with abstracts due April 19. I understand last year’s NEML went well and it’s great to meet your neighbors at regional workshops like this.
Sebastien Bubeck has a new ML blog focused on optimization and partial feedback which may interest people.
Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU. This class will cover many tricks to get machine learning working well on datasets with many features, examples, and classes, along with several elements of deep learning and support systems enabling the previous.
This is not a beginning class—you really need to have taken a basic machine learning class previously to follow along. Students will be able to run and experiment with large scale learning algorithms since Yahoo! has donated servers which are being configured into a small scale Hadoop cluster. We are planning to cover the frontier of research in scalable learning algorithms, so good class projects could easily lead to papers.
For me, this is a chance to teach on many topics of past research. In general, it seems like researchers should engage in at least occasional teaching of research, both as a proof of teachability and to see their own research through that lens. More generally, I expect there is quite a bit of interest: figuring out how to use data to make predictions well is a topic of growing interest to many fields. In 2007, this was true, and demand is much stronger now. Yann and I also come from quite different viewpoints, so I’m looking forward to learning from him as well.
We plan to videotape lectures and put them (as well as slides) online, but this is not a MOOC in the sense of online grading and class certificates. I’d prefer that it was, but there are two obstacles: NYU is still figuring out what to do as a University here, and this is not a class that has ever been taught before. Turning previous tutorials and class fragments into coherent subject matter for the 50 students we can support at NYU will be pretty challenging as is. My preference, however, is to enable external participation where it’s easily possible.
Suggestions or thoughts on the class are welcome
Michael Jordan sends the below:
The new Simons Institute for the Theory of Computing
will begin organizing semester-long programs starting in 2013.
One of our first programs, set for Fall 2013, will be on the “Theoretical Foundations
of Big Data Analysis”. The organizers of this program are Michael Jordan (chair),
Stephen Boyd, Peter Buehlmann, Ravi Kannan, Michael Mahoney, and Muthu Muthukrishnan.
See http://simons.berkeley.edu/program_bigdata2013.html for more information on
The Simons Institute has created a number of “Research Fellowships” for young
researchers (within at most six years of the award of their PhD) who wish to
participate in Institute programs, including the Big Data program. Individuals
who already hold postdoctoral positions or who are junior faculty are welcome
to apply, as are finishing PhDs.
Please note that the application deadline is January 15, 2013. Further details
are available at http://simons.berkeley.edu/fellows.html .
A reminder that the New York Academy of Sciences will be hosting the 7th Annual Machine Learning Symposium tomorrow from 9:30am.
The main program will feature invited talks from Peter Bartlett, William Freeman, and Vladimir Vapnik, along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly, Buzzfeed, Chartbeat, and Sense Networks, Visual Revenue). This should be of great interest to everyone considering working in machine learning.
The New York Machine Learning Symposium is October 19 with a 2 page abstract deadline due September 13 via email with subject “Machine Learning Poster Submission” sent to firstname.lastname@example.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year.
The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing.
Edit: Fixed second ICML deadline.
The workshop on the Meaningful Use of Complex Medical Data is happening again, August 9-12 in LA, near UAI on Catalina Island August 15-17. I enjoyed my visit last year, and expect this year to be interesting also.
The first Bay Area Machine Learning Symposium is August 30 at Google. Abstracts are due July 30.
Yaser points out some nicely videotaped machine learning lectures at Caltech. Yaser taught me machine learning, and I always found the lectures clear and interesting, so I expect many people can benefit from watching. Relative to Andrew Ng‘s ML class there are somewhat different areas of emphasis but the topic is the same, so picking and choosing the union may be helpful.
Larry Wasserman has started the Normal Deviate blog which I added to the blogroll on the right.
Manfred Warmuth points out the UCSC machine learning summer school running July 9-20 which may be of particular interest to those in silicon valley.
The accepted papers are up in full detail. We are still struggling with the precise program itself, but that’s coming along. Also note the May 13 deadline for early registration and room booking.
May 16 in Cambridge, is the New England Machine Learning Day, a first regional workshop/symposium on machine learning. To present a poster, submit an abstract by May 5.
May 19 in New York, STOC is coming to town and rather surprisingly having workshops which should be quite a bit of fun. I’ll be speaking at Algorithms for Distributed and Streaming Data.
has died. He lived a full life. I know him personally as a founder of the Center for Computational Learning Systems and the New York Machine Learning Symposium, both of which have sheltered and promoted the advancement of machine learning. I expect much of the New York area machine learning community will miss him, as well as many others around the world.