Machine Learning (Theory)

9/29/20129/29/2012

Vowpal Wabbit, version 7.0

A new version of VW is out. The primary changes are:

Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct:
1. Multiclass Classification –oaa or –ect.
2. Cost Sensitive Multiclass Classification –csoaa or –wap.
3. Contextual Bandit Classification –cb.
4. Sequential Structured Prediction –searn or –dagger
In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning.
Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over time.
Windows port: The priority of a windows port jumped way up once we moved to Microsoft. The only feature which we know doesn’t work at present is automatic backgrounding when in daemon mode.
New update rule: Stephane visited us this summer, and we fixed the default online update rule so that it is unit invariant.

There are also many other small updates including some contributed utilities that aid the process of applying and using VW.

Plans for the near future involve improving the quality of various items above, and of course better documentation: several of the reductions are not yet well documented.

8/27/20129/1/2012

The New York Machine Learning Symposium is October 19 with a 2 page abstract deadline due September 13 via email with subject “Machine Learning Poster Submission” sent to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year.

The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing.

Edit: Fixed second ICML deadline.

8/24/2012

Patterns for research in machine learning

There are a handful of basic code patterns that I wish I was more aware of when I started research in machine learning. Each on its own may seem pointless, but collectively they go a long way towards making the typical research workflow more efficient. Here they are:

Separate code from data.
Separate input data, working data and output data.
Save everything to disk frequently.
Separate options from parameters.
Do not use global variables.
Record the options used to generate each run of the algorithm.
Make it easy to sweep options.
Make it easy to execute only portions of the code.
Use checkpointing.
Write demos and tests.

Click here for discussion and examples for each item. Also see Charles Sutton’s and HackerNews’ thoughts on the same topic.

My guess is that these patterns will not only be useful for machine learning, but also any other computational work that involves either a) processing large amounts of data, or b) algorithms that take a significant amount of time to execute. Share this list with your students and colleagues. Trust me, they’ll appreciate it.

7/17/20127/17/2012

MUCMD and BayLearn

The workshop on the Meaningful Use of Complex Medical Data is happening again, August 9-12 in LA, near UAI on Catalina Island August 15-17. I enjoyed my visit last year, and expect this year to be interesting also.

The first Bay Area Machine Learning Symposium is August 30 at Google. Abstracts are due July 30.

7/9/2012

Videolectures

Yaser points out some nicely videotaped machine learning lectures at Caltech. Yaser taught me machine learning, and I always found the lectures clear and interesting, so I expect many people can benefit from watching. Relative to Andrew Ng‘s ML class there are somewhat different areas of emphasis but the topic is the same, so picking and choosing the union may be helpful.

Posts

Vowpal Wabbit, version 7.0

NYAS ML 2012 and ICML 2013

Patterns for research in machine learning

MUCMD and BayLearn

Videolectures