The Real World Interactive Learning Tutorial

Alekh and I have been polishing the Real World Interactive Learning tutorial for ICML 2017 on Sunday.

This tutorial should be of pretty wide interest. For data scientists, we are crossing a threshold into easy use of interactive learning, while for researchers interactive learning is plausibly the most important frontier of understanding. Great progress has been made on both the theory and especially on practical systems since an earlier NIPS 2013 tutorial.

Please join us if you are interested 🙂

Machine Learning the Future Class

This spring, I taught a class on Machine Learning the Future at Cornell Tech covering a number of advanced topics in machine learning, including online learning, joint (structured) prediction, active learning, contextual bandit learning, logarithmic time prediction, and parallel learning. Each class was recorded via Zoom from my laptop, and I just uploaded the recordings to YouTube.

In some ways, this class is a followup to the large scale learning class I taught with Yann LeCun 4 years ago. The videos for that class were taken down(*), so these lectures both update and replace the shared subjects and add some new ones.

Much of this material is fairly close to research, so to assist other machine learning lecturers around the world in digesting it, I've made all the sources available as well. Feel free to use and improve them.

(*) The NYU policy changed so that students could not be shown in classroom videos.

Vowpal Wabbit 7.8 at NIPS

I just created Vowpal Wabbit 7.8, and we are planning to have an increasingly less heretical followup tutorial during the non-“ski break” at the NIPS Optimization workshop. Please join us if interested.

I always feel like things are going slowly, but there have been many changes over the last year. Notes for 7.7 are here. Since then, there are several areas of improvement as well as generalized bug fixes and refactoring.

  1. Learning to Search: Hal completely rewrote the learning to search system, enough that the numbers here are looking obsolete. Kai-Wei has also created several advanced applications for entity-relation and dependency parsing which are promising.
  2. Languages: Hal also created a good Python library, which includes callbacks for learning to search, so you can now develop advanced structured prediction solutions in a nice language; a minimal sketch follows this list. Jonathan Morra also contributed an initial Java interface.
  3. Exploration: The contextual bandit subsystem now allows evaluation of an arbitrary policy, and the exploration logic is now factored out into an independent library (principally by Luong with help from Sid and Sarah). This is critical for real applications because randomization must happen at the point of decision; a generic sketch of why appears after this list.
  4. Reductions: The learning reductions subsystem has continued to mature, although the perfectionist in me is still dissatisfied. As a consequence, it's now pretty easy to program new reductions, and the efficiency of these reductions has generally improved. Several new ones are cooking.
  5. Online Learning: Alekh added an online SVM implementation based on LaSVM. This is known to parallelize well via the para-active approach.
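As a quick illustration of the Python interface, here is a minimal sketch of training and predicting. The module layout and call names here are assumptions based on the pyvw bindings (older releases imported pyvw directly), so check the repository for the current API.

    # Minimal sketch of the VW Python interface (pyvw).
    # Module path and constructor arguments are assumptions; consult the repository.
    from vowpalwabbit import pyvw

    vw = pyvw.vw("--quiet")  # start a VW instance with console output suppressed

    # learn from a couple of examples in VW's native text format
    vw.learn("1 | price:0.23 sqft:0.25 age:0.05")
    vw.learn("0 | price:0.18 sqft:0.15 age:0.35")

    # predict on an unlabeled example
    print(vw.predict("| price:0.53 sqft:0.32 age:0.87"))

    vw.finish()  # flush buffers and release the model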
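On the exploration point: the essential contract of any exploration library is that the probability of the chosen action is logged at the moment of decision, so that off-policy evaluation can importance-weight the observed rewards later. A generic epsilon-greedy sketch of this contract (illustrative only, not VW's actual interface):

    import random

    def epsilon_greedy(scores, epsilon=0.1, rng=random):
        # scores: the current policy's estimated reward for each action.
        # Returns (action, probability-of-that-action); the probability is
        # what makes the logged data usable for inverse propensity scoring.
        k = len(scores)
        greedy = max(range(k), key=lambda a: scores[a])
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore uniformly at random
        else:
            action = greedy            # exploit the current best guess
        # probability this sampling scheme assigns to the chosen action
        prob = epsilon / k + ((1 - epsilon) if action == greedy else 0.0)
        return action, prob

    # at decision time, log (context, action, reward, prob) together
    action, prob = epsilon_greedy([0.2, 0.5, 0.1])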

This project has grown quite a bit: about 30 different people have contributed to VW since the last release, and there is now a VW meetup (December 8th!) in the Bay Area that I wish I could attend.

Metacademy: a package manager for knowledge

In recent years, there’s been an explosion of free educational resources that make high-level knowledge and skills accessible to an ever-wider group of people. In your own field, you probably have a good idea of where to look for the answer to any particular question. But outside your areas of expertise, sifting through textbooks, Wikipedia articles, research papers, and online lectures can be bewildering (unless you’re fortunate enough to have a knowledgeable colleague to consult). What are the key concepts in the field, how do they relate to each other, which ones should you learn, and where should you learn them?

Courses are a major vehicle for packaging educational materials for a broad audience. The trouble is that they’re typically meant to be consumed linearly, regardless of your specific background or goals. Also, unless thousands of other people have had the same background and learning goals, there may not even be a course that fits your needs. Recently, we (Roger Grosse and Colorado Reed) have been working on Metacademy, an open-source project to make the structure of a field more explicit and help students formulate personal learning plans.

Metacademy is built around an interconnected web of concepts, each one annotated with a short description, a set of learning goals, a (very rough) time estimate, and pointers to learning resources. The concepts are arranged in a prerequisite graph, which is used to generate a learning plan for a concept. In this way, Metacademy serves as a sort of “package manager for knowledge.”
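To make the package-manager analogy concrete: a learning plan is essentially a topological sort of the prerequisite graph, restricted to the ancestors of the target concept. A minimal sketch of the idea (the graph fragment below is hypothetical, not Metacademy's actual data or code):

    def learning_plan(target, prereqs):
        # prereqs maps each concept to the concepts it depends on;
        # this is a depth-first topological sort of the target's ancestors,
        # so prerequisites always appear before the concepts that need them.
        plan, seen = [], set()

        def visit(concept):
            if concept in seen:
                return
            seen.add(concept)
            for dep in prereqs.get(concept, []):
                visit(dep)
            plan.append(concept)

        visit(target)
        return plan

    # hypothetical fragment of a prerequisite graph
    prereqs = {
        "deep belief nets": ["restricted Boltzmann machines", "backpropagation"],
        "restricted Boltzmann machines": ["Gibbs sampling"],
        "backpropagation": ["gradient descent"],
    }
    print(learning_plan("deep belief nets", prereqs))
    # ['Gibbs sampling', 'restricted Boltzmann machines',
    #  'gradient descent', 'backpropagation', 'deep belief nets']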

Currently, most of our content is related to machine learning and probabilistic AI; for instance, here are the learning plan and graph for deep belief nets.

Metacademy also has wiki-like documents called roadmaps, which give brief overviews of the key concepts in a field and explain why you might want to learn about them; here's one we wrote for Bayesian machine learning.

Many ingredients of Metacademy are drawn from pre-existing systems, including Khan Academy, saylor.org, Connexions, and many intelligent tutoring systems. We’re not trying to be the first to do any particular thing; rather, we’re trying to build a tool that we personally wanted to exist, and we hope others will find it useful as well.

Granted, if you’re reading this blog, you probably have a decent grasp of most of the concepts we’ve annotated. So how can Metacademy help you? If you’re teaching an applied course and don’t want to re-explain Gibbs sampling, you can simply point your students to the concept on Metacademy. Or, if you’re writing a textbook or teaching a MOOC, Metacademy can help potential students find their way there. Don’t worry about self-promotion: if you’ve written something you think people will find useful, feel free to add a pointer!

We are hoping to expand the content beyond machine learning, and we welcome contributions. You can create a roadmap to help people find their way around a field. We are currently working on a GUI for editing the concepts and the graph connecting them (our current system is based on GitHub pull requests), and we'll send an email to our registered users once this system is online. If you find Metacademy useful or want to contribute, let us know at feedback _at_ metacademy _dot_ org.

The Large Scale Learning class notes

The large scale machine learning class I taught with Yann LeCun has finished. As I expected, it took quite a bit of time :-). We had about 25 people attending in person on average and 400 regularly watching the recorded lectures, which is substantially more sustained interest than I expected for an advanced ML class. We also had some fun with class projects; I'm hopeful that several will eventually turn into papers.

I expect there are a number of professors interested in lecturing on this and related topics. Everyone will have their personal taste in subjects, of course, but hopefully there will be some convergence toward common course materials as well. To help with this, I am making the sources to my presentations available. Feel free to use/improve/embellish/ridicule/etc… in the pursuit of the perfect course.