Machined Learnings

Paul Mineiro has started Machined Learnings, where he’s seriously attempting to do ML research in public. I personally need to read through it in greater detail, as much of it is learning-reduction related, trying to deal with the sorts of complex source problems that come up in practice.

Regretting the dead

Nikos pointed out this New York Times article about poor clinical trial design killing people. For those of us who study learning from exploration information, this is a reminder that low-regret algorithms are particularly important: regret in clinical trials is measured in patient deaths.

Two obvious improvements on the experimental design are:

  1. With reasonable record keeping of existing outcomes for the standard treatments, there is no need to explicitly assign people to a control group receiving the standard treatment, as that alternative has effectively already been explored with great certainty. Asserting otherwise would imply that the nature of effective treatments for cancer has changed in the last year, which denies the value of any clinical trial.
  2. An optimal experimental design will smoothly phase between exploration and exploitation as evidence accumulates that a new treatment is effective. This is old tech, available for example in the EXP3.P algorithm (page 12 aka 59), although I prefer the generalized and somewhat clearer analysis of EXP4.P. A sketch of the core idea appears after this list.
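
To make the smooth exploration/exploitation phasing concrete, here is a minimal sketch of the EXP3-style exponential-weighting loop. It omits EXP3.P’s confidence-interval term, assumes rewards (here, good patient outcomes) lie in [0, 1], and all names are mine rather than drawn from the referenced papers:

```python
import math
import random

def exp3(n_arms, gamma, reward_fn, horizon):
    """EXP3-style bandit loop: a gamma fraction of uniform exploration is
    mixed with exponential-weight exploitation, so treatments that prove
    effective are assigned with growing probability over time."""
    weights = [1.0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)  # observed outcome, assumed in [0, 1]
        # Importance-weighted estimate: unbiased even though only the
        # chosen arm's outcome is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return probs  # final assignment probabilities

```

As the weight of a superior treatment grows, its assignment probability approaches 1 - gamma + gamma / n_arms, which is exactly the smooth expansion of the treated pool described below.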

Done the right way, the clinical trial for a successful treatment would start with some small initial pool (equivalent to “phase 1” in the article) and then simply expand the pool of participants over time as the treatment proves superior to the existing one, until the pool is everyone. And as a bonus, you can even compete with policies on treatments rather than raw treatments (i.e. personalized medicine).

Getting from here to there seems difficult. It’s been 15 years since EXP3.P was first published, and progress in clinical trial design seems glacial to us outsiders. Partly, I think this is a communication and education failure, but partly, it’s also a failure of imagination within our own field. When we design algorithms, we often don’t think about all the applications where a little massaging of the design, in ways obvious to us, would go a long way. Getting this right has a substantial moral aspect, potentially saving millions of lives over time through faster and more precise deployment of new treatments.

Boosted Decision Trees for Deep Learning

About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct empirically at UAI, showing that boosted decision trees can beat deep belief networks on versions of MNIST artificially hardened so that only deep learning algorithms can solve them.

This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of boosted decision trees, they can achieve pretty good performance, as the small example below suggests.
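
A toy illustration of the underlying point, that boosted trees can learn decision boundaries highly nonlinear in the raw inputs. This uses scikit-learn for convenience and a synthetic half-moons dataset, not the hardened MNIST variants from the paper:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# A toy nonlinear problem: interleaved half-moons are not linearly separable.
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting stage adds a shallow tree; stacking many such stages lets
# the ensemble represent decisions highly nonlinear in the raw inputs.
model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```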

Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what works.

KDD 2010

There were several papers that seemed fairly interesting at KDD this year. The ones that caught my attention are:

  1. Xin Jin, Mingyang Zhang, Nan Zhang, and Gautam Das, Versatile Publishing For Privacy Preservation. This paper provides a conservative method for safely determining which data is publishable from any complete source of information (for example, a hospital), such that it does not violate privacy rules specified in natural language. It is not differentially private, so it is only safe when no external sources of join information exist. However, it is a mechanism for publishing data rather than (say) the output of a learning algorithm.
  2. Arik Friedman and Assaf Schuster, Data Mining with Differential Privacy. This paper shows how to create effective differentially private decision trees. Progress in differentially private data mining is pretty impressive, considering that differential privacy was only defined in 2006. (A sketch of the basic noisy-counting building block appears after this list.)
  3. David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, and Diane Lambert, Evaluating Online Ad Campaigns in a Pipeline: Causal Models At Scale. This paper is about automated estimation of ad campaign effectiveness. The doubly robust estimation technique seems intuitively appealing and plausibly greatly enhances effectiveness; it is sketched after this list as well.
  4. Naoki Abe et al., Optimizing Debt Collections Using Constrained Reinforcement Learning. This is an application paper about optimizing the New York State income tax collection agency. As you might expect, there are several kludgy aspects due to working within legal and organizational constraints. They deal with them, and expect to end up making NY state around $10^8/year. Too bad I live in NY 🙂
  5. Vikas C Raykar, Balaji Krishnapuram, and Shipeng Yu, Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy and Cost. This paper is about a continuization-based solution to designing a cost-efficient yet accurate classifier cascade. It’s a step beyond the Viola-Jones style of boosting with cutouts, but I suspect not yet a final solution.
  6. D. Sculley, Combined Regression and Ranking. There are lots of applications where you want both a correct ordering and an estimated value for each item. This paper shows a simple combined-loss approach to getting both, which empirically improves on optimizing either objective alone; a rough sketch of the loss closes out the sketches below.
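
A few of these hinge on compact techniques worth sketching. For item 2, the basic building block of most differentially private data mining, including private decision tree induction, is noisy counting via the Laplace mechanism. A minimal sketch (the function name and interface are mine, not the paper’s):

```python
import numpy as np

def private_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy. A count query
    has sensitivity 1 (adding or removing one record changes it by at
    most 1), so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)
```

A private decision tree learner spends a slice of its total privacy budget on such noisy counts each time it evaluates a candidate split.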
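
For item 3, the doubly robust idea combines an outcome regression with an inverse-propensity correction, and is unbiased if either the regression model or the propensity scores are correct. A minimal sketch, with all names mine:

```python
import numpy as np

def doubly_robust_value(rewards, took_target, propensity, mu_target):
    """Estimate the mean outcome had everyone received the target action.
    rewards:     observed outcomes, shape (n,)
    took_target: 1 where the logged action equals the target action, else 0
    propensity:  probability the logged policy chose the target action
    mu_target:   model-predicted outcome under the target action
    The correction term repairs the model's errors on examples where the
    target action was actually taken, reweighted by propensity."""
    correction = took_target * (rewards - mu_target) / propensity
    return np.mean(mu_target + correction)
```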
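
And for item 6, the combined objective is simply a convex mixture of a regression loss and a pairwise ranking loss. Here is a rough all-pairs sketch for a linear scorer; the paper optimizes a stochastic sampled-pair version rather than this quadratic-cost one:

```python
import numpy as np

def combined_loss(w, X, y, alpha=0.5):
    """alpha * squared regression loss + (1 - alpha) * pairwise hinge
    ranking loss, so predictions should both match the targets in value
    and order each pair consistently with its targets."""
    preds = X @ w
    regression = np.mean((preds - y) ** 2)
    i, j = np.triu_indices(len(y), k=1)  # all example pairs (i < j)
    sign = np.sign(y[i] - y[j])          # desired ordering of each pair
    mask = sign != 0                     # ignore tied pairs
    margins = sign[mask] * (preds[i[mask]] - preds[j[mask]])
    ranking = np.mean(np.maximum(0.0, 1.0 - margins))
    return alpha * regression + (1 - alpha) * ranking
```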

In addition, I enjoyed Konrad Feldman’s invited talk on Quantcast’s data and learning systems, which sounded pretty slick.

In general, it seems like KDD is substantially maturing as a conference. The work on empirically effective privacy-preserving algorithms and some of the stats work is ahead of what I’ve seen at other machine learning conferences. Presumably this is because KDD is closer to the business side of machine learning and hence more aware of which problems are real there. An annoying aspect of KDD as a publishing venue is that they don’t put the papers on the conference website, due to ACM constraints. A substantial compensation is that all talks are scheduled to appear on videolectures.net and, as you can see, most papers can be found on author webpages.

KDD also experimented with crowdvine again this year, so people could announce which talks they were interested in and set up meetings. My impression was that it worked a bit less well than last year, partly because it wasn’t pushed as much by the conference organizers. Small changes in the interface might make a big difference; for example, just providing a ranking of papers by interest might make it pretty compelling.