January 2009 – Page 2 – Machine Learning (Theory)

Carla Vicens and Eric Siegel contacted me about Predictive Analytics World, which takes place in San Francisco on February 18 and 19 and which I wasn’t familiar with. A quick look at the agenda reveals several people I know who work on applications of machine learning in business, covering topics in deployed applications. It’s interesting to see a business-focused machine learning conference, as it suggests that we are succeeding as a field. If you are interested in deployed applications, you might attend.

Eric and I did a quick interview by email.

John >
I’ve mostly published and participated in academic machine learning conferences like ICML, COLT, and NIPS. When I look at the set of speakers and subjects for your conference I think “machine learning for business”. Is that your understanding of things? What I’m trying to ask is: what do you view as the primary goal for this conference?

Eric >
You got it. This is the business event focused on the commercial deployment of technology developed at the research conferences you named. Academics’ term, “machine learning,” is essentially synonymous with the business world’s “predictive modeling”. Predictive Analytics World focuses on business applications of this technology, such as response modeling, churn modeling, email targeting, product recommendations, insurance pricing, and credit scoring. PAW’s goal is to strengthen the business impact delivered by predictive analytics deployment, and establish new opportunities with predictive analytics. The conference delivers case studies, expertise and resources to this end.

The conference program is designed to speak the language of marketing and business professionals using or planning to use predictive analytics to solve business challenges — but for the hands-on practitioner or analytical expert focused on commercial deployment who wishes to speak this same language, it’s an equally valuable event.

John >
People at academic conferences would hope that technology developed there can transfer into business use. In your experience, does this happen? And how fast or difficult is it?

Eric >
The best way to catalyze commercial deployment is to show people that it really works outside “the lab” – which is why PAW’s program is packed primarily with named case studies of commercial deployment. These success stories answer your question with a resounding “yes”: the core technology developed academically is indeed put to use.

But predictive analytics has not yet been broadly adopted across all industries, although success stories show at least initial reach in each vertical. So, sure, to one who previously wore a researcher’s hat, commercial deployment can feel slow; having solved the hardest theoretical, mathematical or statistical problems, the rest should go smoothly, right? Not exactly. The main challenges come in ramping up the business “consumer” on the technology so they see its value, positioning the technology in a way that provides business value, and, on the integration side, in preparing corporate data for predictive modeling (that’s a doozy!) and in integrating predictive scores into existing systems and processes. These things take time.

John >
Sometimes people working in the academic world don’t have a good understanding of what the real problems are. Do you have a sense of which areas of research are underemphasized in the academic world?

Eric >
To reach commercial success in deploying predictive analytics for the business applications I listed above, the main challenges are on the process and non-analytical integration side, rather than core machine learning technology; it’s good enough. So, I don’t consider there to be glaring omissions in the capabilities of core machine learning (I taught the machine learning graduate course at Columbia University and still consider Tom Mitchell’s textbook to be my bible).

On the other hand, there are always places where “real-world” data is going to bring researchers’ attention to as-yet-unsolved problems. A perfect example is the Netflix Prize, the current leader of which (and winner of the recent Progress Prize) will be speaking at PAW-09.

Several talks at this year’s SODA seem potentially interesting to ML folks.

Maria-Florina Balcan, Avrim Blum, and Anupam Gupta, Approximate Clustering without the Approximation. This paper gives reasonable algorithms with provable approximation guarantees for k-median and other notions of clustering. It’s conceptually interesting, because it’s the second example I’ve seen where NP-hardness is subverted by changing the problem definition in a subtle but reasonable way. Essentially, they show that if any near-approximation to an optimal solution is good (that is, close to the desired clustering), then it’s computationally easy to find a near-optimal solution. This subtle shift bears serious thought. A similar one occurred in our ranking paper with respect to minimum feedback arc set. With two known examples, it suggests that many more NP-complete problems might be finessed into irrelevance in this style.
Yury Lifshits and Shengyu Zhang, Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates, and Small-World Design. The basic idea of this paper is that actually creating a metric with a valid triangle inequality is hard for real-world problems, so it’s desirable to have a data structure which works with a relaxed notion of triangle inequality. The precise relaxation is more extreme than you might imagine, implying the associated algorithms give substantial potential speedups in incomparable applications. Yury tells me that a cover-tree-style “true O(n) space” algorithm is possible. If worked out and implemented, I could imagine substantial use.
Elad Hazan and Satyen Kale, Better Algorithms for Benign Bandits. The basic idea of this paper is that in real-world applications, an adversary is less powerful than is commonly supposed, so carefully taking into account the observed variance can yield an algorithm which works much better in practice, without sacrificing the worst-case performance.
Kevin Matulef, Ryan O’Donnell, Ronitt Rubinfeld, Rocco Servedio, Testing Halfspaces. The basic point of this paper is that testing halfspaces is qualitatively easier than finding a good halfspace with respect to 0/1 loss. Although the analysis is laughably far from practical, the result is striking, and it’s plausible that the algorithm works much better than the analysis. The core algorithm is at least conceptually simple: test that two correlated random points have the same sign, with “yes” being evidence of a halfspace and “no” not.
I also particularly liked Yuval Peres’s invited talk The Unreasonable Effectiveness of Martingales. Martingales are endemic to learning, especially online learning, and I suspect we can tighten and clarify several arguments using some of the techniques discussed.
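
To make the Testing Halfspaces idea concrete (check whether two correlated random points get the same sign), here is a toy sketch. It is not the paper's tester, only an illustration under simplifying assumptions: inputs are standard Gaussian, the candidate halfspace passes through the origin, and the observed agreement rate is compared with the value Sheppard's formula gives for such a halfspace, 1 - arccos(rho)/pi. The function names, the example weights, the choice rho = 0.9, and the non-halfspace comparison function are all hypothetical.

```python
import math
import random


def correlated_gaussian_pair(dim, rho, rng):
    """Draw x ~ N(0, I) and y with per-coordinate correlation rho to x."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = math.sqrt(1.0 - rho * rho)
    y = [rho * xi + scale * rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def agreement_rate(f, dim, rho, trials, rng):
    """Fraction of correlated pairs (x, y) on which f gives the same sign."""
    agree = 0
    for _ in range(trials):
        x, y = correlated_gaussian_pair(dim, rho, rng)
        if f(x) == f(y):
            agree += 1
    return agree / trials


def halfspace_agreement(rho):
    """Agreement probability for a halfspace through the origin
    under Gaussian inputs (Sheppard's formula)."""
    return 1.0 - math.acos(rho) / math.pi


def halfspace(x):
    """Hypothetical example: sign of a fixed linear function."""
    w = [1.0, -2.0, 0.5]
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def product_of_signs(x):
    """Hypothetical non-halfspace: sign of x[0] * x[1]."""
    return 1 if x[0] * x[1] >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    rho = 0.9
    print("halfspace prediction:", halfspace_agreement(rho))
    print("halfspace observed:  ", agreement_rate(halfspace, 3, rho, 20000, rng))
    print("non-halfspace:       ", agreement_rate(product_of_signs, 3, rho, 20000, rng))
```

In this sketch the halfspace's agreement rate should land near the predicted value, while the product-of-signs function agrees noticeably less often. That gap is the kind of evidence the "yes"/"no" description above refers to. The actual tester in the paper also has to handle halfspaces that do not pass through the origin and needs a careful analysis of how many samples suffice.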

Predictive Analytics World

Interesting Papers at SODA 2009