Machine Learning (Theory)


Conference on Digital Experimentation

I just attended CODE. The people interested in digital experimentation have very diverse backgrounds, encompassing theory, machine learning, social science, economics, and industry, so this seems like a good subject for a new conference. I hope it continues.

I found several talks interesting.

  • Eytan Bakshy talked about PlanOut, which is a language/platform for flexibly specifying experiments (a minimal sketch of the underlying idea appears after this list).
  • Ron Kohavi talked about EXP, which is a heavily used A/B testing platform.
  • Susan Athey talked about long-term vs. short-term metrics, a problem that seems important, persistent, and not yet systematically solved.
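
One idea I took from the PlanOut talk is that experiment assignment should be deterministic and stateless. What follows is not PlanOut's actual syntax, just a minimal Python sketch of the hash-based assignment idea that PlanOut-style systems are built on; the experiment name, parameter name, and choices are all made up for illustration.

    import hashlib

    def assign(experiment, param, choices, unit):
        # Hash the (experiment, parameter, unit) triple so the same user
        # always lands in the same variant, with no per-user state stored.
        key = "{0}.{1}.{2}".format(experiment, param, unit).encode("utf-8")
        bucket = int(hashlib.sha1(key).hexdigest(), 16)
        return choices[bucket % len(choices)]

    # Hypothetical experiment: pick a consistent button color per user id.
    color = assign("homepage_test", "button_color", ["blue", "green"], unit=42)

Because assignment is a pure function of the hash, any service can recompute a user's variant on the fly, which is part of what makes flexible experiment specification practical.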

There was a panel about the ongoing Facebook experimentation controversy. The issue here is complex. My understanding is that Facebook users have some expected ownership of the content they create, and hence aren’t comfortable with the content being used in unexpected ways. On the other hand, experimentation is so necessary to the functioning of all large modern internet sites that banning it or slowing down the process by a factor of a million (as some advocated) would badly degrade the future of these sites in practice.

My belief is that what’s lacking is education and trust. W.r.t. education, people need to understand that experimentation is unavoidable when trying to figure out how to optimize an enormously complex system, as there is just no other way to systematically make the 1000 right decisions necessary for basic things like choosing the best homepage/search result/etc. W.r.t. trust, companies are not particularly good at creating trust in general, but finding the right mechanism for doing so seems critical. I would point to Vanguard as a company that managed to successfully create trust by design.
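
To make the “1000 right decisions” point concrete: each such decision typically bottoms out in comparing two variants on a metric and asking whether the difference exceeds sampling noise. Here is a minimal sketch of one such comparison using a standard two-proportion z-test (not any particular platform’s methodology), with all numbers invented:

    from math import sqrt
    from statistics import NormalDist

    def ab_z_test(success_a, n_a, success_b, n_b):
        # Two-sided two-proportion z-test: is variant B's rate different
        # from variant A's beyond what sampling noise explains?
        p_a, p_b = success_a / n_a, success_b / n_b
        p = (success_a + success_b) / (n_a + n_b)     # pooled rate
        se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value

    # Example: 520/10000 vs 580/10000 clicks.
    # z is about 1.86, p about 0.06: suggestive but not conclusive at 0.05.
    z, p = ab_z_test(520, 10000, 580, 10000)

At the scale of a thousand such decisions, multiple-testing corrections and the long-term vs. short-term metric question Susan Athey raised both become pressing.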


Open Machine Learning Workshop, August 22

On August 22, we are planning to have an Open Machine Learning Workshop at MSR New York City, taking advantage of CJ Lin and others being in town for KDD.

If you are interested, please email msrnycrsvp at and say “I want to come” so we can get a count of attendees for refreshments.

Added: Videos are now online.


The perfect candidate

The last several years have seen phenomenal growth in machine learning, such that this earlier post from 2007 is understated. Machine learning jobs aren’t just growing on trees; they are growing everywhere. The core dynamic is a digitizing world, which makes people who know how to use data effectively a very hot commodity. In the present state, anyone reasonably familiar with some machine learning tools and a master’s level of education can get a good job at many companies, graduating PhD students are sometimes the subject of bidding wars, and many professors have created startups.

Despite this, hiring in good research positions can be challenging. A good research position is one where you can:

  • Spend the majority of your time working on research questions that interest you.
  • Work with other like-minded people.
  • For several years.

I see these as critical: research is hard enough that you cannot expect to succeed without devoting the majority of your time, and you cannot hope to succeed without personal interest. Other like-minded people are typically necessary to finding solutions to the hardest problems. And you typically must work for several years before seeing significant success. There are exceptions to everything, but these criteria are the working norm of the successful research I see.

The set of good research positions is expanding, but at a much slower pace than the many applied scientist types of positions. This makes good sense as the pool of people able to do interesting research grows only slowly, and anyone funding this should think quite hard before making the necessary expensive commitment for success.

But, with the above said, what makes a good candidate for a research position? People have many diverse preferences, so I can only speak for myself with any authority. There are several things I do and don’t look for.

  1. Something new. Any good candidate should have something worth teaching. For a PhD candidate, the subject of your research is deeply dependent on your advisor. It is not necessary that you do something different from your advisor’s research direction, but it is necessary that you own (and can speak authoritatively about) a significant advance.
  2. Something other than papers. It is quite possible to persist indefinitely in academia while only writing papers, but it does not show a real interest in what you are doing beyond survival. Why are you doing it? What is the purpose? Some people code. Some people solve particular applications. There are other things as well, but these make the difference.
  3. A difficult long-term goal. A goal suggests interest, but more importantly it makes research accumulate. Some people do research without a goal, solving whatever problems happen to pass by that they can solve. Very smart people can do well in research careers with a random walk amongst research problems, but people with a goal can have their research accumulate in a much stronger fashion. I’m not an extremist here: solving off-goal problems is fine and desirable, but having a long-term goal makes a long-term difference.
  4. A portfolio of coauthors. This shows that you are the sort of person able to and interested in working with other people, as is very often necessary for success. This can be particularly difficult for some PhD candidates whose advisors expect them to work exclusively with (or for) them. Summer internships are both a strong tradition and a great opportunity here.
  5. I rarely trust recommendations, because I find them very difficult to interpret. When the candidate selects the writers, the most interesting bit is who the writers are. Letters default to positive, but the degree of default varies from writer to writer. Occasionally a recommendation says something surprising, but then: do you trust the recommender’s judgement? In some cases yes, but in many cases you do not know the writer.

Meeting the above criteria within the context of a PhD is extraordinarily difficult. The good news is that you can “fail” with a job that is better in just about every way :-)

Anytime criteria are discussed, it’s worth asking: should you optimize for them? In another context, lines of code is a terrible metric to optimize when judging programmer productivity. Here, I believe optimizing for (1), (2), (3), and (4) is beneficial and worthwhile for PhD students.


Interesting papers at ICML 2014

This year’s ICML had several papers which I want to read through more carefully and understand better.

  1. Chun-Liang Li, Hsuan-Tien Lin, Condensed Filter Tree for Cost-Sensitive Multi-Label Classification. Several tricks accumulate to give a new approach for addressing cost-sensitive multi-label classification.
  2. Nikos Karampatziakis and Paul Mineiro, Discriminative Features via Generalized Eigenvectors. An efficient, effective eigenvalue solution for supervised learning yields compelling nonlinear performance on several datasets (a rough sketch of the core computation appears after this list).
  3. Nir Ailon, Zohar Karnin, Thorsten Joachims, Reducing Dueling Bandits to Cardinal Bandits. An effective method for reducing dueling bandits to normal bandits that extends to contextual situations.
  4. Pedro Pinheiro, Ronan Collobert, Recurrent Convolutional Neural Networks for Scene Labeling. Image parsing remains a challenge, and this is plausibly a step forward.
  5. Cicero Dos Santos, Bianca Zadrozny, Learning Character-level Representations for Part-of-Speech Tagging. Word morphology is clearly useful information, and yet almost all ML-for-NLP applications ignore it or hard-code it (by stemming).
  6. Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. Statistically efficient interactive learning is now computationally feasible. I wish this one had been done in time for the NIPS tutorial :-)
  7. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller, Deterministic Policy Gradient Algorithms. A reduction in variance from working out the deterministic limit of policy gradient makes policy gradient approaches look much more attractive.
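
Regarding paper 2 above: as I read it, the core computation is a generalized eigenvalue problem between class-conditional second-moment matrices, whose top eigenvectors point in directions with high energy for one class relative to another. A rough numpy/scipy sketch of that idea follows; the regularization constant, the choice of class pair, and how projections become features are simplifications here, not the paper’s exact recipe.

    import numpy as np
    from scipy.linalg import eigh

    def gev_features(X, y, class_a, class_b, k=3, reg=1e-6):
        # Class-conditional second-moment matrices, ridged for invertibility.
        Xa, Xb = X[y == class_a], X[y == class_b]
        d = X.shape[1]
        Ca = Xa.T @ Xa / len(Xa) + reg * np.eye(d)
        Cb = Xb.T @ Xb / len(Xb) + reg * np.eye(d)
        # Generalized eigenvectors v maximizing v' Ca v / v' Cb v.
        w, V = eigh(Ca, Cb)
        top = V[:, np.argsort(w)[::-1][:k]]
        # Projections onto these directions serve as new features
        # (the paper builds nonlinear features from such projections).
        return X @ top

Feeding these projections (or simple transforms of them) to a linear learner is, as I understand it, how one gets nonlinear performance out of an otherwise linear pipeline.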

Edit: added one that I forgot.


The New York ML Symposium, take 2

The 2013-2014 New York Machine Learning Symposium is finally happening on March 28th at the New York Academy of Sciences. Every invited speaker interests me personally.

We’ve been somewhat disorganized in advertising this. As a consequence, anyone who has not submitted an abstract but would like to do so may send one directly to me (title: NYASMLS) by Friday, March 14. I will forward them to the rest of the committee for consideration.
