Graduates – Machine Learning (Theory)

The last several years have seen phenomenal growth in machine learning, to the point that this earlier post from 2007 is now an understatement. Machine learning jobs aren’t just growing on trees; they are growing everywhere. The core dynamic is a digitizing world, which makes people who know how to use data effectively a very hot commodity. Currently, anyone with a master’s-level education and reasonable familiarity with some machine learning tools can get a good job at many companies, while PhD students coming out sometimes face bidding wars, and many professors have created startups.

Despite this, hiring in good research positions can be challenging. A good research position is one where you can:

Spend the majority of your time working on research questions that interest you.
Work with other like-minded people.
Do so for several years.

I see these as critical: research is hard enough that you cannot expect to succeed without devoting the majority of your time to it. You cannot hope to succeed without personal interest. Other like-minded people are typically necessary for finding solutions to the hardest problems. And you typically must work for several years before seeing significant success. There are exceptions to everything, but these criteria are the working norm for the successful research I see.

The set of good research positions is expanding, but at a much slower pace than the set of applied-scientist positions. This makes sense, since the pool of people able to do interesting research grows only slowly, and anyone funding such research should think quite hard before making the expensive commitment that success requires.

But, with the above said, what makes a good candidate for a research position? People have many diverse preferences, so I can only speak for myself with any authority. There are several things I do and don’t look for.

Something new. Any good candidate should have something worth teaching. For a PhD candidate, the subject of your research depends deeply on your advisor. It is not necessary that you do something different from your advisor’s research direction, but it is necessary that you own (and can speak authoritatively about) a significant advance.
Something other than papers. It is quite possible to persist indefinitely in academia while only writing papers, but doing so does not show a real interest in what you are doing beyond survival. Why are you doing it? What is the purpose? Some people code. Some people solve particular applications. There are other possibilities as well, but these make the difference.
A difficult long-term goal. A goal suggests interest, but more importantly it makes research accumulate. Some people do research without a goal, solving whatever problems happen to pass by that they can solve. Very smart people can do well in research careers with a random walk amongst research problems. But people with a goal can have their research accumulate in a much stronger fashion than a random walk through research problems allows. I’m not an extremist here: solving off-goal problems is fine and desirable, but having a long-term goal makes a long-term difference.
A portfolio of coauthors. This shows that you are able to work with, and interested in working with, other people, as is very often necessary for success. This can be particularly difficult for some PhD candidates whose advisors expect them to work exclusively with (or for) them. Summer internships are both a strong tradition and a great opportunity here.
I rarely trust recommendations, because I find them very difficult to interpret. When the candidate selects the writers, the most informative bit is who the writers are. Letters default to positive, but the degree of that default varies from writer to writer. Occasionally a recommendation says something surprising, but do you trust the recommender’s judgement? In some cases yes, but in many cases you do not know the writer.

Meeting the above criteria within the context of a PhD is extraordinarily difficult. The good news is that if you “fail,” you can end up with a job that is better in just about every way 🙂

Anytime criteria are discussed, it’s worth asking: should you optimize for them? In some contexts the answer is no; lines of code, for example, is a terrible metric to optimize when judging programmer productivity. Here, I believe optimizing for the first four criteria (something new, something other than papers, a difficult long-term goal, and a portfolio of coauthors) is beneficial and worthwhile for PhD students.

I would like to point out three graduates this season who have my confidence that they are capable of doing great things.

Daniel Hsu has diverse papers with diverse coauthors on {active learning, multilabeling, temporal learning, …}, each covering new algorithms and methods of analysis. He is also a capable programmer, having helped me with some nitty-gritty details of cluster-parallel Vowpal Wabbit this summer. He has an excellent tendency to just get things done.
Nicolas Lambert doesn’t nominally work in machine learning, but I’ve found his work on elicitation relevant nevertheless. In essence, elicitable properties are closely related to learnable properties, and elicitation complexity is related to a notion of learning complexity. See the Surrogate regret bounds paper for some related discussion. Few people successfully work at such a general level that their work crosses fields, but he’s one of them.
Yisong Yue is deeply focused on interactive learning, which he has attacked at all levels: theory, algorithm adaptation, programming, and popular description. I’ve seen a relentless multidimensional focus on a new real-world problem be an excellent strategy for research and expect he’ll succeed.

The obvious caveat applies: I don’t know, or haven’t fully appreciated, everyone’s work, so I’m sure I missed people. I’d particularly like to point out Percy Liang and David Sontag as plausible additions to this list, whose work I’m sure others appreciate a great deal.
