Machine Learning (Theory)

7/6/2018

A Real World Reinforcement Learning Research Program

Tags: Funding,Reinforcement,Research jl@ 9:10 am

We are hiring for reinforcement learning related research at all levels and all MSR labs. If you are interested, apply, talk to me at COLT or ICML, or email me.

More generally though, I wanted to lay out a philosophy of research which differs from (and plausibly improves on) the current prevailing mode.

Deepmind and OpenAI have popularized an empirical approach where researchers modify algorithms and test them against simulated environments, including in self-play. They’ve achieved significant success in these simulated environments, greatly expanding the repertoire of ‘games solved by reinforcement learning’, which consisted of just backgammon when I was a graduate student. Given the ambitious goals of these organizations, the more general plan seems to be “first solve games, then solve real problems”. There are some weaknesses to this approach, which I want to lay out next.

  • Broken API One issue with this is that multi-step reinforcement learning is a broken API in the sense that it creates an interface for problem definitions that is unsolvable via currently popular algorithm families. In particular, you can create problems which are either ‘antishaped’, so that local rewards mislead with respect to long-term rewards, or ‘keylock’ problems, as are common in Markov Decision Process lower bounds. I coded up simple versions of these problems a couple of years ago and have now put them on github to be extra crisp. If you try to apply policy gradient or Q-learning style algorithms to these problems, they commonly run into exponential (in the number of states) sample complexity. As a general principle, APIs which create exponential sample complexity are bad—they imply that individual applications require taking advantage of special structure in order to succeed.
  • Transference Another significant issue is the degree of transference between solutions in simulation and the real world. “Transference” here potentially happens at several levels.
    • Do the algorithms carry over? One of the persistent issues with simulation-based approaches is that you don’t care about sample complexity that much—optimal performance at acceptable computational complexities is the typical goal. In real world applications, this is somewhat absurd—you really care about immediately doing something reasonable and optimizing from there.
    • Do the simulators carry over? For every simulator, there is a fidelity question which comes into play when you want to transfer a policy learned in the simulator into action in the real world. Real-time ray tracing and simulator quality more generally are advancing, but I’m not ready yet to trust a self-driving car trained in a simulated reality. Whether the physics can be simulated accurately is also unclear—friction, for example, is known to be difficult, and more generally the representative variety of exogenous events in an open world seems quite difficult to implement.
  • Solution generality When you test and discover that an algorithm works in a simulated world, you know that it works in the simulated world. If you try it in 30 simulated worlds and it works in all of them, it can still easily be the case that the algorithm fails on the 31st simulated world. How can you achieve confidence beyond the number of simulated worlds that you try and succeed on? There is some sense in which you can imagine generalization over an underlying process generating problems, but this seems like a shaky justification in practice, since the nature of the problems encountered seems to be a nonstationary development of an unknown future.
  • Value creation Solutions of a ‘first A, then B’ flavor naturally take time to get to the end state where most of the real value is set to be realized. In the years before reaching applications in the real world, does the funding run out? We certainly hope not for the field of research but a danger does exist. Some discussion here including the comments is relevant.
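The keylock construction mentioned above is easy to sketch. The toy environment below is an illustrative simplification (not the code from the GitHub repository): at each state one secret action advances and the other resets to the start, with reward only at the end, so undirected exploration needs on the order of 2^n episodes before it ever sees a reward.

```python
import random

class KeyLock:
    """Toy 'keylock' MDP: at each of n states, one secret action advances,
    the other resets to the start. Reward appears only at the final state."""
    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.secret = [rng.randrange(2) for _ in range(n)]  # correct action per state
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == self.secret[self.state]:
            self.state += 1   # correct action: advance
        else:
            self.state = 0    # wrong action: back to the start
        done = self.state == self.n
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_episodes_until_reward(env, max_episodes=100000):
    """Count uniformly-random episodes until the first nonzero reward."""
    for ep in range(1, max_episodes + 1):
        env.reset()
        for _ in range(env.n):
            _, r, done = env.step(random.randrange(2))
            if done:
                return ep
    return None

env = KeyLock(n=10)
print(random_episodes_until_reward(env))  # typically on the order of 2^10 episodes
```

Each episode succeeds under random exploration with probability 2^-n, which is the exponential sample complexity referred to above; reward shaping or value bootstrapping provides no local signal to exploit here.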

What’s an alternative?

Each of the issues above is addressable.

  • Build fundamental theories of what are statistically and computationally tractable sub-problems of Reinforcement Learning. These tractable sub-problems form the ‘APIs’ of systems for solving these problems. Examples of this include simpler (Contextual Bandits), intermediate (learning to search), and more advanced (Contextual Decision Processes) settings.
  • Work on real-world problems. The obvious antidote to simulation is reality, driving both the need to create systems that work in reality and a research agenda around reality-centered issues like performance at low sample complexity. There are some significant difficulties with this—reinforcement-style algorithms require interactive access to learn, which often drives research towards companies with the necessary infrastructure. Nevertheless, offline evaluation on real-world data does exist, and the choice of which research directions to emphasize is available to everyone.
  • The combination of fundamental theories and a platform which distills what is learned, so it is not forgotten and is continually improved upon, provides a stronger basis for expecting generalization to the next problem.
  • The shortest path to creating valuable applications in the real world is to simply work on creating valuable applications in the real world. Doing this in a manner guided by other elements of the research program is just good sense.
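The offline-evaluation point above can be made concrete with a minimal sketch of inverse-propensity-score (IPS) evaluation for contextual bandits. The data generator and policies below are toy assumptions for illustration; the estimator itself is the standard one: logged exploration data lets you score any target policy without new interaction.

```python
import random

def log_data(n_rounds, n_actions, seed=0):
    """Simulate exploration data logged under a uniform-random policy.
    Each record is (context, action, reward, propensity)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_rounds):
        context = rng.randrange(3)         # toy context: one of 3 user types
        action = rng.randrange(n_actions)  # uniform logging policy
        # ground truth (unknown to the learner): best action equals the context
        reward = 1.0 if action == context else 0.0
        data.append((context, action, reward, 1.0 / n_actions))
    return data

def ips_value(policy, data):
    """Inverse-propensity-score estimate of a target policy's average reward,
    computed purely from logged data—no new interaction required."""
    total = 0.0
    for context, action, reward, propensity in data:
        if policy(context) == action:
            total += reward / propensity
    return total / len(data)

data = log_data(n_rounds=10000, n_actions=3)
good = lambda c: c           # always picks the rewarding action
bad = lambda c: (c + 1) % 3  # never does
print(ips_value(good, data), ips_value(bad, data))  # ≈ 1.0 and ≈ 0.0
```

The estimate is unbiased as long as the logging propensities are recorded and nonzero for every action the target policy might choose, which is what makes real-world exploration data reusable across research questions.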

The above must be applied in moderation—some emphasis on theory, some emphasis on real world applications, some emphasis on platforms, and some emphasis on empirics. This has been my research approach for a little over 10 years, ever since I started working on contextual bandits.

Let’s call the first research program ‘empirical simulation’ and the second research program ‘real fundamentals’. The empirical simulation approach has a clear strong advantage in that it creates impressive demos, which create funding, which creates more research. The threshold for contribution to the empirical simulation approach may also be lower simply because it requires mastery of fewer elements, implying people can more easily participate in it. At the same time, the real fundamentals approach has clear advantages in addressing the weaknesses of the empirical simulation approach. At a concrete level, this means we have managed to define and create fundamentals through research while creating real-world applications and value radically more efficiently than the empirical simulation approach has achieved.

The ‘real fundamentals’ concept is behind the open positions above. These positions have been designed to come with both the colleagues and mandate to address the most difficult research problems along with the organizational leverage to change the world. For people interested in fundamentals and making things happen in the real world these are prime positions—please consider joining us.

4/22/2017

Fact over Fiction

Tags: politics,Research jl@ 1:23 pm

Politics is a distracting affair which I generally believe it’s best to stay out of if you want to be able to concentrate on research. Nevertheless, the US presidential election looks like something that directly politicizes the idea and process of research by damaging the association of scientists and students, cutting funding for basic research, and creating political censorship.

A core question here is: What to do? Today’s March for Science is a good step, but I’m not sure it will change many minds. Unlike most scientists, I grew up in a county (Linn) which voted overwhelmingly for Trump. As a consequence, I feel like I must translate the mindset a bit. For the median household left behind over my lifetime, a march by relatively affluent people protesting the government cutting expenses will not elicit much sympathy. Discussion about the overwhelming value of science may also fall on deaf ears simply because they have not seen the economic value personally. On the contrary, they have seen their economic situation flat or worsening for 4 decades with little prospect for things getting better. Similarly, I don’t expect history lessons on anti-intellectualism to make much of a dent. Fundamentally, scientists and science fans are a small fraction of the population.

What’s needed is a campaign that achieves broad agreement across the population and which will help. One of the roots of the March for Science is a belief in facts over fiction which may have the requisite scope. In particular, there seems to be a good case that the right to engage in mass disinformation has been enormously costly to the United States and is now a significant threat to civil stability. Internally, disinformation is a preferred tool for starting wars or for wealthy companies to push a deadly business model. Externally, disinformation is now being actively used to sway elections and is self-funding.

The election outcome is actually less important than the endemic disagreement that disinformation creates. When people simply believe in different facts about the world how can you expect them to agree? There probably are some good uses of mass disinformation somewhere, but I’m extremely skeptical the value exceeds the cost.

Is opposition to mass disinformation broad enough that it makes a good organizing principle? If mass disinformation was eliminated or greatly reduced it would be an enormous value to society, particularly to the disinformed. It would not address the fundamental economic stagnation of the median household in the United States, but it would remove a significant threat to civil society which may be necessary for such progress. Given a choice between the right to mass disinform and democracy, I choose democracy.

A real question is “how”? We are discussing an abridgment of freedom of speech, so from a legal perspective the basis must rest on the balance between freedom of speech and other constitutional rights. Many abridgments already exist, like penalizing someone who falsely yells “fire” in a crowded theater.

Voluntary efforts (as Facebook and Twitter have undertaken) are a start, but they seem unlikely to go far enough, as many other “news” organizations have made no such commitments. A system where companies commit to informing over disinforming and in return become both more trusted and simultaneously liable for disinformation damages (due to the disinformed) as assessed by civil law may make sense. Right now organizations are mostly free to engage in disinformation as long as it is not directed at an individual, where libel laws apply. Penalizing an organization for individual mistakes seems absurd, but a pattern of errors backed by scientific surveys verifying an anomalously misinformed status of viewers/readers/listeners is cause for action. Getting this right is obviously a tricky thing—we want a solution that a real news organization with an existing mimetic immune system prefers to the status quo because it curbs competitors that disinform. At the same time, there must be enough teeth to make disinformation uneconomical, or the problem only grows.

Should disinformation have criminal penalties? One existing approach here uses RICO laws to counter disinformation from Tobacco companies. Reading the history, this took an amazing amount of time—enough that it was ineffective for a generation. It seems plausible that an act directly addressing disinformation may be helpful.

What about technical solutions? Technical solutions seem necessary for success, perhaps with changes to law incentivizing this. It’s important to understand that something going before the courts is inherently slow, particularly because courts tend to be deeply overloaded. A significant increase in the number of cases going before courts makes an approach nonviable in practice.

Would we regret this? There is a long history of governments abusing laws to censor inconvenient news sources, so caution is warranted. Structuring new laws in a manner such that they cannot be abused is an important consideration. It is obviously important to leave satire fully intact, which seems entirely possible by making the fact that it is satire unmistakable. This entire discussion is also not relevant to individuals speaking to other individuals—that is not what creates a problem.

Is this possible? It might seem obvious that mass disinformation should be curbed but there should be no doubt that powerful forces will work to preserve mass disinformation by subtle and unethical means.

Overall, I fundamentally believe that people in a position to inform or disinform have a responsibility to inform. If they don’t want that responsibility, then they should abdicate the position to someone who does, similar in effect to the proposed fiduciary rule for investments. I’m open to any approach towards achieving this.

Edit: also at CACM.

9/19/2014

No more MSR Silicon Valley

Tags: Research jl@ 6:44 pm

This news report is correct, the Microsoft Research Silicon Valley center has been cut. The New York lab has not been directly affected although obviously cross-lab collaborations are impacted, and we sympathize deeply with those involved. Most of the rest of MSR is not directly affected.

I’m not privy to the rationale behind the decision, but in my opinion there are some very strong people in the various groups (Algorithms, Architecture, Distributed Systems, Privacy, Software tools, Web Search), and I expect offers have started raining on them. In my experience, this is harrowing in the short term, yet I know that most of my previous colleagues ended up happier after the troubles hit Yahoo! Research 2 1/2 years ago.

7/14/2014

The perfect candidate

The last several years have seen phenomenal growth in machine learning, such that this earlier post from 2007 is understated. Machine learning jobs aren’t just growing on trees, they are growing everywhere. The core dynamic is a digitizing world, which makes people who know how to use data effectively a very hot commodity. In the present state, anyone reasonably familiar with some machine learning tools and a master’s level of education can get a good job at many companies, while PhD students coming out sometimes have bidding wars and many professors have created startups.

Despite this, hiring in good research positions can be challenging. A good research position is one where you can:

  • Spend the majority of your time working on research questions that interest you.
  • Work with other like-minded people.
  • For several years.

I see these as critical—research is hard enough that you cannot expect to succeed without devoting the majority of your time. You cannot hope to succeed without personal interest. Other like-minded people are typically necessary in finding the solutions of the hardest problems. And, typically you must work for several years before seeing significant success. There are exceptions to everything, but these criteria are the working norm of successful research I see.

The set of good research positions is expanding, but at a much slower pace than the many applied scientist types of positions. This makes good sense as the pool of people able to do interesting research grows only slowly, and anyone funding this should think quite hard before making the necessary expensive commitment for success.

But, with the above said, what makes a good candidate for a research position? People have many diverse preferences, so I can only speak for myself with any authority. There are several things I do and don’t look for.

  1. Something new. Any good candidate should have something worth teaching. For a PhD candidate, the subject of your research is deeply dependent on your advisor. It is not necessary that you do something different from your advisor’s research direction, but it is necessary that you own (and can speak authoritatively about) a significant advance.
  2. Something other than papers. It is quite possible to persist indefinitely in academia while only writing papers, but it does not show a real interest in what you are doing beyond survival. Why are you doing it? What is the purpose? Some people code. Some people solve particular applications. There are other things as well, but these make the difference.
  3. A difficult long-term goal. A goal suggests interest, but more importantly it makes research accumulate. Some people do research without a goal, solving whatever problems happen to pass by that they can solve. Very smart people can do well in research careers with a random walk amongst research problems. But people with a goal can have their research accumulate in a much stronger fashion than a random walk through research problems. I’m not an extremist here—solving off goal problems is fine and desirable, but having a long-term goal makes a long-term difference.
  4. A portfolio of coauthors. This shows that you are the sort of person able to and interested in working with other people, as is very often necessary for success. This can be particularly difficult for some PhD candidates whose advisors expect them to work exclusively with (or for) them. Summer internships are both a strong tradition and a great opportunity here.
  5. I rarely trust recommendations, because I find them very difficult to interpret. When the candidate selects the writers, the most interesting bit is who the writers are. Letters default positive, but the degree of default varies from writer to writer. Occasionally, a recommendation says something surprising, but do you trust the recommender’s judgement? In some cases yes, but in many cases you do not know the writer.

Meeting the above criteria within the context of a PhD is extraordinarily difficult. The good news is that you can “fail” with a job that is better in just about every way :-)

Anytime criteria are discussed, it’s worth asking: should you optimize for them? In another context, Lines of code is a terrible metric to optimize when judging programmer productivity. Here, I believe optimizing for (1), (2), (3), and (4) is beneficial and worthwhile for PhD students.

5/3/2012

Microsoft Research, New York City

Tags: Machine Learning,Research jl@ 5:34 am

Yahoo! laid off people. Unlike every previous time there have been layoffs, this is serious for Yahoo! Research.

We had advanced warning from Prabhakar through the simple act of leaving. Yahoo! Research was a world class organization that Prabhakar recruited much of personally, so it is deeply implausible that he would spontaneously decide to leave. My first thought when I saw the news was “Uh-oh, Rob said that he knew it was serious when the head of AT&T Research left.” In this case it was even more significant, because Prabhakar recruited me on the premise that Y!R was an experiment in how research should be done: via a combination of high quality people and high engagement with the company. Prabhakar’s departure is a clear end to that experiment.

The result is ambiguous from a business perspective. Y!R clearly was not capable of saving the company from its illnesses. I’m not privy to the internal accounting of impact and this is the kind of subject where there can easily be great disagreement. Even so, there were several strong direct impacts coming from the machine learning, economics, and algorithms groups.

Y!R clearly was excellent from an academic research perspective. On a per person basis in relevant subjects, it was outstanding. One way to measure this is by noticing that both ICML and KDD had (co)program chairs from Y!R. It turns out that talking to the rest of the organization doing consulting, architecting, and prototyping on a minority basis helps research by sharpening the questions you ask more than it hinders by taking up time. The decision to participate in this experiment was a good one for me personally.

It has been clear in silicon valley, academia, and pretty much everywhere else that people at Yahoo! including Yahoo! Research have been looking around for new positions. Maintaining the excellence of Y!R in a company that has been under prolonged stress was challenging leadership-wise. Consequently, the abrupt departure of Prabhakar and an apparent lack of appreciation by the new CEO created a crisis of confidence. Many people who were sitting on strong offers quickly left, and everyone else started looking around.

In this situation, my first concern was for colleagues, both in Machine Learning across the company and the Yahoo! Research New York office.

Machine Learning turns out to be a very hot technology. Every company and government in the world is drowning in data, and Machine Learning is the prime tool for actually using it to do interesting things. More generally, the demand for high quality seasoned machine learning researchers across startups, mature companies, government labs, and academia has been astonishing, and I expect the outcome to reflect that. This is remarkably different from the cuts that hit AT&T Research in late 2001 and early 2002, where the famous machine learning group there took many months to disperse to new positions.

In the New York office, we investigated many possibilities hard enough that it became a news story. While that article is wrong in specifics (we ended up not fired for example, although it is difficult to discern cause and effect), we certainly shook the job tree very hard to see what would fall out. To my surprise, amongst all the companies we investigated, Microsoft had a uniquely sufficient agility, breadth of interest, and technical culture, enabling them to make offers that I and a significant fraction of the Y!R-NY lab could not resist. My belief is that the new Microsoft Research New York City lab will become an even greater techhouse than Y!R-NY. At a personal level, it is deeply flattering that they have chosen to create a lab for us on short notice. I will certainly do my part chasing the greatest learning algorithms not yet invented.

In light of this, I would encourage people in academia to consider Yahoo! in as fair a light as possible in the current circumstances. There are and will be some serious hard feelings about the outcome as various top researchers elsewhere in the organization feel compelled to look for jobs and leave. However, Yahoo! took a real gamble supporting a research organization about 7 years ago, and many positive things have come of this gamble from all perspectives. I expect almost all of the people leaving to eventually do quite well, and often even better.

What about ICML? My second thought on hearing about Prabhakar’s departure was “I really need to finish up initial paper/reviewer assignments today before dealing with this”. During the reviewing period where the program chair load is relatively light, Joelle handled nearly everything. My great distraction ended neatly in time to help with decisions at ICML. I considered all possibilities in accepting the job and was prepared to simply put aside a job search for some time if necessary, but the timing was surreally perfect. All signs so far point towards this ICML being an exceptional ICML, and I plan to do everything that I can to make that happen. The early registration deadline is May 13.

What about KDD? Deepak was sitting on an offer at Linkedin and simply took it, so the disruption there was even more minimal. Linkedin is a significant surprise winner in this affair.

What about Vowpal Wabbit? Amongst other things, VW is the ultrascale learning algorithm, not the kind of thing that you would want to put aside lightly. I negotiated to continue the project and succeeded. This surprised me greatly—Microsoft has made serious commitments to supporting open source in various ways and that commitment is what sealed the deal for me. In return, I would like to see Microsoft always at or beyond the cutting edge in machine learning technology.
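The core recipe that makes VW-style learning “ultrascale”—hashing features into a fixed-size weight vector and updating by online gradient descent—can be sketched in a few lines. This is an illustrative simplification of the idea, not VW’s actual implementation; the class name and toy features are my own.

```python
import zlib

class HashedOnlineLearner:
    """Sketch of VW-style learning: string features are hashed into a
    fixed-size weight vector (the hashing trick), and weights are updated
    by online gradient descent on squared loss, one example at a time."""
    def __init__(self, bits=18, lr=0.1):
        self.w = [0.0] * (1 << bits)        # fixed memory, regardless of feature count
        self.mask = (1 << bits) - 1
        self.lr = lr

    def _index(self, feature):
        return zlib.crc32(feature.encode()) & self.mask  # hashing trick

    def predict(self, features):
        return sum(self.w[self._index(f)] * v for f, v in features.items())

    def learn(self, features, label):
        err = self.predict(features) - label
        for f, v in features.items():
            self.w[self._index(f)] -= self.lr * err * v  # gradient step
        return err

learner = HashedOnlineLearner()
for _ in range(200):
    learner.learn({"bias": 1.0, "likes_sports": 1.0}, 1.0)
    learner.learn({"bias": 1.0, "likes_opera": 1.0}, 0.0)
print(learner.predict({"bias": 1.0, "likes_sports": 1.0}))  # close to 1.0
```

Because memory is fixed by the number of hash bits and each example is touched once, this style of learner streams through datasets far larger than RAM, which is the property the post refers to.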

added: crosspost on CACM.
added: Lance, Jennifer, NYTimes, Vader
