ICML 2017 Tutorial on Real World Interactive Learning

Alekh Agarwal and John Langford

Tutorial Description

This is a tutorial about practical real-world use of interactive and online learning. It should be of wide interest because this is a nascent technology with broad applications. The tutorial focuses on contextual bandit learning, which now has at least a dozen practical applications across the world, ranging from recommendation tasks and ad display to clinical trials and adaptive decision making in computer systems. Unlike a previous tutorial we gave at NIPS in 2013, the primary focus of this tutorial is how to make things work in practice, with some background on the basics [Auer et al., 2002, Langford and Zhang, 2007, Beygelzimer et al., 2011, Chu et al., 2011, Dudík et al., 2011a, Agarwal et al., 2014].

The broad goal of this tutorial is twofold:

  1. Novel design challenges: Many machine learning systems consist of training a model on some static dataset and then deploying the learned model in an application. In contrast, the underlying models in a contextual bandit system continually learn from the feedback acquired. This means that the machine learning system needs to handle data flow, logging, and real-time learning, leading to several new challenges compared to the standard supervised learning paradigm. Our tutorial discusses these challenges, their framing, and practical solutions verified by experience.
  2. Recipes for practical success: In many application scenarios, there are multiple design choices as well as discrepancies between the question of interest and the setup in theory. Different ways of mapping the same problem into a candidate solution can often result in very different outcomes, as in any machine learning setting. We cover various canonical problem settings that arise in applications as well as recipes for handling them as a part of this tutorial.
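As a rough illustration of the data flow described in the first point, the following is a minimal sketch of a contextual bandit loop that explores, logs each decision together with the probability it was chosen with, and updates its model online. It is not the tutorial's system: the toy environment, the epsilon-greedy exploration, the running-average model, and every name in it (`epsilon_greedy`, `ips_value`, `run`) are assumptions chosen for brevity. The logged probabilities are what allow a different policy to be evaluated offline with an inverse propensity score estimate.

```python
import random


def epsilon_greedy(scores, epsilon, rng):
    """Choose an action and return it with the probability it was chosen."""
    k = len(scores)
    best = max(range(k), key=lambda a: scores[a])
    probs = [epsilon / k] * k      # uniform exploration mass
    probs[best] += 1.0 - epsilon   # remaining mass on the greedy action
    action = rng.choices(range(k), weights=probs)[0]
    return action, probs[action]


def ips_value(log, policy):
    """Inverse propensity score estimate of a policy's average reward."""
    total = 0.0
    for context, action, prob, reward in log:
        if policy(context) == action:
            total += reward / prob
    return total / len(log)


def run(rounds=2000, epsilon=0.2, seed=0):
    """Interact with a toy environment and return the exploration log."""
    rng = random.Random(seed)
    n_contexts, n_actions = 2, 2
    # The "model": running reward sums and counts per (context, action).
    sums = [[0.0] * n_actions for _ in range(n_contexts)]
    counts = [[0] * n_actions for _ in range(n_contexts)]
    log = []
    for _ in range(rounds):
        context = rng.randrange(n_contexts)
        scores = [
            sums[context][a] / counts[context][a] if counts[context][a] else 0.0
            for a in range(n_actions)
        ]
        action, prob = epsilon_greedy(scores, epsilon, rng)
        # Toy environment (assumption): reward 1 when the action matches the context.
        reward = 1.0 if action == context else 0.0
        # Log the probability along with the outcome, then learn immediately.
        log.append((context, action, prob, reward))
        sums[context][action] += reward
        counts[context][action] += 1
    return log


if __name__ == "__main__":
    log = run()
    print("matching policy:", ips_value(log, lambda c: c))
    print("opposite policy:", ips_value(log, lambda c: 1 - c))
```

Only the reward of the chosen action is ever observed, which is why the log must record the choice probability at decision time; it cannot be reconstructed reliably afterwards once the model has moved on.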

Goals: At the end of this tutorial, participants should have both a firm understanding of the foundations of contextual bandit learning and the practical ability to deploy and use a contextual bandit system within an hour.

Target Audience: We are targeting both machine learning researchers who may not have encountered the practical issues with doing interactive learning and people from industry who would really like to use these techniques to solve their problems.


Materials: slides for Motivation, Algorithms and Theory, War Stories, Systems, and Other Issues; a Bibliography; and Video.


Alekh Agarwal has done research on theoretical aspects of contextual bandit learning for several years, including the best practical approaches [Agarwal et al., 2014, 2012, Krishnamurthy et al., 2015]. He has also been involved in the practical applications of contextual bandits at Microsoft, including the development of an easily deployable service (http://aka.ms/mwt).

John Langford coined the name contextual bandits [Langford and Zhang, 2007] to describe a tractable subset of reinforcement learning and has worked on a half-dozen papers [Beygelzimer et al., 2011, Dudík et al., 2011a,b, Agarwal et al., 2014, 2012, Beygelzimer and Langford, 2009, Li et al., 2010] improving our understanding of how to learn in this paradigm. John has also given several previous tutorials on topics such as Joint Prediction (ICML 2015), Contextual Bandit Theory (NIPS 2013), Active Learning (ICML 2009), and Sample Complexity Bounds (ICML 2003).


Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, and Robert E. Schapire. Contextual bandit learning with predictable rewards. In AISTATS, 2012.

Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. In ICML, 2014.

Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

Alina Beygelzimer and John Langford. The offset tree for learning with partial labels. In KDD, 2009.

Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In AISTATS, 2011.

Wei Chu, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandits with linear payoff functions. In AISTATS, 2011.

Miroslav Dudík, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang. Efficient optimal learning for contextual bandits. In UAI, 2011a.

Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In ICML, 2011b.

Akshay Krishnamurthy, Alekh Agarwal, and Miroslav Dudík. Efficient contextual semi-bandit learning. arXiv preprint arXiv:1502.05890, 2015.

John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In NIPS, 2007.

Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW, 2010.


  • ICML 2017
  • Where: Cockle Bay, International Convention Center, Sydney, Australia
  • When: Sunday, August 6, 3:45-6pm.