Interactive Machine Learning

Interactive machine learning is machine learning in settings where the learner interacts with its environment, for example by choosing actions to take or labels to query. It includes aspects of Reinforcement Learning and Active Learning, amongst others.

Tutorials

A tutorial on Learning through Exploration, given at ICML and KDD 2010, with the video lecture.

An Active Learning tutorial at ICML 2009, with a video lecture.

Papers

No more prenormalization: Stephane Ross, Paul Mineiro, and John Langford, Normalized Online Learning, UAI 2013. We eliminate the need to prenormalize data before feeding it into an online learning algorithm, and we reduce the need to fiddle with learning rate parameters.
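The idea can be sketched as a scale-invariant gradient update: track the largest magnitude seen per feature and divide the step by its square, so features on wildly different scales are handled without a prenormalization pass. This is a simplified illustration in the spirit of the paper, not the full NG algorithm (which also maintains a global rescaling term); all names here are hypothetical.

```python
import numpy as np

def normalized_sgd(stream, eta=0.1):
    """Sketch of scale-invariant online linear learning with squared loss.
    s tracks the largest magnitude seen for each feature, and the gradient
    step is divided by s**2, so no prenormalization pass is needed."""
    d = stream[0][0].shape[0]
    w = np.zeros(d)
    s = np.zeros(d)                      # running per-feature scale
    for x, y in stream:
        s = np.maximum(s, np.abs(x))
        safe = np.where(s > 0, s, 1.0)   # avoid dividing by zero
        grad = (w @ x - y) * x           # squared-loss gradient
        w -= eta * grad / safe ** 2      # per-feature normalized step
    return w
```

Even when one feature ranges over [-1, 1] and another over [-1000, 1000], the same eta works for both, which is the point of the paper.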
Easy terafeature learning: Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, and John Langford, Reliable Effective Terascale Linear Learning, 2011 and JMLR 2013. We create a solid linear learning algorithm that works well on real-world terafeature datasets. The result is a breakthrough in scalability, because we can learn faster than data can be shoved through any single machine interface.
More thorough contextual bandit experiments: Lihong Li, Wei Chu, John Langford, Taesup Moon, and Xuanhui Wang, Unbiased Offline Evaluation of Contextual-Bandit Algorithms with Generalized Linear Models, JMLR 2012. A journal version of the WSDM paper below.
Contextual Bandits + Realizability: Alekh Agarwal, Miroslav Dudik, Satyen Kale, and John Langford, Contextual Bandit Learning Under the Realizability Assumption, AIStat 2012. We propose a new contextual bandit algorithm under the assumption of realizability: that there exists a perfect predictor of expected rewards. In the worst case this turns out to be useless, but in special cases it is extremely powerful.
Many experiments with parallel online learning: Daniel Hsu, Nikos Karampatziakis, John Langford, and Alex Smola, Parallel Online Learning, in Scaling Up Machine Learning (the book), 2011. Experiments and analysis of all the obvious feature-partition-based online learning algorithms. Results varied between superior and catastrophic in surprising ways, with no one algorithm the clear winner.
We eat an exponential in contextual bandit learning: Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang, Efficient Optimal Learning for Contextual Bandits, UAI 2011. We call this the "monster paper". It provides the second known class of contextual bandit algorithms, and reduces the complexity of contextual bandit learning from linear in the number of policies competed with to polylogarithmic, given an optimization oracle.
Efficient use of exploration info: Miroslav Dudik, John Langford, and Lihong Li, Doubly Robust Policy Evaluation and Learning, ICML 2011. This approach appears generally superior to offset trees for learning, and substantially improves offline evaluation in exploration settings.
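The doubly robust estimator combines a (possibly wrong) reward model with an importance-weighted correction on the logged action, and is unbiased as long as either the model or the logged propensities are right. A minimal sketch, where `policy(x)` and `reward_model(x, a)` are hypothetical interfaces and each log record is (context, logged action, observed reward, logging probability):

```python
def doubly_robust_value(logs, policy, reward_model):
    """Sketch of doubly robust off-policy value estimation.
    logs: iterable of (x, a, r, p) = context, logged action,
    observed reward, and the logging policy's probability of a."""
    total = 0.0
    for x, a, r, p in logs:
        pi_a = policy(x)
        est = reward_model(x, pi_a)              # model term
        if pi_a == a:
            est += (r - reward_model(x, a)) / p  # importance-weighted correction
        total += est
    return total / len(logs)
```

With correct logging probabilities, the correction term cancels the reward model's bias on average, so even a badly miscalibrated model yields an unbiased value estimate.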
Unbiased offline evaluation: Lihong Li, Wei Chu, John Langford, and Xuanhui Wang, Unbiased Offline Evaluation of Contextual-Bandit-Based News Article Recommendation Algorithms, WSDM 2011. Evidence that offline evaluation techniques work for news recommendation.
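The core evaluation trick can be sketched in a few lines: when actions were logged uniformly at random, keep only the events where the logged action matches what the candidate policy would have chosen, and average their rewards. Names here are illustrative; each log record is (context, logged action, observed reward).

```python
def replay_evaluate(logs, policy):
    """Sketch of replay-style offline evaluation under uniformly random
    logging: average rewards over events where the logged action agrees
    with the candidate policy's choice."""
    matched = [r for x, a, r in logs if policy(x) == a]
    return sum(matched) / len(matched) if matched else 0.0
```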
Dealing with importance weights: Nikos Karampatziakis and John Langford, Importance Weight Aware Gradient Updates, UAI 2011. A much better method than "multiply the gradient by the importance weight" that has substantial advantages even when the importance weight is always 1.
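The idea is to treat an importance weight h as h infinitesimal gradient steps rather than one h-times-larger step. For squared loss this limit has a closed form, shown below as a simplified sketch (function and parameter names are mine, not the paper's): the prediction moves toward the label but can never overshoot it, no matter how large h is.

```python
import numpy as np

def iwa_update(w, x, y, eta, h):
    """Importance-weight-aware update for squared loss: the closed-form
    limit of h infinitesimal gradient steps of size eta.  The prediction
    approaches the label y but never passes it."""
    xx = float(x @ x)
    if xx == 0.0:
        return w
    p = float(w @ x)
    # fraction of the gap (p - y) that the update closes
    scale = (1.0 - np.exp(-eta * h * xx)) / xx
    return w - scale * (p - y) * x
```

A useful sanity check: one update with weight 2 gives exactly the same result as two successive updates with weight 1, a consistency property the naive "multiply the gradient by h" rule lacks.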
Efficient Active Learning: Alina Beygelzimer, Daniel Hsu, John Langford, and Tong Zhang, Agnostic Active Learning Without Constraints, NIPS 2010. This is an IWAL-style algorithm that functions as a pure reduction to ERM-style supervised learning. It is quite effective empirically. Some slides.
Information-theoretically optimal learning: Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire, An Optimal High Probability Algorithm for the Contextual Bandit Problem, AIStat 2011. We improve the EXP4 algorithm to work with high probability, establishing a better guarantee than previously existed for arbitrary policy classes in the contextual bandit setting.
Implicit Exploration: Alex Strehl, John Langford, Sham Kakade, and Lihong Li, Learning from Logged Implicit Exploration Data, NIPS 2010. This paper discusses how to learn from implicit exploration, which may be nonrandom. It removes one of the fundamental obstacles to learning with exploration.
Contextual Bandits for News: Lihong Li, Wei Chu, John Langford, and Robert Schapire, A Contextual-Bandit Approach to Personalized News Article Recommendation, WWW 2010. A contextual bandit approach to news article recommendation, deployed on real traffic.
Delay in Parallel Learning: John Langford, Alex Smola, and Martin Zinkevich, Slow Learners are Fast, NIPS 2009. This paper discusses how delay is inevitable in parallel online learning environments and analyzes the effect of delay in a linear setting.
Learning Nonlinear Dynamic Models: John Langford, Ruslan Salakhutdinov, and Tong Zhang, Learning Nonlinear Dynamic Models, ICML 2009. This paper proposes, evaluates, and proves consistent a new approach to dealing with nonlinear dynamic models, based on employing learning algorithms to form predictive states.
Sparse Online Learning: John Langford, Lihong Li, and Tong Zhang, Sparse Online Learning via Truncated Gradient, NIPS 2008; also see the appendix or the full JMLR 2009 draft. This paper analyzes simple methods for inducing sparsity in online learning and proposes the truncated gradient as the best approach. See also this paper, which proposes a less efficient but more exact method.
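The truncated gradient can be sketched as a two-phase step: an ordinary gradient update, followed by shrinking toward zero only those coordinates whose magnitude falls below a threshold. A minimal illustration (parameter names are mine; lam plays the role of the sparsification strength and theta the truncation threshold):

```python
import numpy as np

def truncated_gradient_step(w, g, eta, lam, theta):
    """One sketch step of truncated gradient: gradient descent followed
    by soft-thresholding, applied only to weights with |w| <= theta so
    large, confidently useful weights are left untouched."""
    w = w - eta * g                                        # ordinary gradient step
    small = np.abs(w) <= theta                             # candidates for truncation
    shrunk = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)
    return np.where(small, shrunk, w)
```

Weights that hover near zero are gradually driven exactly to zero, which is how sparsity emerges online without a batch regularization pass.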
Importance Weighted Active Learning: Alina Beygelzimer, Sanjoy Dasgupta, and John Langford, Importance Weighted Active Learning, ICML 2009; some code. We create active learning algorithms using importance weights, with finite label complexity guarantees and good empirical performance.
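The IWAL skeleton is simple to sketch: query each example's label with some probability p, and when queried, give the example importance weight 1/p so the weighted sample remains an unbiased stand-in for fully labeled data. The interfaces below (`query_prob`, `label_oracle`) are hypothetical; the paper's contribution is in how the query probability is chosen.

```python
import random

def iwal_example(x, query_prob, label_oracle):
    """Sketch of one IWAL step: query the label with probability p and,
    if queried, return the example with importance weight 1/p.
    Returns None when the label is not requested."""
    p = query_prob(x)
    if random.random() < p:
        return (x, label_oracle(x), 1.0 / p)  # queried, reweighted
    return None                               # skipped, no label cost
```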
Exploration Scavenging: John Langford, Alexander Strehl, and Jennifer Wortman, Exploration Scavenging, ICML 2008. This is about how to make use of nonrandom, accidental exploration in evaluating and optimizing new policies.
Contextual Bandit->Classification: Alina Beygelzimer and John Langford, The Offset Tree for Learning with Partial Labels, KDD 2009. This paper reduces contextual bandit learning to binary classification. It also proves a matching lower bound on contextual bandit learning, suggesting this problem is fundamentally harder.
Contextual Bandit Learning: John Langford and Tong Zhang, The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, NIPS 2007 (.tex). Adding side information to bandits creates a new (relatively unanalyzed) setting. This paper analyzes the first practical algorithm in that setting.
Agnostic Active Learning: Nina Balcan, Alina Beygelzimer, and John Langford, Agnostic Active Learning, ICML 2006 (.tex); video lecture; also the JCSS Volume 75, Issue 1, Pages 78-89 version. We show that active learning (the selective sampling version) is possible in situations with adversarially-placed noise.
See also this paper, which makes the algorithm more algorithmic, and Steve's paper characterizing when speedups are possible.
Perfect Online Learning: Jacob Abernethy, John Langford, and Manfred K. Warmuth, Continuous Experts and the Binning Algorithm, COLT 2006 (.ps.gz, .pdf, .tex). This paper presents, and proves optimal, a computationally tractable online learning algorithm, "binning", in the familiar experts setting for classification.