Interactive Machine Learning

Interactive machine learning is machine learning in an interactive environment, where the learner's choices influence the data it observes. It includes aspects of Reinforcement Learning and Active Learning, amongst others.

Tutorials

A Learning to Interact tutorial at NIPS 2013.

A tutorial on Learning through Exploration at ICML and KDD 2010 with the videolecture.

An Active Learning tutorial at ICML 2009 with videolecture.

Papers

No more prenormalization: Stephane Ross, Paul Mineiro, and John Langford, Normalized Online Learning, UAI 2013. We eliminate the need to prenormalize data before feeding it into an online learning algorithm, and we reduce the need to fiddle with learning rate parameters.
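A minimal sketch of the idea in Python, under our own naming and with the paper's bookkeeping (the global accumulator and the rescaling of already-learned weights) simplified away: keep a running per-feature scale and divide each update by its square, making the algorithm invariant to feature scaling.

    import numpy as np

    def normalized_sgd(examples, eta=0.1):
        # Scale-invariant online gradient descent for squared loss.
        w, s = None, None
        for x, y in examples:
            x = np.asarray(x, dtype=float)
            if w is None:
                w = np.zeros_like(x)
                s = np.full_like(x, 1e-12)
            s = np.maximum(s, np.abs(x))   # per-feature scale seen so far
            g = w.dot(x) - y               # gradient of 0.5*(w.x - y)^2
            w -= eta * g * x / s**2        # update normalized by scale^2
        return w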
Easy terafeature learning: Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, and John Langford, Reliable Effective Terascale Linear Learning, 2011 and JMLR 2013. We create a solid linear learning algorithm that works well on real-world terafeature datasets. The result is a breakthrough in scalability, because we can learn faster than data can be shoved through any single machine interface.
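A toy sketch of the system's shape (names are ours; the real system runs a spanning-tree allreduce across Hadoop nodes and warmstarts L-BFGS from the averaged online pass):

    import numpy as np

    def allreduce_average(vectors):
        # Stand-in for allreduce: every node ends up with the average.
        avg = np.mean(vectors, axis=0)
        return [avg.copy() for _ in vectors]

    def parallel_linear_sgd(shards, passes=5, eta=0.1):
        # Each node runs SGD on its own data shard, then weights are
        # averaged across nodes; repeat for several passes.
        d = len(shards[0][0][0])
        ws = [np.zeros(d) for _ in shards]
        for _ in range(passes):
            for i, shard in enumerate(shards):
                for x, y in shard:
                    x = np.asarray(x, dtype=float)
                    ws[i] -= eta * (ws[i].dot(x) - y) * x
            ws = allreduce_average(ws)
        return ws[0]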
More thorough contextual bandit experiments: Lihong Li, Wei Chu, John Langford, Taesup Moon, and Xuanhui Wang, Unbiased Offline Evaluation of Contextual-Bandit Algorithms with Generalized Linear Models, JMLR 2012. A journal version of the WSDM paper below.
Contextual Bandits + Realizability: Alekh Agarwal, Miroslav Dudik, Satyen Kale, and John Langford, Contextual Bandit Learning Under the Realizability Assumption, AISTATS 2012. We propose a new contextual bandit algorithm under the assumption of realizability: that there exists a perfect predictor of expected rewards. In the worst case this turns out to be useless, but in special cases it is extremely powerful.
Many experiments with parallel online learning: Daniel Hsu, Nikos Karampatziakis, John Langford, and Alex Smola, Parallel Online Learning, in Scaling Up Machine Learning (the book), 2011. Experiments and analysis of all the obvious feature-partition-based online learning algorithms. Results varied between superior and catastrophic in surprising ways, with no one algorithm the clear winner.
We eat an exponential in contextual bandit learning: Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang, Efficient Optimal Learning for Contextual Bandits, UAI 2011. We call this the "monster paper". It provides the second known class of contextual bandit algorithms, and reduces the complexity of contextual bandit learning from linear in the number of policies competed with to polylogarithmic, given an optimization oracle.
Efficient use of exploration info: Miroslav Dudik, John Langford, and Lihong Li, Doubly Robust Policy Evaluation and Learning, ICML 2011. This approach appears generally superior to offset trees for learning and substantially improves offline evaluation in exploration settings.
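The doubly robust estimator itself is compact; a hedged sketch (reward_model and policy are our placeholder names). It combines a learned reward estimate with an importance-weighted correction, and is unbiased if either the reward model or the logged action probabilities are correct:

    def doubly_robust_value(logs, reward_model, policy):
        # logs: (context x, logged action a, observed reward r,
        #        probability p with which the logger chose a)
        total = 0.0
        for x, a, r, p in logs:
            a_pi = policy(x)
            est = reward_model(x, a_pi)               # model term
            if a_pi == a:
                est += (r - reward_model(x, a)) / p   # IW correction
            total += est
        return total / len(logs)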
Unbiased offline evaluation: Lihong Li, Wei Chu, John Langford, and Xuanhui Wang, Unbiased Offline Evaluation of Contextual-Bandit-Based News Article Recommendation Algorithms, WSDM 2011. Evidence that offline evaluation techniques work for news recommendation.
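The replay estimator at the heart of this line of work is simple enough to sketch (assuming the logged actions were chosen uniformly at random; names are ours): keep only the events where the evaluated policy agrees with the logged action, and average their rewards.

    def replay_evaluate(logged_events, policy):
        # logged_events: iterable of (context x, logged action a, reward r)
        matched = [r for (x, a, r) in logged_events if policy(x) == a]
        if not matched:
            return 0.0   # no agreeing events; the estimate is undefined
        return sum(matched) / len(matched)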
Dealing with importance weights: Nikos Karampatziakis and John Langford, Importance Weight Aware Gradient Updates, UAI 2011. A much better method than "multiply gradient by importance weight" that has substantial advantages even when the importance weight is always 1.
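For squared loss the importance-weight-aware update has a closed form, obtained by treating the weighted update as the limit of many infinitesimal steps; a sketch (names are ours):

    import numpy as np

    def iwa_update(w, x, y, h, eta):
        # Importance-weight-aware step for squared loss 0.5*(w.x - y)^2
        # with importance weight h and learning rate eta.
        x = np.asarray(x, dtype=float)
        xx = x.dot(x)
        if xx == 0.0:
            return w
        # Fraction of the gap (y - w.x) closed; tends to 1 as h*eta
        # grows, so the update never overshoots the label.
        scale = (1.0 - np.exp(-h * eta * xx)) / xx
        return w + scale * (y - w.dot(x)) * x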
Efficient Active Learning: Alina Beygelzimer, Daniel Hsu, John Langford, and Tong Zhang, Agnostic Active Learning Without Constraints, NIPS 2010. This is an IWAL-style algorithm that functions as a pure reduction to ERM-style supervised learning. It's quite effective empirically. Some slides.
Information-theoretically optimal learning: Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire, An Optimal High Probability Algorithm for the Contextual Bandit Problem, AISTATS 2011. We improve the EXP4 algorithm to work with high probability, establishing a better guarantee than existed previously for arbitrary policy classes in the contextual bandit setting.
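For flavor, one round of the underlying EXP4-style update (a sketch: the confidence term that gives EXP4.P its high-probability guarantee is omitted, and all names are ours):

    import numpy as np
    rng = np.random.default_rng(0)

    def exp4_round(w, advice, pull, gamma=0.1):
        # w: weights over N experts; advice: (N, K) array whose rows are
        # each expert's action distribution; pull(a) plays action a and
        # returns a reward in [0, 1].
        N, K = advice.shape
        q = w / w.sum()                               # expert distribution
        p = (1 - gamma) * q.dot(advice) + gamma / K   # action distribution
        a = rng.choice(K, p=p)
        r = pull(a)
        rhat = np.zeros(K)
        rhat[a] = r / p[a]                            # importance-weighted reward
        w = w * np.exp(gamma * advice.dot(rhat) / K)  # reweight experts
        return w, a, r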
Implicit Exploration: Alex Strehl, John Langford, Sham Kakade, and Lihong Li, Learning from Logged Implicit Exploration Data, NIPS 2010. This paper discusses how to learn from implicit exploration, which may be nonrandom. It removes one of the fundamental obstacles to learning with exploration.
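The resulting estimator can be sketched as importance weighting by an estimated propensity, floored at a threshold to control variance (p_hat, policy, and tau are our placeholder names):

    def implicit_exploration_value(logs, p_hat, policy, tau=0.05):
        # logs: (context x, logged action a, reward r); p_hat(a, x) is an
        # *estimated* probability that the logger chose a given x.
        total = 0.0
        for x, a, r in logs:
            if policy(x) == a:
                total += r / max(p_hat(a, x), tau)   # floored propensity
        return total / len(logs)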
Contextual Bandits for News: Lihong Li, Wei Chu, John Langford, and Robert Schapire, A Contextual-Bandit Approach to Personalized News Article Recommendation, WWW 2010. This paper develops a contextual bandit approach (LinUCB) to news article recommendation.
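A sketch of one decision step of disjoint-model LinUCB (variable names are ours; each arm's A starts as the identity matrix and b as zeros):

    import numpy as np

    def linucb_choose(A_list, b_list, contexts, alpha=1.0):
        # Per arm: A = I + sum(x x^T), b = sum(r x). Pick the arm with
        # the best ridge estimate plus an upper confidence bonus.
        scores = []
        for A, b, x in zip(A_list, b_list, contexts):
            A_inv = np.linalg.inv(A)
            theta = A_inv.dot(b)                       # ridge regression
            bonus = alpha * np.sqrt(x.dot(A_inv).dot(x))
            scores.append(theta.dot(x) + bonus)
        return int(np.argmax(scores))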
Delay in Parallel Learning: John Langford, Alex Smola, and Martin Zinkevich, Slow Learners are Fast, NIPS 2009. This paper discusses how delay is inevitable in parallel online learning environments and analyzes the effect of delay in a linear setting.
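The object of study can be mimicked in a few lines: online gradient descent where each gradient is applied some number of steps after it was computed (a sketch; names are ours):

    from collections import deque
    import numpy as np

    def delayed_sgd(examples, delay=3, eta=0.05):
        # Gradients computed at time t are applied at time t + delay,
        # mimicking the staleness of asynchronous parallel learners.
        w, pending = None, deque()
        for x, y in examples:
            x = np.asarray(x, dtype=float)
            if w is None:
                w = np.zeros_like(x)
            pending.append((w.dot(x) - y) * x)   # queue of stale gradients
            if len(pending) > delay:
                w -= eta * pending.popleft()
        while pending:
            w -= eta * pending.popleft()
        return w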
Learning Nonlinear Dynamic Models: John Langford, Ruslan Salakhutdinov, and Tong Zhang, Learning Nonlinear Dynamic Models, ICML 2009. This paper proposes, evaluates, and proves consistent a new approach to dealing with nonlinear dynamic models based on employing learning algorithms to form predictive states.
Sparse Online Learning: John Langford, Lihong Li, and Tong Zhang, Sparse Online Learning via Truncated Gradient, NIPS 2008. Also see the appendix or the full JMLR 2009 draft. This paper analyzes simple methods for inducing sparsity in online learning and proposes the truncated gradient as the best approach. See also this paper, which proposes a less efficient but more exact method.
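One step of the truncated gradient for squared loss, sketched (the paper truncates every K steps with a gravity parameter g; here K=1 for brevity, and the names are ours):

    import numpy as np

    def truncated_gradient_step(w, x, y, eta=0.1, g=0.01, theta=0.05):
        # Standard gradient step, then shrink small weights toward zero
        # by eta*g and clip at zero, inducing genuine sparsity online.
        x = np.asarray(x, dtype=float)
        w = w - eta * (w.dot(x) - y) * x
        small = np.abs(w) <= theta          # only truncate small weights
        w[small] = np.sign(w[small]) * np.maximum(np.abs(w[small]) - eta * g, 0.0)
        return w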
Importance Weighted Active Learning: Alina Beygelzimer, Sanjoy Dasgupta, and John Langford, Importance Weighted Active Learning, ICML 2009 (some code). We create active learning algorithms using importance weights with finite label complexity guarantees and good empirical performance.
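The IWAL skeleton is a coin flip per example: decide randomly whether to pay for the label, and importance-weight queried examples by the inverse query probability so the weighted sample stays unbiased (a sketch; query_prob must be bounded away from zero, and the names are ours):

    import random

    def iwal_pass(stream, query_prob, learn):
        labeled = []
        for x, y in stream:          # y is only looked at if queried
            p = query_prob(x)        # query probability for this x
            if random.random() < p:
                labeled.append((x, y, 1.0 / p))   # importance weight 1/p
        return learn(labeled)        # any importance-weighted learner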
Exploration Scavenging: John Langford, Alexander Strehl, and Jennifer Wortman, Exploration Scavenging, ICML 2008. This is about how to make use of nonrandom accidental exploration in evaluating and optimizing new policies.
Contextual Bandit -> Classification: Alina Beygelzimer and John Langford, The Offset Tree for Learning with Partial Labels, KDD 2009. This paper reduces contextual bandit learning to binary classification. It also proves a matching lower bound on contextual bandit learning, suggesting this problem is fundamentally harder.
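The two-action "offset" trick at the heart of the tree can be sketched directly: rewards above the 1/2 offset vote for the played action, rewards below vote for the other one, with importance weights undoing the logging probabilities (names are ours):

    def offset_binary_example(x, a, r, p):
        # (context x, played action a in {0,1}, reward r in [0,1],
        # logging probability p) -> (x, binary label, importance weight)
        if r >= 0.5:
            return x, a, (r - 0.5) / p        # vote for the played action
        return x, 1 - a, (0.5 - r) / p        # vote for the other action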
Contextual Bandit Learning: John Langford and Tong Zhang, The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, NIPS 2007 (.tex). Adding side information to bandits creates a new (relatively unanalyzed) setting. This paper analyzes the first practical algorithm in that setting.
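The algorithm alternates exploration and exploitation in epochs; a sketch under our own naming (the paper grows the exploitation block with the epoch index, fixed here for brevity):

    import random

    def epoch_greedy(contexts, pull, learn, K, epochs=50, n_exploit=20):
        # contexts must yield enough examples; pull(x, a) returns a
        # reward; learn(data) fits a policy to the exploration data.
        data, policy = [], (lambda x: 0)
        ctx = iter(contexts)
        for _ in range(epochs):
            x = next(ctx)
            a = random.randrange(K)             # one uniform exploration step
            data.append((x, a, pull(x, a), 1.0 / K))
            policy = learn(data)                # retrain on exploration data
            for _ in range(n_exploit):          # exploit for the epoch
                x = next(ctx)
                pull(x, policy(x))
        return policy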
Agnostic Active Learning: Nina Balcan, Alina Beygelzimer, and John Langford, Agnostic Active Learning, ICML 2006 (.tex). Video lecture; also a JCSS version (Volume 75, Issue 1, Pages 78-89). We show that active learning (the selective sampling version) is possible in situations with adversarially-placed noise.
See also this one, which makes the algorithm more concretely implementable, and Steve's paper characterizing when speedups are possible.
Perfect Online Learning: Jacob Abernethy, John Langford, and Manfred K. Warmuth, Continuous Experts and the Binning Algorithm, COLT 2006 (.ps.gz, .pdf, .tex). This paper presents "binning", a computationally tractable and provably optimal online learning algorithm in the familiar experts setting for classification.