| Information-theoretically optimal learning | Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire, An Optimal High Probability Algorithm for the Contextual Bandit Problem 2010. | We improve the EXP4 algorithm to work with high probability, establishing a better guarantee than exists previously for arbitrary policy classes in the contextual bandit setting. |
| Implicit Exploration | Alex Strehl, John Langford, and Sham Kakade Learning from Logged Implicit Exploration Data 2010. | This papers discusses how to learn from implicit exploration, which may be nonrandom. It removes one of the fundamental obstacles to learning with exploration. |
| Learning Nonlinear Dynamic Models | John Langford, Ruslan Salakhutdinov, and Tong Zhang, Learning Nonlinear Dynamic Models, ICML 2009. | This paper proposes, evaluates, and proves consistent a new approach to dealing with nonlinear dynamic models based on employing learning algorithms to form predictive states. | |
| Sparse Online Learning | John Langford, Lihong Li, and Tong Zhang, Sparse Online Learning via Truncated Gradient, NIPS 2008 Also see the appendix or the full JMLR 2009 draft. | This paper analyzes simple methods for inducing sparsity in online learning and proposes the truncated gradient as the best approach. | See also this paper which proposes a less efficient but more exact method. |
| Importance Weighted Active Learning | Alina Beygelzimer, Sanjoy Dasgupta, and John Langford, Importance Weighted Active Learning, ICML 2009 some code | We create active learning algorithms using importance weights with finite label complexity guarantees and good empirical performance. |
| Exploration Scavenging | John Langford, Alexander Strehl, and Jennifer Wortman Exploration Scavenging ICML 2008 | This is about how to make use of nonrandom accidental exploration in evaluating and optimizing new policies. |
| Contextual Bandit->Classification | Alina Beygelzimer and John Langford The Offset Tree for Learning with Partial Labels KDD 2009. | This paper reduces contextual bandit learning to binary classification. It also proves a matching lower bound on contextual bandit learning suggesting this problem is fundamentally harder. |
| Contextual Bandit Learning | John Langford and Tong Zhang The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits NIPS 2007 .tex | Adding side information to bandits creates a new (relatively unanalyzed) setting. This paper analyzes the first practical algorithm in that setting. |
| Agnostic Active Learning | Nina Balcan, Alina Beygelzimer, John Langford Agnostic Active Learning ICML 2006 .tex Video lecture also, JCSS Volume 75, issue 1, Pages 78-89 version | We show that active learning (the selective sampling version) is possible in situations with adversarially-placed noice.
| See also this one which makes the algorithm more algorithmic and Steve's paper characterizing when speedups are possible. |
| Perfect Online Learning |
Jacob Abernethy, John Langford, Manfred K. Warmuth Continuous Experts and the Binning Algorithm COLT 2006 .ps.gz .pdf .tex |
This paper presents and proves a computationally tractable perfectly optimal online learning algorithm "binning" in the familiar experts setting for classification. |