| John Langford and Tong Zhang, The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, NIPS 2007 (.tex) | Adding side information to bandits creates a new (relatively unanalyzed) setting. This paper analyzes the first practical algorithm in that setting. |
| Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, PAC Model-Free Reinforcement Learning, ICML 2006 (.tex, .ps.gz, .pdf) | An MDP can be explored with only O(SA) actions. Slides from Lihong's presentation. |
| Sham Kakade, Michael Kearns, and John Langford, Exploration in Metric State Spaces, ICML 2003 (.ps.gz, .pdf, .tex) | An MDP with a metric property can be explored with an amount of experience related to a covering number. |
| Sham Kakade and John Langford, Approximately Optimal Approximate Reinforcement Learning, ICML 2002 (.ps.gz, .pdf, .tex) | Introduces the "Conservative Policy Iteration" algorithm, which keeps the advantages of policy iteration and policy gradient methods while avoiding several of their disadvantages. |
| John Langford, Martin Zinkevich, and Sham Kakade, Competitive Analysis of the Explore/Exploit Tradeoff, ICML 2002 (.ps.gz, .pdf, .tex) | Analysis of the explore/exploit tradeoff in a simplified model. |
See the