Reinforcement Learning

Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, PAC Model-Free Reinforcement Learning, ICML 2006. .tex, .ps.gz, .pdf	An MDP with S states and A actions can be explored with only O(SA) actions. Slides from Lihong's presentation.
Sham Kakade, Michael Kearns, and John Langford, Exploration in Metric State Spaces, ICML 2003. .ps.gz, .pdf, .tex	An MDP with a metric property can be explored with an amount of experience related to a covering number.
Sham Kakade and John Langford, Approximately Optimal Approximate Reinforcement Learning, ICML 2002. .ps.gz, .pdf, .tex	Introduces the "Conservative Policy Iteration" algorithm, which keeps the advantages of policy iteration and policy gradient methods while avoiding several of their disadvantages.
John Langford, Martin Zinkevich, and Sham Kakade, Competitive Analysis of the Explore/Exploit Tradeoff, ICML 2002. .ps.gz, .pdf, .tex	An analysis of the explore/exploit tradeoff in a simplified model.