Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, PAC Model-Free Reinforcement Learning, ICML 2006 .tex, .ps.gz, .pdf | An MDP can be explored with only O(SA) actions. Slides from Lihong's presentation
Sham Kakade, Michael Kearns, and John Langford, Exploration in Metric State Spaces, ICML 2003 .ps.gz, .pdf, .tex | An MDP with a metric property can be explored with an amount of experience related to a covering number.
Sham Kakade and John Langford, Approximately Optimal Approximate Reinforcement Learning, ICML 2002 .ps.gz, .pdf, .tex | Introduces the "Conservative Policy Iteration" algorithm, which keeps the advantages of policy iteration and policy gradient methods while avoiding several of their disadvantages.
John Langford, Martin Zinkevich, and Sham Kakade, Competitive Analysis of the Explore/Exploit Tradeoff, ICML 2002 .ps.gz, .pdf, .tex | Analysis of the explore/exploit tradeoff in a simplified model.