A tight analysis of exploration in deterministic MDPs | Sven Koenig and Reid Simmons, Complexity Analysis of Real-Time Reinforcement Learning, AAAI 1993 |
Exploration in MDPs with a mixing time has poly sample complexity. | Michael Kearns and Satinder Singh, Near-Optimal Reinforcement Learning in Polynomial Time, ICML 1998 |
Optimizing against a model that uses the observed rewards for well-explored state-actions and the maximal possible reward for uncertain ones gives the same result (a sketch of this optimism-under-uncertainty idea appears in the first code block below). | Ronen Brafman and Moshe Tennenholtz, R-max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, IJCAI 2001 |
Exploration in factored MDPs is possible given the factored structure and the ability to solve a computationally hard planning problem. | Michael Kearns and Daphne Koller, Efficient Reinforcement Learning in Factored MDPs, IJCAI 1999 (There is a fundamentally hard planning problem here. It's really hard.) |
Removal of the mixing time assumption, an O(S²A) sample complexity analysis, and a lower bound. | Sham Kakade, On the Sample Complexity of Reinforcement Learning, PhD Thesis, Gatsby Unit, UCL, 2003 |
Exploration is possible in metric MDPs with an infinite state space. | Sham Kakade, Michael Kearns, and John Langford, Exploration in Metric State Spaces, ICML 2003 |
The sample complexity can be reduced by maintaining precise confidence intervals on the learned model (the second code block below sketches this idea). | Alexander L. Strehl and Michael L. Littman, A Theoretical Analysis of Model-Based Interval Estimation, ICML 2005 |
Poly(A, O, K_H)T sample complexity is possible for a POMDP, where K_H is the homing time and O is the number of observations. | Eyal Even-Dar, Sham Kakade, and Yishay Mansour, Reinforcement Learning in POMDPs Without Resets, IJCAI 2005 |
Exploration requires only O(SA) experience, which is less than the description complexity of an MDP. | Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, PAC Model-Free Reinforcement Learning, ICML 2006 |
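For the optimism-under-uncertainty row above (Brafman & Tennenholtz), here is a minimal sketch of the R-max idea: state-action pairs visited fewer than m times are modeled as maximally rewarding, so a greedy planner keeps getting pulled toward them until they become known. The class name, the threshold m, and the tabular value-iteration planner are illustrative choices, not the paper's pseudocode.

```python
# Minimal R-max-style sketch (illustrative, not the paper's pseudocode).
# Unknown (s, a) pairs keep the optimistic value r_max / (1 - gamma), so a
# greedy policy is driven to try them until they have been seen m times.
import numpy as np


class RMaxAgent:
    def __init__(self, n_states, n_actions, r_max=1.0, m=5, gamma=0.95):
        self.nS, self.nA = n_states, n_actions
        self.r_max, self.m, self.gamma = r_max, m, gamma
        self.counts = np.zeros((n_states, n_actions), dtype=int)
        self.trans_counts = np.zeros((n_states, n_actions, n_states), dtype=int)
        self.reward_sums = np.zeros((n_states, n_actions))
        # Optimistic initialization: every (s, a) starts at the maximal value.
        self.Q = np.full((n_states, n_actions), r_max / (1 - gamma))

    def act(self, s):
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        if self.counts[s, a] < self.m:          # still "unknown": keep building its model
            self.counts[s, a] += 1
            self.trans_counts[s, a, s_next] += 1
            self.reward_sums[s, a] += r
            if self.counts[s, a] == self.m:     # just became "known": re-plan
                self._plan()

    def _plan(self, sweeps=200):
        # Value iteration on the optimistic model: known (s, a) use empirical
        # estimates, unknown (s, a) retain the optimistic value r_max / (1 - gamma).
        for _ in range(sweeps):
            V = self.Q.max(axis=1)
            for s in range(self.nS):
                for a in range(self.nA):
                    if self.counts[s, a] >= self.m:
                        p = self.trans_counts[s, a] / self.counts[s, a]
                        r = self.reward_sums[s, a] / self.counts[s, a]
                        self.Q[s, a] = r + self.gamma * p @ V
```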
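For the interval-estimation row (Strehl & Littman), the full algorithm plans against the most optimistic model consistent with confidence intervals on the transitions and rewards; the sketch below instead uses the simpler exploration-bonus form of the same idea, inflating empirical rewards by a term that shrinks like 1/sqrt(n(s, a)). The constant beta, the function name, and the assumption that every (s, a) has been tried at least once are illustrative, not taken from the paper.

```python
# Confidence-interval-flavored exploration sketch (bonus form, illustrative).
# Rarely tried actions get a larger bonus, so they look better until their
# intervals tighten; counts are assumed to be >= 1 everywhere.
import numpy as np


def optimistic_q_values(trans_counts, reward_sums, counts, gamma=0.95,
                        beta=0.5, sweeps=200):
    """Value iteration on empirical estimates plus a confidence-interval bonus.

    trans_counts : (S, A, S) counts of observed transitions
    reward_sums  : (S, A) summed observed rewards
    counts       : (S, A) visit counts, assumed >= 1 for this sketch
    """
    nS, nA, _ = trans_counts.shape
    p_hat = trans_counts / counts[..., None]      # empirical transition model
    r_hat = reward_sums / counts                  # empirical mean reward
    bonus = beta / np.sqrt(counts)                # wider interval => bigger bonus
    Q = np.zeros((nS, nA))
    for _ in range(sweeps):
        V = Q.max(axis=1)
        Q = r_hat + bonus + gamma * np.einsum("san,n->sa", p_hat, V)
    return Q
```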