RL->regression given an oracle to the optimal state distribution. Randomized. | Sham Kakade, John Langford Approximately Optimal Approximate Reinforcement Learning ICML 2002. Also in: Sham Kakade On the Sample Complexity of Reinforcement Learning Thesis, Gatsby, UCL 2003 |
RL->binary classification given an oracle to the optimal state distribution. Derandomized | Drew Bagnell, Sham Kakade, Andrew Ng, & Jeff Schneider, Policy Search by Dynamic Programming, NIPS 2003 |
RL->binary classification given a generative model. e->Te | John Langford and Bianca Zadrozny Reducing T-step Reinforcement Learing to Classification 2003 (unpublished) |
RL->binary classification given a reset. | Bianca Zadrozny Policy mining: Learning decision policies from fixed sets of data, Ph.D. Thesis, University of California, San Diego, 2003 |
RL->binary classifiation given oracle acces to optimal policy at training time. | John Langford and Bianca Zadrozny Relating Reinforcement Learning Performance to Classification Performance ICML 2005 |
RL->binary classification given oracle access optimal policy (nice application) | Doron Blatt and Alfred Hero, From weighted classification to policy search, NIPS 2005 |
RL->binary classification given oracle access to optimal policy at training time. Improves time complexity of the previous. Randomized. | Hal Daumé III, John Langford, and Daniel Marcu Search-Based Structured Prediction (unpublished) |
| |