Robbie's real problem

Direct Experience | MDP | ε,T-optimal | O(|S|^2 |A| T^3 / ε^3) | E3 or Rmax (improved)
ε-Approx. Planning | Factoring | Poly(|Factoring|, 1/ε, T) | Factored-E3
local model | O(T |Cover| / ε) | Metric-E3
ε-optimal | O(|A| |S| T^2 / ε^2) | Q-learning
basic setting | reset model | generative model | precise description | full table