Setting                  | Subroutine         | Assumption    | Monotone?     | Guarantee          | Sample Complexity         | Algorithm
-------------------------|--------------------|---------------|---------------|--------------------|---------------------------|----------------------
Direct Experience        |                    | MDP           |               | (ε, T)-optimal     | O(|S|²|A|T³/ε³)           | E³ or Rmax (improved)
                         | ε-Approx. Planning | Factoring     |               |                    | Poly(|Factoring|, 1/ε, T) | Factored-E³
                         |                    | local model   |               |                    | O(T|Cover|/ε)             | Metric-E³
                         |                    |               |               | ε-optimal          | O(|A||S|T²/ε²)            | Q-learning
Reset Model              | ε∞-regression      |               | ε∞-unmonotone | (ε∞T², T)-optimal  | Very Large                | Approx. Policy Iter.
                         |                    |               | monotone      | local optimal      | Very Large                | Policy Gradient
μ-Reset Model            | ε/T²-regression    | μ = opt. dist | ε²-monotone   | (ε, T)-optimal     | O(T/ε²)                   | Cons. Policy Iter.
Generative Model         |                    |               |               |                    | (|A|T/ε)^{O(T)}           | Sparse Sampling
                         | ε/T-classification |               |               |                    | O(|A|T)                   | RLGen
                         |                    |               |               | ε-local-optimal    | O(T²)                     | various
                         |                    | μ = opt. dist |               | (Tε, T)-optimal    |                           | PSDP
Deterministic Gen. Model |                    |               |               | ε-local-optimal    | O(T³/ε²)                  | Pegasus
Precise Description      |                    | MDP           |               | optimal            | O(T|S||A|)                | Value Iter.
                         |                    |               |               |                    | O(|A|^|S|)                | Policy Iter.
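To make the last rows concrete, here is a minimal sketch of finite-horizon value iteration in the precise-description setting, where the MDP is handed to the planner as explicit transition and reward arrays. The function name and array layout (`P[s, a, s2]`, `R[s, a]`) are illustrative assumptions, not notation from the table.

```python
import numpy as np

def value_iteration(P, R, T):
    """Finite-horizon value iteration from a precise MDP description.

    P: transition probabilities, shape (S, A, S); P[s, a, s2] = Pr(s2 | s, a).
    R: expected immediate rewards, shape (S, A).
    T: horizon. Returns a nonstationary policy (policy[t][s] is the action
    taken with t+1 steps to go) and the T-step optimal values.
    """
    S, A, _ = P.shape
    V = np.zeros(S)                      # value with 0 steps to go
    policy = np.zeros((T, S), dtype=int)
    for t in range(T):
        Q = R + P @ V                    # Bellman backup: Q[s,a] = R[s,a] + sum_s2 P[s,a,s2] * V[s2]
        policy[t] = Q.argmax(axis=1)     # greedy action at this horizon
        V = Q.max(axis=1)                # optimal (t+1)-step values
    return policy, V
```

Each sweep performs |S||A| Bellman backups and there are T sweeps, which accounts for the O(T|S||A|) backup count in the table; each individual backup additionally touches up to |S| successor states.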
[Figure: the basic settings (direct experience, reset model, generative model, precise description)]
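In the generative-model setting, Sparse Sampling plans from the current state using only sampled transitions, with no dependence on |S|. Below is a minimal sketch of the sparse-sampling recursion, assuming a hypothetical interface `gen_model(s, a) -> (next_state, reward)`; the names and the `width` parameter are illustrative. Taking the width to be poly(T/ε) yields the (|A|T/ε)^{O(T)} per-decision cost listed in the table.

```python
def sparse_sample_q(gen_model, state, actions, depth, width):
    """Estimate Q(state, a) for every action by recursive sparse sampling.

    gen_model(s, a) -> (next_state, reward) is one call to the generative
    model (assumed interface). `depth` is the remaining horizon; `width` is
    the number of sampled successors per (state, action) pair. The number of
    model calls is on the order of (len(actions) * width) ** depth,
    independent of the size of the state space.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            s2, r = gen_model(state, a)
            # Value of the sampled successor: best continuation, estimated recursively.
            total += r + max(sparse_sample_q(gen_model, s2, actions,
                                             depth - 1, width).values())
        q[a] = total / width
    return q

def sparse_sample_action(gen_model, state, actions, depth, width):
    """Act greedily with respect to the sparse-sampling Q estimates."""
    q = sparse_sample_q(gen_model, state, actions, depth, width)
    return max(q, key=q.get)
```

The policy here is implicit: the recursion is rerun at every state actually visited, which is why the cost is exponential in the horizon T yet does not grow with |S|.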