Prior information         Requirement                      Monotonicity   Performance      Samples                    Algorithm
------------------------- -------------------------------- -------------- ---------------- -------------------------- -----------------------------
Direct Experience         MDP                                             ε,T-optimal      O(|S|²|A|T³/ε³)            E³
                          ε-Approx. Planning; Factoring                                    Poly(|Factoring|, 1/ε, T)  Factored-E³
                          local model                                                      O(T|Cover|/ε)              Metric-E³
                                                                          ε-optimal        O(|A||S|T²/ε²)             Q-learning
Trace Model               ε∞-regression                    ε∞-unmonotone  ε∞T²,T-optimal   ???                        Approximate Policy Iteration
                                                           monotone       local-optimal    Very large                 Policy Gradient
μ Trace Model             ε/T²-regression; μ = opt. dist.  ε²-monotone    ε,T-optimal      O(T/ε²)                    Conservative Policy Iteration
Generative Model                                                                           (|A|T/ε)^O(T)              Sparse Sampling
                          ε/T-classification                                               |A|T                       RLGen
                                                                          ε-local-optimal  O(T²)                      various
                          μ = opt. dist.                                  Tε,T-optimal                                PSDP
Deterministic Gen. Model                                                  ε-local-optimal  T³ log(1/ε)/ε²             Pegasus
Precise Description       MDP                                             optimal          T|S||A|                    Value Iteration
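The Samples column mixes very different growth rates. As a rough illustration, the sketch below evaluates the closed-form entries for one concrete setting. It assumes every constant hidden inside O(·) equals 1 and that the O(T) exponent in the Sparse Sampling entry is exactly T. `sample_complexity_bounds` is a hypothetical helper. Rows without a usable closed form (Approximate Policy Iteration, Policy Gradient, PSDP, the polynomial Factored-E³ bound, and the "various" row) are omitted. Because each row assumes a different kind of prior information, the numbers are only loosely comparable.

```python
import math


def sample_complexity_bounds(S, A, T, eps, cover=None):
    """Evaluate the table's sample-complexity expressions.

    Assumption (not from the table): all constants hidden by O(.) are 1,
    and the O(T) exponent of Sparse Sampling is taken to be exactly T.
    S = |S|, A = |A|, T = horizon, eps = accuracy, cover = |Cover|.
    """
    bounds = {
        "E3": S**2 * A * T**3 / eps**3,
        "Q-learning": A * S * T**2 / eps**2,
        "Conservative Policy Iteration": T / eps**2,
        "Sparse Sampling": (A * T / eps) ** T,  # assumed exponent T
        "RLGen": A * T,
        "Pegasus": T**3 * math.log(1 / eps) / eps**2,
        "Value Iteration": T * S * A,
    }
    # Metric-E3 depends on the size of a cover, which only exists in the metric setting.
    if cover is not None:
        bounds["Metric-E3"] = T * cover / eps
    return bounds


if __name__ == "__main__":
    # Print the entries from smallest to largest for one example setting.
    b = sample_complexity_bounds(S=100, A=4, T=10, eps=0.1, cover=50)
    for name, value in sorted(b.items(), key=lambda kv: kv[1]):
        print(f"{name:30s} {value:.3g}")
```

With |S| = 100, |A| = 4, T = 10 and ε = 0.1, E³ comes out at about 4·10¹⁰ and Q-learning at about 4·10⁶. The Sparse Sampling figure (about 10²⁶) has no |S| dependence but is exponential in T.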
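The "Precise Description" row assumes the full MDP is known, so an exactly optimal T-step policy can be computed by finite-horizon value iteration. The following is a minimal sketch that assumes a hypothetical tabular encoding: `P[s][a]` is a list of `(probability, next_state)` pairs and `R[s][a]` is the expected immediate reward. It performs T sweeps over every (state, action) pair, which is where the T|S||A| factor comes from. Each backup also sums over the listed successor states.

```python
from typing import List, Tuple

# Hypothetical encoding: P[s][a] = [(prob, next_state), ...], R[s][a] = expected reward.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def finite_horizon_value_iteration(
    P: Transitions, R: Rewards, T: int
) -> Tuple[List[float], List[List[int]]]:
    """Return (V, policy) where V[s] is the optimal T-step value from s and
    policy[t][s] is an optimal action at time step t (t = 0 is the first step).
    Assumes every state has at least one action."""
    n_states = len(P)
    V = [0.0] * n_states  # value with 0 steps to go
    rules_by_steps_to_go = []
    for _ in range(T):
        # One Bellman backup for every (state, action) pair.
        Q = [
            [R[s][a] + sum(p * V[s2] for p, s2 in P[s][a]) for a in range(len(P[s]))]
            for s in range(n_states)
        ]
        # Greedy action per state (ties broken toward the lowest action index).
        greedy = [max(range(len(q)), key=q.__getitem__) for q in Q]
        V = [Q[s][greedy[s]] for s in range(n_states)]
        rules_by_steps_to_go.append(greedy)
    # The last rule computed applies at time step 0, so reverse the list.
    policy = list(reversed(rules_by_steps_to_go))
    return V, policy


if __name__ == "__main__":
    # Two-state chain: in state 1, action 0 stays and earns reward 1;
    # in state 0, action 1 moves to state 1 for no reward.
    P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]
    R = [[0.0, 0.0], [1.0, 0.0]]
    V, policy = finite_horizon_value_iteration(P, R, T=3)
    print(V, policy)
```

In the example, the best plan from state 0 is to move to state 1 and then stay, collecting reward 2 over three steps, while staying in state 1 collects 3.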