Information               Assumptions                                    Guarantee            Complexity                 Algorithm
Direct Experience         MDP                                            ε,T-optimal          O(|S|^2 |A| T^3/ε^3)       E3
                          ε-approx. planning, factoring                                       Poly(|Factoring|, 1/ε, T)  Factored-E3
                          local model                                                         O(T |Cover|/ε)             Metric-E3
                                                                         ε-optimal            O(|A| |S| T^2/ε^2)         Q-learning
Trace Model               ε_∞ regression, ε-unmonotone                   ε_∞/T^2, T optimal   ???                        Appr. Policy Iter.
                          monotone                                       local optimal        Very Large                 Policy Gradient
μ Trace Model             ε/T^2 regression, μ = opt. dist, ε^2-monotone  ε,T-optimal          O(T/ε^2)                   Cons. Policy Iter.
Generative Model                                                                              (|A| T/ε)^O(T)             Sparse Sampling
                          ε/T classification                                                  |A|^T                      RLGen
                                                                         ε local-optimal      O(T^2)                     various
                          μ = opt. dist                                  Tε,T-optimal                                    PSDP
Deterministic Gen. Model                                                 ε local optima       T^3 log(1/ε)/ε^2           Pegasus
Precise Description       MDP                                            optimal              T |S| |A|                  Value Iter.
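
To get a feel for how differently these bounds scale, the sketch below evaluates the leading terms of three entries above (E3, Q-learning, Value Iter.) for one problem size. It is only an illustration: the function names and the example values of |S|, |A|, T and ε are hypothetical, and the constants hidden by O(.) are dropped, so only the relative growth is meaningful.

```python
# Leading terms of three bounds listed above, with the constants hidden
# by O(.) dropped. Function names are illustrative, not from the source.

def e3_bound(S, A, T, eps):
    """E3 entry: |S|^2 |A| T^3 / eps^3."""
    return S**2 * A * T**3 / eps**3


def q_learning_bound(S, A, T, eps):
    """Q-learning entry: |A| |S| T^2 / eps^2."""
    return A * S * T**2 / eps**2


def value_iteration_bound(S, A, T):
    """Value Iter. entry: T |S| |A| (no eps dependence)."""
    return T * S * A


if __name__ == "__main__":
    # Hypothetical problem size, chosen only for illustration.
    S, A, T, eps = 100, 4, 10, 0.1
    print("E3:         ", e3_bound(S, A, T, eps))
    print("Q-learning: ", q_learning_bound(S, A, T, eps))
    print("Value Iter.:", value_iteration_bound(S, A, T))
```

Halving ε multiplies the E3 term by 8 and the Q-learning term by 4, while the Value Iter. term is unchanged, which matches the exponents of ε in the corresponding entries.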