| Sampling Model | Setting / Conditions | Guarantee | Sample Complexity | Algorithm |
|---|---|---|---|---|
| Direct Experience | MDP | (ε, T)-optimal | O(\|S\|^2 \|A\| T^3 / ε^3) | E^3 or R-max (improved) |
| Direct Experience | ε-approx. planning, factoring | | Poly(\|Factoring\|, 1/ε, T) | Factored E^3 |
| Direct Experience | local model | | O(T \|Cover\| / ε) | Metric E^3 |
| Direct Experience | | ε-optimal | O(\|A\| \|S\| T^2 / ε^2) | Q-learning |
| Reset Model | ε_∞-regression, non-monotone | (ε_∞ T^2, T)-optimal | Very Large | Approx. Policy Iter. |
| Reset Model | monotone | local optimal | Very Large | Policy Gradient |
| μ-Reset Model | ε/T^2-regression, μ = opt. dist., ε^2-monotone | (ε, T)-optimal | O(T / ε^2) | Cons. Policy Iter. |
| Generative Model | | (ε, T)-optimal | (\|A\| T / ε)^O(T) | Sparse Sampling |
| Generative Model | ε/T-classification | | O(\|A\| T) | RLGen |
| Generative Model | | ε-local-optimal | O(T^2) | various |
| Generative Model | μ = opt. dist. | (Tε, T)-optimal | | PSDP |
| Deterministic Gen. Model | | ε-local-optimal | O(T^3 / ε^2) | Pegasus |
| Precise Description | MDP | optimal | O(T \|S\| \|A\|) | Value Iter. |
| Precise Description | MDP | optimal | O(\|A\|^\|S\|) | Policy Iter. |
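Of the entries above, Sparse Sampling is the one whose (|A|T/ε)^O(T) complexity is easiest to see directly: the planner builds a sampled lookahead tree of depth T, with C next-state samples per action at every node, so the work per decision is (C·|A|)^T and is independent of |S|. Below is a minimal sketch, assuming the generative model is exposed as a callable `model(s, a) -> (next_state, reward)`; the names `model`, `C`, and `actions`, and the toy MDP at the end, are our illustrative assumptions, not from the source.

```python
import random

def sparse_sampling_value(model, s, t, T, C, actions):
    """Estimate the optimal value-to-go V*_t(s) using only a generative model.

    model(s, a) -> (s_next, reward) is one sampled transition and is the
    only access to the MDP. Each node expands len(actions) * C children,
    so the total work is (C * len(actions))^(T - t); taking C to be
    poly(|A|, T, 1/eps) recovers the (|A| T / eps)^O(T) bound in the table.
    """
    if t == T:                        # horizon reached: no further reward
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(C):            # C Monte-Carlo samples per action
            s_next, r = model(s, a)
            total += r + sparse_sampling_value(model, s_next, t + 1, T, C, actions)
        best = max(best, total / C)   # empirical Q(s, a); keep the best action
    return best

# Toy usage: a two-state chain where action 1 flips the state w.p. 0.8
# and reward 1 is earned for landing in state 1.
def toy_model(s, a):
    s_next = 1 - s if (a == 1 and random.random() < 0.8) else s
    return s_next, float(s_next == 1)

print(sparse_sampling_value(toy_model, s=0, t=0, T=4, C=8, actions=[0, 1]))
```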
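At the other extreme, the Precise Description rows need no samples at all: given the transition probabilities and rewards explicitly, T-step value iteration performs one Bellman backup per (stage, state, action) triple, which is the T·|S|·|A| backup count in the next-to-last row. A minimal sketch, assuming the MDP is given as dense NumPy arrays `P[s, a, s']` and `R[s, a]` (our naming):

```python
import numpy as np

def value_iteration(P, R, T):
    """T-step finite-horizon value iteration on an explicitly given MDP.

    P: transition tensor, P[s, a, s2] = Pr(s2 | s, a), shape (S, A, S)
    R: reward matrix, R[s, a], shape (S, A)
    Returns optimal values V[t, s] and a greedy policy pi[t, s].
    """
    S, A, _ = P.shape
    V = np.zeros((T + 1, S))           # V[T] = 0: no reward past the horizon
    pi = np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):     # backward induction over stages
        Q = R + P @ V[t + 1]           # Q[s, a] = R[s, a] + E[V_{t+1}(s')]
        V[t] = Q.max(axis=1)           # one max over actions per state
        pi[t] = Q.argmax(axis=1)
    return V, pi
```

Policy iteration trades the dependence on T for the O(|A|^|S|) bound in the last row: each iteration strictly improves the policy, and there are at most |A|^|S| deterministic policies to move through.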