| Access Model | Subroutine | Assumption | Monotone? | Guarantee | Sample Complexity | Algorithm |
|---|---|---|---|---|---|---|
| Direct Experience | MDP | | | ε,T-optimal | O(\|S\|²\|A\|T³/ε³) | E³ or Rmax (improved) |
| | ε-Approx. Planning | Factoring | | | Poly(\|Factoring\|, 1/ε, T) | Factored-E³ |
| | | local model | | | O(T·\|Cover\|/ε) | Metric-E³ |
| | | | | ε-optimal | O(\|A\|\|S\|T²/ε²) | Q-learning |
| Reset Model | ε∞-regression | | ε∞-unmonotone | ε∞T², T-optimal | Very Large | Appr. Policy Iter. |
| | | | monotone | local optimal | Very Large | Policy Gradient |
| μ Reset Model | ε/T²-regression | μ = opt. dist | ε²-monotone | ε,T-optimal | O(T/ε²) | Cons. Policy Iter. |
| Generative Model | | | | | (\|A\|T/ε)^O(T) | Sparse Sampling |
| | ε/T-classification | | | | O(\|A\|T) | RLGen |
| | | | | ε-local-optimal | O(T²) | various |
| | | μ = opt. dist | | Tε, T-optimal | | PSDP |
| Deterministic Gen. Model | | | | ε-local-optimal | O(T³/ε²) | Pegasus |
| Precise Description | MDP | | | optimal | O(T\|S\|\|A\|) | Value Iter. |
| | | | | | O(\|A\|^\|S\|) | Policy Iter. |
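As a concrete illustration of the "Precise Description" rows, here is a minimal finite-horizon value-iteration sketch for a tabular MDP. The function name and array layout are illustrative, not from the source; the loop performs the T backups over all state–action pairs that the table's value-iteration entry refers to.

```python
import numpy as np

def value_iteration(P, R, T):
    """Finite-horizon value iteration on a tabular MDP.

    P: transitions, shape (S, A, S), P[s, a, s'] = Pr(s' | s, a).
    R: rewards, shape (S, A).
    Returns the optimal value function at the start and the greedy
    policy for each of the T steps.
    """
    S, A = R.shape
    V = np.zeros(S)                     # value-to-go with 0 steps left
    policy = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):
        Q = R + P @ V                   # Q[s, a] = R[s, a] + E[V(s')]
        policy[t] = Q.argmax(axis=1)    # greedy action at step t
        V = Q.max(axis=1)               # one backup per state-action pair
    return V, policy
```

Each of the T sweeps touches every (s, a) pair once, matching the O(T·|S|·|A|) backup count in the table (each backup itself sums over next states).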
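The (|A|T/ε)^O(T) entry for the generative-model setting can be made concrete with a sparse-sampling sketch: estimate the value of a state by recursively sampling a fixed number of next states per action from the generative model. The function and its signature are hypothetical (`gen_model(s, a)` is assumed to return a sampled `(next_state, reward)` pair); it only illustrates why the call count is (|A| · width)^depth, independent of |S|.

```python
def sparse_sampling_q(gen_model, state, actions, depth, width):
    """Sparse-sampling estimate of the optimal value-to-go from `state`.

    Expands `width` samples per action per level of the lookahead tree,
    so the total number of generative-model calls is
    (len(actions) * width) ** depth -- exponential in the horizon,
    but with no dependence on the number of states.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = gen_model(state, a)
            total += reward + sparse_sampling_q(
                gen_model, next_state, actions, depth - 1, width)
        best = max(best, total / width)   # Monte Carlo backup for action a
    return best
```

Choosing `width` on the order of T/ε (up to log factors) drives the estimation error below ε, which is where the (|A|T/ε)^O(T) bound in the table comes from.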