| Experience model | Assumptions | Property | Guarantee | Complexity | Algorithm |
|---|---|---|---|---|---|
| Direct Experience | MDP | | ε,T-optimal | O(\|S\|²\|A\|T³/ε³) | E³ |
| Direct Experience | MDP, Factoring, ε-Approx. Planning | | | Poly(\|Factoring\|, 1/ε, T) | Factored-E³ |
| Direct Experience | local model | | | O(T·\|Cover\|/ε) | Metric-E³ |
| Direct Experience | MDP | | ε-optimal | O(\|A\|\|S\|T²/ε²) | Q-learning |
| Trace Model | ε∞ regression | ε-unmonotone | ε∞/T², T-optimal | ??? | Appr. Policy Iter. |
| Trace Model | | monotone | local optimal | Very Large | Policy Gradient |
| μ Trace Model | ε/T² regression, μ = opt. dist | ε²-monotone | ε,T-optimal | O(T/ε²) | Cons. Policy Iter. |
| Generative Model | | | | (\|A\|T/ε)^O(T) | Sparse Sampling |
| Generative Model | ε/T classification | | | \|A\|^T | RLGen |
| Generative Model | | | ε local-optimal | O(T²) | various |
| Generative Model | μ = opt. dist | | Tε,T-optimal | | PSDP |
| Deterministic Gen. Model | | | ε local optima | T³ log(1/ε)/ε² | Pegasus |
| Precise Description | MDP | | optimal | T\|S\|\|A\| | Value Iter. |
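The last row lists Value Iteration, which applies when a precise description of the MDP (transition probabilities and rewards) is available. A minimal finite-horizon sketch follows; the toy two-state MDP, the function name, and the array layout are illustrative assumptions, not taken from the table.

```python
# Minimal sketch of finite-horizon value iteration on a tabular MDP.
# P[s, a, s'] = transition probability, R[s, a] = expected reward, T = horizon.
import numpy as np

def value_iteration(P, R, T):
    """Return V[t][s], the optimal expected return with t steps to go,
    and the greedy (time-dependent) policy pi[t][s]."""
    S, A, _ = P.shape
    V = np.zeros((T + 1, S))           # V[0] = 0: no steps remaining
    pi = np.zeros((T, S), dtype=int)
    for t in range(1, T + 1):
        # Q[s, a] = immediate reward + expected value of the next state
        Q = R + P @ V[t - 1]           # (S, A, S) contracted with (S,) -> (S, A)
        V[t] = Q.max(axis=1)
        pi[t - 1] = Q.argmax(axis=1)
    return V, pi

# Toy 2-state, 2-action MDP: action 1 in state 0 moves to state 1,
# and state 1 is absorbing with reward 1 per step.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # transitions from state 0
              [[0.0, 1.0], [0.0, 1.0]]])  # transitions from state 1
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
V, pi = value_iteration(P, R, T=3)
```

Each of the T backups touches every state-action pair once, which is the source of the T·|S|·|A| dependence in the table (times the cost of the expectation over next states).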