Robbie's real problem
Direct Experience
MDP
e,T-optimal
O(|S|^2 |A| T^3 / e^3)
E^3 or R_max (improved)
e-Approx. Planning
Factoring
Poly(|Factoring|, 1/e,T)
Factored-E^3
local model
O(T |Cover| / e)
Metric-E^3
e-optimal
O(|A| |S| T^2 / e^2)
Q-learning
basic setting
reset model
generative model
precise description
full table