Robbie, when he knows the world
Deterministic Gen. Model    ε-local optimal    O(T^3/ε^2)    Pegasus
Precise Description MDP     T-optimal          O(T|S||A|)    Value Iter.
                            optimal            O(|A|^|S|)    Policy Iter.
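
The Value Iter. entry can be illustrated with a minimal sketch of finite-horizon value iteration. It assumes deterministic transitions stored in a table, which is one setting where the work comes out to T·|S|·|A| single-step backups. The function name and the `next_state` / `reward` table layout are hypothetical, chosen only for this illustration.

```python
def finite_horizon_value_iteration(next_state, reward, horizon):
    """T-step value iteration on a deterministic MDP (illustrative sketch).

    next_state[s][a] is the state reached from s under action a (assumed layout).
    reward[s][a] is the immediate reward for taking a in s (assumed layout).
    Returns (values, policy): values[s] is the best T-step return from s,
    and policy[t][s] is the best action in s at time step t.
    """
    n_states = len(next_state)
    n_actions = len(next_state[0])
    values = [0.0] * n_states  # value with zero steps remaining
    policy = []
    for _ in range(horizon):              # T sweeps
        new_values = [0.0] * n_states
        step_policy = [0] * n_states
        for s in range(n_states):         # |S| states per sweep
            best_a, best_q = 0, float("-inf")
            for a in range(n_actions):    # |A| actions per state
                q = reward[s][a] + values[next_state[s][a]]
                if q > best_q:
                    best_a, best_q = a, q
            new_values[s] = best_q
            step_policy[s] = best_a
        values = new_values
        policy.append(step_policy)
    policy.reverse()  # sweeps ran backward in time; put time step 0 first
    return values, policy


# Toy world: action 0 always leads to state 0, action 1 always leads to state 1.
next_state = [[0, 1], [0, 1]]
reward = [[0, 1], [0, 2]]
values, policy = finite_horizon_value_iteration(next_state, reward, horizon=3)
print(values)  # [5.0, 6.0]
```

The three nested loops make the T·|S|·|A| count visible. With stochastic transitions each backup would also sum over successor states, so the cost per backup would grow accordingly.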
[Diagram labels: basic setting, direct experience, reset model, generative model, full table]