Robbie, when he knows the world

Deterministic Gen. Model   ε-local optimal   O(T^3/ε^2)   Pegasus
Precise Description MDP    T-optimal         O(T|S||A|)   Value Iter.
                           optimal           O(|A||S|)    Policy Iter.
Basic settings: direct experience, reset model, generative model, full table