Robbie with a Reset

Reset Model   | ε∞-regression                   | ε∞-unmonotone | ε∞/T², T-optimal | Very Large | Appr. Policy Iter.
              |                                 | monotone      | local optimal    | Very Large | Policy Gradient
μ Reset Model | ε/T²-regression, μ = opt. dist. | ε²-monotone   | ε,T-optimal      | O(T/ε²)    | Cons. Policy Iter.
basic setting; direct experience; generative model; precise description; full table