The Reset Model

In the reset model, information comes from a set of independent traces.
Example 1: In a Markov decision process, you observe several traces of the form (s,a,r)^T, that is, sequences of T (state, action, reward) triples. In each trace the initial state is drawn from P(s), each action is chosen by the algorithm, and each next state is drawn from P(s'|s,a).
Example 2: Robbie can learn from his 10000 clones.
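
As a rough illustration of Example 1, the sketch below samples independent traces from a tiny MDP. The two-state MDP, its numbers, the reward rule, and every name here (sample_trace, uniform_policy, and so on) are assumptions made up for this example, not part of the model's definition.

```python
import random

# Hypothetical two-state, two-action MDP (all numbers are illustrative).
STATES = [0, 1]
ACTIONS = [0, 1]
INITIAL = [0.5, 0.5]  # P(s): distribution over the initial state
# TRANSITION[s][a] is P(s'|s,a), a distribution over next states.
TRANSITION = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.7, 0.3], 1: [0.1, 0.9]},
}

def reward(s, a):
    # Assumed reward rule: 1 for being in state 1, otherwise 0.
    return 1.0 if s == 1 else 0.0

def sample_trace(policy, horizon, rng):
    """Reset to a fresh initial state drawn from P(s), then roll out for `horizon` steps."""
    s = rng.choices(STATES, weights=INITIAL)[0]
    trace = []
    for _ in range(horizon):
        a = policy(s, rng)            # action chosen by the algorithm
        trace.append((s, a, reward(s, a)))
        s = rng.choices(STATES, weights=TRANSITION[s][a])[0]  # s' ~ P(s'|s,a)
    return trace

def uniform_policy(s, rng):
    # Placeholder algorithm: pick an action uniformly at random.
    return rng.choice(ACTIONS)

rng = random.Random(0)
# Each call resets the world, so the traces are independent of one another.
traces = [sample_trace(uniform_policy, horizon=5, rng=rng) for _ in range(3)]
```

Every call to sample_trace starts over from P(s), which is what makes the traces independent.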

The μ Reset Model

This is the reset model, except that you choose the initial state distribution P(s) yourself.
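
A minimal sketch of the difference, assuming a made-up three-state chain: the only change from ordinary trace sampling is that the initial state distribution is an argument the learner supplies. The names mu, step, and sample_trace_from are hypothetical, and so are the dynamics and reward.

```python
import random

STATES = [0, 1, 2]  # hypothetical chain: 0 -> 1 -> 2

def step(s, a, rng):
    # Assumed dynamics: action 1 moves right with probability 0.8; action 0 stays put.
    if a == 1 and rng.random() < 0.8:
        return min(s + 1, 2)
    return s

def sample_trace_from(mu, policy, horizon, rng):
    """Sample one trace whose initial state is drawn from the learner-chosen mu."""
    s = rng.choices(STATES, weights=mu)[0]
    trace = []
    for _ in range(horizon):
        a = policy(s)
        r = 1.0 if s == 2 else 0.0   # assumed reward: 1 in the last state
        trace.append((s, a, r))
        s = step(s, a, rng)
    return trace

rng = random.Random(0)
always_right = lambda s: 1

# The learner chooses mu; here all mass is placed on state 2,
# so every trace starts there.
mu = [0.0, 0.0, 1.0]
mu_traces = [sample_trace_from(mu, always_right, horizon=4, rng=rng) for _ in range(5)]
```

Choosing mu this way lets the learner gather data from states that an environment-chosen P(s) might rarely or never start in.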