The Reset Model
In the reset model, information comes from a set of independent traces.
Example 1: In a Markov decision process world, you learn from several traces (s,a,r)^T, where the initial state of each trace is drawn from P(s), each action is chosen according to the algorithm, and each next state is drawn from P(s'|s,a).
Example 2: Robbie can learn from his 10000 clones.
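As a rough illustration of Example 1, the sketch below draws independent traces from a toy MDP. The two-state world, its transition probabilities, the reward, and all names (P_INIT, transition_probs, sample_traces, and so on) are assumptions made up for this example, not part of the model's definition.

```python
import random

# Hypothetical toy MDP (illustration only): two states, two actions.
STATES = [0, 1]
ACTIONS = [0, 1]
P_INIT = [0.5, 0.5]  # P(s): initial-state distribution, fixed by the world


def transition_probs(s, a):
    """P(s'|s,a): action 0 tends to stay put, action 1 tends to switch."""
    stay = 0.9 if a == 0 else 0.2
    return [stay, 1 - stay] if s == 0 else [1 - stay, stay]


def reward(s, a):
    """Assumed reward: 1 for acting from state 1, else 0."""
    return 1.0 if s == 1 else 0.0


def sample_trace(policy, T, rng):
    """Draw one trace (s,a,r)^T: reset to s ~ P(s), then follow the policy."""
    s = rng.choices(STATES, weights=P_INIT)[0]
    trace = []
    for _ in range(T):
        a = policy(s, rng)
        trace.append((s, a, reward(s, a)))
        s = rng.choices(STATES, weights=transition_probs(s, a))[0]
    return trace


def sample_traces(policy, n, T, seed=0):
    """Draw n independent traces; every trace begins with a fresh reset."""
    rng = random.Random(seed)
    return [sample_trace(policy, T, rng) for _ in range(n)]


def uniform_policy(s, rng):
    """A placeholder for 'the algorithm': pick an action uniformly at random."""
    return rng.choice(ACTIONS)


traces = sample_traces(uniform_policy, n=5, T=4)
```

The point of the sketch is that each trace starts from its own draw of P(s), so the traces are independent of one another even though the steps within a trace are not.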
The μ Reset Model
The same as the reset model, except that you choose the initial state distribution P(s).
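The only change from the reset model is who supplies the initial-state distribution. A minimal sketch, assuming a hypothetical three-state world and a helper named reset (both invented here for illustration):

```python
import random

STATES = [0, 1, 2]  # hypothetical state space


def reset(mu, rng):
    """Start a new trace at s ~ mu, where mu is chosen by the learner."""
    return rng.choices(STATES, weights=mu)[0]


rng = random.Random(0)
# The learner puts all reset mass on state 2, for instance because that
# state is rarely reached when starting from anywhere else.
mu = [0.0, 0.0, 1.0]
starts = [reset(mu, rng) for _ in range(10)]
```

Here every trace begins in state 2, something the learner could not arrange in the plain reset model, where P(s) is given by the world.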