Example: In a Markov decision process (MDP) world, you learn from a sequence of triples (s,a,r)*, where each action a is chosen by the learning algorithm and each next state s' is drawn from P(s'|s,a).
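The interaction loop in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state MDP, its transition table `P`, the `reward` function, and the `random_policy` stand-in for "the algorithm" are all hypothetical and do not come from the notes.

```python
import random

# Hypothetical two-state MDP, invented for illustration only.
# P[(s, a)] maps each next state s' to its probability P(s'|s,a).
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.5, "s1": 0.5},
}
ACTIONS = ["stay", "move"]


def reward(state, action):
    # Assumed reward: 1 for any action taken in s1, otherwise 0.
    return 1.0 if state == "s1" else 0.0


def random_policy(state, history, rng):
    # Stand-in for the learning algorithm: it may inspect the history
    # of (s, a, r) triples, but here it just picks an action uniformly.
    return rng.choice(ACTIONS)


def rollout(policy, start, steps, seed=0):
    """Generate a sequence of (s, a, r) triples of the given length."""
    rng = random.Random(seed)
    state, history = start, []
    for _ in range(steps):
        action = policy(state, history, rng)   # action chosen by the algorithm
        r = reward(state, action)
        history.append((state, action, r))
        # Next state is chosen by the world, sampled from P(s'|s,a).
        next_states = list(P[(state, action)])
        weights = [P[(state, action)][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
    return history


if __name__ == "__main__":
    for triple in rollout(random_policy, "s0", steps=5):
        print(triple)
```

The point of the sketch is the division of labor: the policy only ever picks a, while s' is sampled by the environment and is outside the learner's control.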
Example: Robbie's Problem: