Reinforcement Learning Theory Tutorial at ICML 2006
Directions: Tutorial begins at 2:30pm Sunday in Doherty Hall 2210. Doherty is between Wean (the big ugly concrete building) and University Center.
First half tutorial slides by Satinder Singh
Second half tutorial slides by John Langford
A history of the analysis of sample complexity and a history of RL reductions analysis.
This tutorial will focus on the theory of reinforcement
learning (RL). There are several reasons why this
tutorial should happen (and, in particular, happen now).
We hope to see participants from both within the RL community and the
wider Machine Learning community. This is your opportunity to gain
some deep insight into the theory of Reinforcement Learning.
- Importance: The reinforcement learning problem is sufficiently general to capture many (perhaps even "all") learning problems. Consequently, RL theory is widely applicable across a broad spectrum of learning problems. This aspect of RL has not previously been widely used or understood, so expect to be surprised.
- History: There is no one place in which this theory has been previously collected and communicated. There are a large number of theoretical results available in RL and these are not widely understood (not even in relation to each other).
- Timing: There are new families of RL theory (detailed below) which are not widely understood even in the RL community. These new families can provide broadly useful algorithms, even for problems which have not traditionally been thought of as RL.
The tutorial will include both older and newer Reinforcement learning
theory. The older RL theory will include:
- The TD-learning algorithm and its analysis.
- The Q-learning algorithm and its analysis.
The newer RL theory will include:
- Sample-based analysis of RL, including E3 and sparse sampling.
- Generalization-based analysis of RL, including conservative policy iteration and RL-to-Classification reductions.
For each of these forms of theory, we will present the basic results
and cover the weaknesses and strengths of the approach in context. We
expect to spend 1/4 of the time on each subject.
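To give a flavor of the older theory, here is a minimal sketch of tabular Q-learning. The chain MDP, reward structure, and all parameter values are illustrative assumptions chosen for this sketch, not material from the tutorial itself.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP (a hypothetical example).

    States are 0..n_states-1; action 1 moves right, action 0 moves left.
    Reaching the rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[s][act])
            # deterministic chain dynamics
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After training, the greedy policy prefers moving right in every nonterminal state. The analyses covered in the tutorial characterize when and how fast such updates converge to the optimal Q-values.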