RLBench, a Reinforcement Learning Benchmark Suite
Neither Drew Bagnell nor John Langford has had time to work on
RLBench for a year, so this project is currently 'on hold'. We
strongly support the continuing efforts of others, such as CLSquare
and RL-Glue, to set up a benchmark suite for RL.
For reference purposes (and in case this project restarts), we will
keep this web page 'live'.
Anyone interested in contributing problems to RLBench should email the
authors. Our policy is to accept any problem which fits the interface.
The goal of the RLBench project is to make testing of Reinforcement
Learning algorithms both simple and uniform, so that when new algorithms
are created, they can easily be tested on a wide battery of problems.
The current benchmark suite includes:
Tetris (from Sham Kakade and Drew Bagnell) (currently buggy)
Bicycle (from John Langford)
Maze world (from Drew Bagnell)
Pole balancing (from John Langford)
Source code (version 3)
(Please cite RLBench if you use the suite.)
It is difficult to apply Reinforcement Learning to a fixed dataset as
is done for classification benchmarks, because in Reinforcement
Learning the set of examples seen depends fundamentally on the actions
taken. To cope with this source of complexity, each problem in the
benchmark is encoded as a program/simulator with a standard interface.
Each simulator outputs a description of its action space on startup.
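As a concrete illustration, the sketch below launches one of the
benchmark simulators as a subprocess and reads that startup
description. This is a minimal sketch, not part of RLBench itself: the
executable path "./bicycle" is hypothetical, and the only assumption is
the line-oriented stdin/stdout interface described on this page.

    import subprocess

    # Minimal sketch: launch a benchmark simulator and read the
    # description of its action space that it prints on startup.
    # The path "./bicycle" is hypothetical; substitute any problem
    # built from the RLBench source code.
    sim = subprocess.Popen(
        ["./bicycle"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,   # speak the protocol as text, not bytes
        bufsize=1,   # line-buffered, matching the line-oriented interface
    )

    # Each simulator outputs a description of its action space on startup.
    action_space = sim.stdout.readline().strip()
    print("action space:", action_space)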
Deterministic Generative Model
The simulator sends:
The simulator receives:
    state (on a line by itself)
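The exchange is easiest to see in code. The helper below sends a state
to the simulator (on a line by itself, as specified above) and reads
back one reply line. Because the format of the simulator's reply is not
reproduced on this page, the line is returned unparsed; the sketch
reuses the "sim" subprocess from the earlier example.

    def query(sim, state):
        # "sim" is the simulator subprocess opened in the sketch above.
        # Send the state on a line by itself, as the interface specifies.
        sim.stdin.write(state + "\n")
        sim.stdin.flush()
        # The reply format is not reproduced on this page, so the raw
        # line is returned for the caller to parse.
        return sim.stdout.readline().strip()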
Reinforcement Learning algorithms can work directly with this
deterministic generative model, or with interfaces providing less
information (and thus making the problem harder). The next lower level
of information is a generative model: a generative model simulator
samples transitions internally rather than exposing its randomness, so
the learner can draw samples but cannot control the random bits. The
lowest level of information is a trace model, which only follows the
current trajectory: the simulator repeatedly reports the state it is
in and then accepts the learner's next action.
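To summarize the hierarchy, here is an illustrative sketch of the three
access levels as interfaces. All class and method names are
hypothetical (RLBench itself uses the line-based text protocol above),
and the signatures are assumptions chosen only to show what information
each level does and does not expose.

    from abc import ABC, abstractmethod

    # Illustrative only: hypothetical interfaces capturing the three
    # levels of simulator access, from least to most information.

    class TraceModel(ABC):
        # Least information: the learner only sees the states along
        # the trajectory it is currently on.
        @abstractmethod
        def act(self, action):
            """Take an action in the current state; return (state, reward)."""

    class GenerativeModel(TraceModel):
        # More information: transitions can be sampled from any chosen
        # state, but the simulator's randomness stays hidden.
        @abstractmethod
        def sample(self, state, action):
            """Sample (next_state, reward) for an arbitrary (state, action)."""

    class DeterministicGenerativeModel(GenerativeModel):
        # Most information: the learner supplies the random bits, so
        # the same inputs always produce the same transition.
        @abstractmethod
        def step(self, state, action, random_bits):
            """Deterministically compute (next_state, reward)."""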