RLBench, a Reinforcement Learning Benchmark Suite


Neither Drew Bagnell nor John Langford has had time to work on RLBench for a year, so this project is currently 'on hold'. We strongly support the continuing efforts of others, such as CLSquare and RL-Glue, to set up a benchmark suite for RL. For reference purposes (and in case this project restarts), we will keep this web page 'live'.
The goal of the RLBench project is to make testing of Reinforcement Learning algorithms both simple and uniform, so that when new algorithms are created, they can easily be tested on a wide battery of problems.
Anyone interested in contributing problems to RLBench should email the maintainers. Our policy is to accept any problem which fits the interface (discussed below).

Simulators

The current benchmark suite includes:
Tetris (from Sham Kakade and Drew Bagnell) (This one is still buggy.)
Bicycle (from John Langford)
Maze world (from Drew Bagnell)
Pole balancing (from John Langford)

Downloads

Source code (version 3)
(Please cite RLBench if you use the suite.)

Interface Description

It is difficult to apply Reinforcement Learning to a fixed dataset, as is done for classification benchmarks, because in Reinforcement Learning the set of examples seen depends fundamentally on the actions taken. To cope with this fundamental source of complexity, each problem in the benchmark is encoded as a program/simulator with a standard interface.

Each of these outputs a description of the action space on startup.
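
As an illustration, the following Python sketch starts a simulator as a child process and reads the action-space description from its standard output. The binary name "./bicycle" and the assumption that the description is a single newline-terminated line of text are ours, not part of the specification; check the simulator source for the exact format.

import subprocess

# Launch a benchmark simulator and talk to it over text-mode pipes.
sim = subprocess.Popen(["./bicycle"],           # hypothetical simulator binary
                       stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE,
                       text=True)

# Assumed here: the action-space description arrives as one line on startup.
action_space = sim.stdout.readline().strip()
print("action space:", action_space)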

Deterministic Generative Model

The simulator sends:
observation
reward
state

The simulator receives:
state (on a line by itself)
random seed
action
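
The exact ordering and formatting of these fields are not pinned down here, but a minimal sketch of one exchange might look as follows, reusing the sim handle opened above and assuming each field is a newline-terminated line of text, with the agent writing its three inputs before reading the simulator's three outputs.

def det_step(sim, state, seed, action):
    # One step of the deterministic generative model (a sketch, not the
    # official client): write state, random seed, and action, then read
    # back observation, reward, and the resulting state.
    sim.stdin.write(state + "\n")               # state, on a line by itself
    sim.stdin.write(str(seed) + "\n")           # random seed fixing the outcome
    sim.stdin.write(str(action) + "\n")         # chosen action
    sim.stdin.flush()
    observation = sim.stdout.readline().strip()
    reward = float(sim.stdout.readline())       # assumed: reward is numeric text
    next_state = sim.stdout.readline().strip()
    return observation, reward, next_state

Because the seed is supplied by the caller, repeating the same (state, seed, action) query should reproduce exactly the same (observation, reward, next state) triple.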

Generative Model

Reinforcement learning algorithms can work directly with this deterministic generative model, or with interfaces that provide less information (and thus make the problem harder). The next lower level of information is a generative model. A generative model simulator sends:
observation
reward
state

and then expects:

state
action
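
Under the same line-per-field assumptions, one query to a generative-model simulator simply drops the seed; since no seed is supplied, repeated queries from the same state and action may return different outcomes.

def gen_step(sim, state, action):
    # One query to a (stochastic) generative model: like det_step above but
    # with no random seed, so the outcome is sampled by the simulator.
    sim.stdin.write(state + "\n")
    sim.stdin.write(str(action) + "\n")
    sim.stdin.flush()
    observation = sim.stdout.readline().strip()
    reward = float(sim.stdout.readline())
    next_state = sim.stdout.readline().strip()
    return observation, reward, next_state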

Trace Model

The next lower level of information is a trace model, which repeatedly sends:
observation
reward
and then accepts:

action
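
Again assuming one newline-terminated line per field, an agent driving a trace-model simulator simply alternates reads and writes; it never sees or resets the underlying state, so it can only run forward through the trace. The policy function below is a placeholder for whatever learning algorithm is being tested.

def run_trace(sim, policy):
    # Drive a trace-model simulator: read (observation, reward) pairs and
    # answer each with an action chosen by the supplied policy function.
    while True:
        observation = sim.stdout.readline()
        reward_line = sim.stdout.readline()
        if not observation or not reward_line:  # simulator has exited
            return
        action = policy(observation.strip(), float(reward_line))
        sim.stdin.write(str(action) + "\n")
        sim.stdin.flush()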