In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work.

- Newtonian State. State is the physical pose of the world. Under this definition, there are
*very*many states, often too many for explicit representation. This is also the definition typically used in games. - Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?”
- Mathematical State. State is a sufficient statistic of observations for making necessary predictions.
- Internal State. State is the internal belief/understanding/etc… which changes an agent’s actions in different circumstances. A natural question is: “How do you learn a state?” This is like the mathematical version of state, except that portions of the statistic which can not be learned are irrelevant.
- There are no states. There are only observations (one of which might be a reward) and actions. This is more reasonable than it might sound because state is a derived quantity and the necessity of that derivation is unclear. PSRs are an example.

The different questions these notions of state motivate can have large practical consequences on the algorithms used. It is not clear yet what the “right” notion of state is—we just don’t know what works well. I am most interested in “internal state” at the moment.

Things may get nasty when state is described by continuous parameters because whatever you do must be a continuous function of state (this is a general theorem: all computable functions are continuous). So if you’re going to make binary decisions, or discrete decisions of any sort, with continuous parameters as inputs, there will be states for which your algorithm fails (diverges, gives numerically senseless answers), or there will be states where your algorithm gives non-deterministic answers (depending say, on the history of the execution of your algorithm). This may not be a big problem for learning, but it’s good to keep such things in mind.

Actually, the typical example of this kind shows up when you try to classify a point by looking at which side of a hyperplane it lies on. The computation comes down to determining the sign (a discrete answer) of a certain scalar product. This is known to be non-computable in general, or at least you should expect numerical difficulties with points that lie near the hyperplane. In

practice, do such things cause problems with learning algorithms? How often does a learning algorithm which should work in theory exhibit bad behavior in practice?For the first question: Yes, numerical problems arise, but no they can be handled by being appropriately careful. The essential observation is that most samples are not near the boundary.

For the second question: good theory/bad practice is very common. Good learning theory which can actually predict what will happen in practice is rare.

One way to look at state representations is as a communications protocol, possibly to others, and possibly as a note to yourself for later. In this viewpoint the “there is no state” model is just the statement “All of the notes I leave to myself will be in the form they were presented to me.”

From this viewpoint, abstract state is two ideas thrown together. First, abstract state is the shorthand you build to make learning timely. Secondly, one may very well receive information communicated in an abstract form, which means that you will have observations in abstract form.

The virtue of this is perspective is that it keeps open all of the theoretic machinery to analyze communication.

In 1984, Leslie Lamport wrote a wonderful piece on the arbiter’s problem of making a binary decision from continuous inputs . More recently, the practical consequences of the problem for asynchronous circuits have received some attention.