Online Learning as the Mathematics of Accountability

Accountability is a social problem. When someone screws up, do you fire them? Or do you accept the error and let them continue? This is a very difficult problem and we all know of stories where the wrong decision was made.

Online learning (as meant here), is a subfield of learning theory which analyzes the online learning model.

In the online learning model, there are a set of hypotheses or “experts”. On any instantance x, each expert makes a prediction y. A master algorithm A uses these predictions to form it’s own prediction y_A and then learns the correct prediction y^*. This process repeats.

The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove:

L(A) less than min_e L(e) + log(number of experts)
over all sequences.

In particular, there is no assumption of independent samples and there is no assumption that the experts perform well (or can perform well). This is not a high probability statement: it simply always holds. These assumption-free qualities are very important for application to the accountability problem, because the experts really can be adversarial.

In any situation where we have a set of human experts giving advice on the same subject, we can hope to apply online learning algorithms to better distill collective advice into single prediction. Examples include:

stock picking Many of the big stocks have ‘analysts’ who predict whether they will go up or down. For an example, look here.
surveys It is common to see things like “A gain of 1.4 percent was expected, based on the median estimate in a Bloomberg survey of 53 economists” in news articles. Presumably, these economists are reused frequently implying they have a record to which an online algorithm could be applied.

This application of online learning isn’t trivial. Even for the above examples, it isn’t clear how to handle issues like:

A new expert starts making predictions. There are some reasonable ad-hoc mechanisms for coping with this in the context of particular algorithms.
An expert declines to make a prediction. The modified “sleeping experts” setting handles this, but the results are not quite as tight.
The loss associated with individual predictions is highly variable rather than something simple like “0/1-loss” or “squared error loss”. One approach to this is to combine regret minimizing learning reductions with online learning algorithms (drawback: the reduced predictions may not be of intuitive things). Another approach is simply trying to make very flexible master algorithms (drawback: flexibility often comes with a weakening in the theoretical guarantees).
In the real world, we may not have feedback about a prediction until after the next 10 predictions (or more) need to be made.
In the real world, there may be uncertainty about the measured truth. Quantifying GDP growth requires a lot of work and has some fundamental uncertainty associated with it, especially when immediate feedback is required.

4 Replies to “Online Learning as the Mathematics of Accountability”

One approach to this is to combine regret minimizing learning reductions with online learning algorithms (drawback: the reduced predictions may not be of intuitive things).

Could you elaborate on this point?

Sure.

Let’s consider simplified stock picking. Suppose there are 4 stocks to choose from named A, B, C, and D. The problem of choosing a “good” stock is (to a good approximation) a cost sensitive classification problem. To see this, note that for every stock we have a buy price and a sell price. The difference between these two is the cost associated with picking a particular stock (where negative cost = a positive return on the investment). We can freely normalize this problem by subtracting the smallest cost from all costs resulting in a cost sensitive classification problem.

Looking this up we see that the SECOC reduction with a Hadamard code is a regret minimizing reduction to binary classification. The form of the binary problems is prototypically: “Is the expected sum of returns of stocks A, B, and C greater than 0.7 * the expected sum of returns of stocks A,B,C, and D?”

This prediction problem is not what stock pickers have spent years building up an understanding of, so requiring them to make predictions of this sort would be very intrusive (and possibly simply more difficult) than using more natural predictions like “buy stock B”.

Pingback: Machine Learning (Theory) » On-line learning of regular decision rules

Pingback: Amir Muhammadzadeh - What’s the difference between gambling and rewarding good prediction?

Comments are closed.