On-line learning of regular decision rules

Many decision problems can be represented in the form
FOR n=1,2,…:
— Reality chooses a datum xn.
— Decision Maker chooses his decision dn.
— Reality chooses an observation yn.
— Decision Maker suffers loss L(yn,dn).
END FOR.
The observation yn can be, for example, tomorrow’s stock price and the decision dn the number of shares Decision Maker chooses to buy. The datum xn ideally contains all information that might be relevant in making this decision. We do not want to assume anything about the way Reality generates the observations and data.

Suppose there is a good and not too complex decision rule D mapping each datum x to a decision D(x). Can we perform as well, or almost as well, as D, without knowing it? This is essentially a special case of the problem of on-line learning.

This is a simple result of this kind. Suppose the data xn are taken from [0,1] and L(y,d)=|yd|. A norm ||h|| of a function h on [0,1] is defined by

||h||2 = (Integral01 h(t) dt)2 + Integral01 (h'(t))2 dt.
Decision Maker has a strategy that guarantees, for all N and all D with finite ||D||,
Sumn=1N L(yn,dn) is at most Sumn=1N L(yn,D(xn)) + (||2D–1|| + 1) N1/2.
Therefore, Decision Maker is competitive with D provided the “mean slope” (Integral01 (D'(t))2dt)1/2 of D is significantly less than N1/2. This can be extended to general reproducing kernel Hilbert spaces of decision rules and to many other loss functions.

It is interesting that the only known way of proving this result uses a non-standard (although very old) notion of probability. The standard definition (“Kolmogorov’s axiomatic“) is that probability is a normalized measure. The alternative is to define probability in terms of games rather than measure (this was started by von Mises and greatly advanced by Ville, who in 1939 replaced von Mises’s awkward gambling strategies with what he called martingales). This game-theoretic probability allows one to state the most important laws of probability as continuous strategies for gambling against the forecaster: the gambler becomes rich if the forecasts violate the law. In 2004 Akimichi Takemura noticed that for every such strategy the forecaster can play in such a way as not to lose a penny. Takemura’s procedure is simple and constructive: for any continuous law of probability we have an explicit forecasting strategy that satisfies that law perfectly. We call this procedure defensive forecasting. Takemura’s version was about binary observations, but it has been greatly extended since.

Now let us see how we can achieve the goal Sumn=1 N L(yn,dn) < Sumn=1N L(yn,D(xn)) + (something small). This would be easy if we knew the true probabilities for the observations. At each step we would choose the decision with the smallest expected loss and the law of large numbers would allow us to prove that our goal would be achieved. Alas, we do not know the true probabilities (if they exist at all). But with defensive forecasting at our disposal we do not need them: the fake probabilities produced by defensive forecasting are ideal as far as the law of large numbers is concerned. The rest of the proof is technical details.