# Machine Learning (Theory)

## 5/2/2010

### What’s the difference between gambling and rewarding good prediction?

Tags: Machine Learning jl@ 11:07 pm

After a major financial crisis, there is much discussion about how finance has become a casino: gambling with other people’s money, keeping the winnings, and walking away when the money is lost.

When thinking about financial reform, the many losers in the above scenario are apt to take the view that this activity should be completely, or nearly completely, curtailed. But a more thoughtful view is that sometimes there is a real sense in which there are right and wrong decisions, and we as a society would really prefer that the people most likely to make right decisions are the ones making them. A crucial question then is: “What is the difference between gambling and rewarding good prediction?”

We discussed this before the financial crisis. The cheat-sheet sketch is that the problem of online learning against an adversary, together with its algorithms and theorems, provides a good mathematical model for thinking about this question. What I would like to do here is map this onto various types of financial transactions. The basic mapping is between “wealth” and “weight”, with the essential idea that you can think of wealth as either money or degree of control over decision making. The core algorithms start with a “wealth” spread over many experts, each of which makes predictions and then has its wealth updated according to a soft exponential of the value of its prediction.
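
The wealth update described above can be sketched as an exponential-weights (Hedge-style) rule. This is only a minimal illustration, not the specific algorithm from the earlier post: the learning rate `eta`, the function names, and the toy loss data are all assumptions made for the example, and losses are assumed to lie in [0, 1].

```python
import math

def hedge_update(wealth, losses, eta=0.5):
    """One round: scale each expert's wealth by exp(-eta * loss), then renormalize."""
    updated = [w * math.exp(-eta * loss) for w, loss in zip(wealth, losses)]
    total = sum(updated)
    return [w / total for w in updated]

def run_hedge(loss_rounds, eta=0.5):
    """Start with wealth spread evenly over experts and update it every round."""
    n_experts = len(loss_rounds[0])
    wealth = [1.0 / n_experts] * n_experts
    algorithm_loss = 0.0
    for losses in loss_rounds:
        # The algorithm's loss is the wealth-weighted average of the experts' losses.
        algorithm_loss += sum(w * l for w, l in zip(wealth, losses))
        wealth = hedge_update(wealth, losses, eta)
    return wealth, algorithm_loss

# Hypothetical data: expert 0 is always right, expert 2 is always wrong.
loss_rounds = [[0.0, 0.5, 1.0]] * 10
final_wealth, total_loss = run_hedge(loss_rounds)
```

Because the update is multiplicative and bounded per round, no single decision can wipe out an expert or hand it overwhelming credit, which is the property the financial mappings below are judged against.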

1. Going Long. The basic strategy here is to buy low and sell high. This strategy is not inherently sound from a learning theory point of view, because a single purchased item can sometimes drop to zero value. Similarly, a single purchased item can sometimes grow radically in value. Neither of these properties is desirable from the viewpoint of a learning algorithm. In the zero value case, a good decision maker can be wiped out by one decision, while in the large value case, a lucky decision maker can randomly achieve overwhelming credit. Nevertheless, there is a sense in which this strategy is compatible with learning theory. If each item purchased either doubles or halves in value, the fluctuation in the wealth of a decision maker is analogous to the fluctuation in the relative weight of an expert in the online learning framework.
2. … with diversification. Going long with diversification implies purchasing several items and selling them later. Adding diversification to the “Long” strategy helps it align substantially better with an optimal learning theory strategy. Single points of failure are avoided, while random fluctuations up in wealth are reduced.
3. Going Short. The short strategy is borrowing an item (typically a stock), selling it high, then buying it back low to cover the debt. It’s a technique used to make money when a stock decreases in value. This technique was banned for a time during the crisis. From the perspective of learning theory, short selling is more dangerous than long, because it’s possible to end up with negative wealth when a stock is sold short, and then it increases in value. To avoid this, it’s necessary to have sufficient collateral to cover the short at all times. If this collateral is at least twice the value when shorting occurs, it’s hard for participants to become wealthy by luck, because wealth at most doubles. Diversification is also a potentially useful helper strategy.
4. Insurance. Credit Default Swaps are effectively a form of insurance where one party pays another small amounts unless something bad happens, in which case large amounts of money go the other direction. In the financial crisis, credit default swaps made the crisis viral, as the “pay up” clauses triggered, particularly wiping out AIG. Insurance has the same general problem as short selling—it can result in negative wealth unless there is sufficient collateral. It also has the same solution.
5. Clawback. The basic idea of a “clawback” is that when someone fouls up really badly, you extract it from their past paychecks. As far as I can tell, this sort of clause exists in nearly no contracts, but it’s a popular proposal in retrospect, particularly for certain AIG employees who destroyed their company. The driving problem here is that the actual value of a decision is not known for some time, and it’s misestimated in the short term. Learning theory suggests that you should apply updates to estimated value as soon as possible to adjust wealth, which would correspond to a potential 100% clawback clause.
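
The collateral arithmetic for the short strategy (item 3) can be checked with a small sketch. The accounting here is a deliberate simplification and an assumption of this example: the trader's entire wealth is posted as collateral, there are no fees or interest, and the position is force-closed exactly when losses exhaust the collateral, so wealth is floored at zero. The function names are hypothetical.

```python
def wealth_after_short(collateral, shares, entry_price, exit_price):
    """Wealth after shorting `shares` at entry_price and covering at exit_price.

    Assumes the position is closed out once losses exhaust the collateral,
    so wealth never goes below zero.
    """
    profit = shares * (entry_price - exit_price)
    return max(collateral + profit, 0.0)

def max_growth_factor(collateral, shares, entry_price):
    """Best case for the short seller: the price falls all the way to zero."""
    return wealth_after_short(collateral, shares, entry_price, 0.0) / collateral

# Collateral equal to the shorted value: wealth can at most double.
factor_1x = max_growth_factor(100.0, 1, 100.0)   # 2.0
# Collateral of twice the shorted value: wealth grows by at most half.
factor_2x = max_growth_factor(200.0, 1, 100.0)   # 1.5
# With 2x collateral, the price must triple before the collateral is gone.
wiped_out = wealth_after_short(200.0, 1, 100.0, 300.0)  # 0.0
```

Under these assumptions, collateral of at least twice the shorted value keeps the per-decision gain within the “at most doubles” bound stated above, in the same way that a bounded multiplicative update limits how fast an expert's weight can grow.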

Two things strike me in considering the above.

The first is that, for normal people interacting with the financial system, a set of financial rules plus good sense has developed such that wealth tends to grow and shrink in a manner similar to what learning theory would suggest is near optimal. For example, most people use the going long strategy by default and most diversify. Most don’t use the short strategy, but those who do must have sufficient collateral. Normal people don’t have access to credit default swaps, and normal insurance has real collateral requirements. Clawbacks are automatic, as normal people bet with their own money and take their own losses.

The second is that larger actors have become quite skillful at avoiding the rules, with unsecured credit default swaps, unsecured shorts, and no clawback rules. But learning theory is math, so it can’t really be avoided. Instead, what happens is inefficient decision making via inefficient learning algorithms on a societal scale.

My belief is that effective financial reform will impose limits on agents just as learning theory implies. This is also the answer to the title question: it’s gambling if the corresponding learning algorithm has high regret, and it’s rewarding good prediction if the corresponding learning algorithm has low regret. Since this is already done effectively for normal people, shifting all agents towards the limits imposed in that direction works. This means lower bounds on collateral (or equivalently upper bounds on leverage), and standardized markets where all agents can interact on an equal basis. Adding in automatic clawback provisions for all performance-based pay would also probably be very effective.
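
For readers who want the regret criterion spelled out, here is one standard way to write it for the exponential-weights setting. The notation is an assumption of this sketch and does not appear in the post: $N$ experts, $T$ rounds, losses $\ell_{t,i} \in [0,1]$, normalized wealth $w_{t,i}$, and learning rate $\eta > 0$.

$$
w_{t+1,i} = \frac{w_{t,i}\, e^{-\eta\, \ell_{t,i}}}{\sum_{j=1}^{N} w_{t,j}\, e^{-\eta\, \ell_{t,j}}},
\qquad
R_T = \sum_{t=1}^{T} \sum_{i=1}^{N} w_{t,i}\, \ell_{t,i} \;-\; \min_{1 \le i \le N} \sum_{t=1}^{T} \ell_{t,i}
$$

$$
R_T \;\le\; \frac{\ln N}{\eta} + \frac{\eta T}{8},
\qquad\text{which equals } \sqrt{\tfrac{T \ln N}{2}} \text{ at } \eta = \sqrt{\tfrac{8 \ln N}{T}}
$$

In this reading, “low regret” means $R_T / T \to 0$ as $T$ grows. The bound depends on per-round losses being bounded, which is the role that collateral requirements and leverage limits play in the financial mapping.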

A full dose of this medicine may upset many people directly affected by such legislation, as it limits their actions and imposes downsides. But this needn’t be so, because the math is straightforward, very robust, and designed precisely to pick out the good decision makers, giving them wealth as rapidly as is responsibly possible so they can make and control bigger decisions. If you are a good decision maker, then you should want this.

On the research front, there are substantial improvements we could hope for. Some basic questions are: How can we better structure marketplaces to allocate wealth according to the dynamics of an online learning algorithm? What are the holes in the mapping between online learning and markets that need repair? How do you repair them? And how do the repairs affect learning algorithms when backported? Good answers to these questions could be radically valuable. Yiling and Jenn have a paper mapping out connections between prediction markets and online learning this year at EC, which is of interest for this direction of research.

###### 8 Comments to “What’s the difference between gambling and rewarding good prediction?”
1. jld says:

Gambling?
Huh! No!
The most critical part of it was pure swindling.

2. Mike says:

You may have implied this, but diversification isn’t just eliminating a single point of failure. The goal of diversification is to pick investments where the systematic risk is uncorrelated or even negatively correlated. I’m not sure how to put that into machine learning terms.

3. a statistical trader says:

“From the perspective of learning theory, short selling is more dangerous than long, because it’s possible to end up with negative wealth when a stock is sold short, and then it increases in value.”

This isn’t true for stocks or any exchange-traded instrument, as you seem to understand. It looks like you’ve just used loose language here.

Also, just to familiarize you with the standard language, buying CDSs is generally seen as the short side, as it expresses a negative view on the underlying credit. It’s good to clarify that you mean shorting/selling the actual CDSs, as AIG did.

Moreover, the fact that upside gains are limited is generally not important in practice due to the vanishingly small probability that returns over any reasonable holding period will be of a magnitude large enough to approach the limit. In nearly all cases, both the expected return that drove the selection of a given bet and that bet’s realized return fall into a range where the asymmetry you mention is not significant.

Of course, the asymmetry would come into play if one wanted to derive some sort of a worst-case bound, which I assume you want to do here to some extent. However, as you’re aware, such bounds are often so loose as to have little practical value, and I believe that would be the case here in almost all instances of short selling, whether the objective be making money or policy. Something like the AIG case where exposure was so large is of course an exception, but this case is in no way representative of short-selling in general.

Regarding the clawback question, I agree in principle that this is a good idea, but it seems very difficult to get right in a large company. The standard technique is to give a larger percentage of compensation in restricted equity, which can’t be sold for years after you’ve left the company. But why should one’s comp be tied to the actions of thousands of other people over which one has no control? Furthermore, say you are a trader who got a big bonus because your research led the bank to buy mortgages which appreciated. A few months later, you may no longer have a positive view on these securities and want to liquidate the position; however, others who want to keep it on overrule you. If you leave at this time, why should your share of the profits derived from the initial appreciation be clawed back if the securities subsequently tank? I believe this sort of situation is quite common.

Finally, as expected, the places with the best trading records are generally proprietary trading firms (no client money) that can better control these sorts of issues. I think it is more that good traders seek these firms out than that the better incentives improve trading, but the latter effect is significant too.

• Jonathan says:

Regarding clawback, tricky issue indeed. I would not want to rely on the next trader coming in to mismanage my portfolio after I leave the firm and then be held accountable for the performance of my portfolio (fortunately I don’t run ongoing positions as do algo stat/arb).

That said, the free-option of a trader taking a % of positive profits as bonus in good years and flooring at 0 in bonus if he/she blows up is a biased system, since the trader does not take on the losses. Beyond the free option year-to-year are the liabilities of the securities written or purchased by the trader that are on the books beyond his tenure at the firm.

At every firm I have been, I have seen traders writing toxic long-dated stuff that may have hedging or unwind issues. Premiums are taken, bonuses paid, and the firm is left with the liability. That is not to say that all long dated OTC instruments are problematic, but many are.

The true cost of hedging and liquidation-risk needs to be taken up front with the premiums on new positions. For better or worse this will make many currently apparently “good” trades no longer palatable. I don’t believe the true cost of hedging or liquidation-risk is well understood.

I don’t have much hope that the up-front costs will ever be understood by the right people. The controls at banks are not generally where the smartest people are. Probably the best thing we can do is force banks to de-lever. 40x leverage is a killer. That it lasted so long was a perception game. If lenders no longer trust a bank or banks, a small % loss x 40 or 80 translates to negative valuation …

4. Paul says:

The missing clawback issue proved to be catastrophic in the most recent bubble when temporary profits (basically those insurance premiums) were removed from the system in the form of massive (and instantaneous) bonuses. One simple solution to this is to replace bonuses with financial rewards that vest over time. In other words, the bonus you receive this year will not be available to you unless the company is still viable and healthy in 5 or 10 years.

Regulating the financial industry to replace cash bonuses with stock that vests over time would probably prevent this type of bubble from occurring in the future. On the other hand I don’t think I want to know how enterprising individuals would eventually find a way to abuse that restriction.

6. Anonymous says:

What is the difference between gambling and rewarding good prediction? Lobbying.

7. Anonymous says:

Clawbacks sound nice, but if you steal lots of money, you can hire lawyers and lobbyists to keep your money. How do you tell who is gambling, and who is actually smart? Also, problems are covered up. For example, suppose you invest in an oil well, and it looks really profitable, but occasionally there is a super-costly environmental disaster. How will this be factored into the performance?
