John Langford – Page 64 – Machine Learning (Theory)

7/9/2006

The Stock Prediction Machine Learning Problem

…is discussed in this nytimes article. I generally expect such approaches to become more common since computers are getting faster, machine learning is getting better, and data is becoming more plentiful. This is another example where machine learning technology may have a huge economic impact. Some side notes:

We-in-research know almost nothing about how these things are done (because it is typically a corporate secret).
… but the limited discussion in the article seems naive from a machine learning viewpoint.
1. The learning process used apparently often fails to take into account transaction costs.
2. What little is discussed of the approaches appears to be modeling-based. It seems plausible that more direct prediction methods can yield an edge.
One difficulty with stock picking as a research topic is that it is inherently a zero sum game (for every winner, there is a loser). Much of the rest of research is positive sum (basically, everyone wins).

7/6/2006

Branch Prediction Competition

Alan Fern points out the second branch prediction challenge (due September 29), which is a follow-up to the first branch prediction competition. Branch prediction is one of the fundamental learning problems of the computer age: without it our computers might run an order of magnitude slower. This is a tough problem since there are sharp constraints on time and space complexity in an online environment. For machine learning, the “idealistic track” may fit well. Essentially, they remove these constraints to gain a weak upper bound on what might be done.

6/30/2006

ICML papers

Here are some ICML papers which interested me.

1. Arindam Banerjee had a paper which notes that PAC-Bayes bounds, a core theorem in online learning, and the optimality of Bayesian learning statements share a core inequality in their proof.
2. Pieter Abbeel, Morgan Quigley and Andrew Y. Ng have a paper discussing RL techniques for learning given a bad (but not too bad) model of the world.
3. Nina Balcan and Avrim Blum have a paper which discusses how to learn given a similarity function rather than a kernel. A similarity function requires less structure than a kernel, implying that a learning algorithm using a similarity function might be applied in situations where no effective kernel is evident.
4. Nathan Ratliff, Drew Bagnell, and Marty Zinkevich have a paper describing an algorithm which attempts to fuse A^* path planning with learning of transition costs based on human demonstration.

Papers (2), (3), and (4) all seem like initial passes at solving interesting problems which push the domain in which learning is applicable.

I’d like to encourage discussion of what papers interested you and why. Maybe we’ll all learn a little bit, and it’s very likely that we all missed interesting papers in a multitrack conference.

6/25/2006

Presentation of Proofs is Hard.

When presenting part of the Reinforcement Learning theory tutorial at ICML 2006, I was forcibly reminded of this.

There are several difficulties.

When creating the presentation, the correct level of detail is tricky. With too much detail, the proof takes too much time and people may be lost to boredom. With too little detail, the steps of the proof involve too-great a jump. This is very difficult to judge.
1. What may be an easy step in the careful thought of a quiet room is not so easy when you are occupied by the process of presentation.
2. What may be easy after having gone over this (and other) proofs is not so easy to follow in the first pass by a viewer.
These problems seem correctable only by a process of repeated test-and-revise.
When presenting the proof, simply speaking with sufficient precision is substantially harder than in normal conversation (where precision is not so critical). Practice can help here.
When presenting the proof, going at the right pace for understanding is difficult. When we use a blackboard/whiteboard, a natural reasonable pace is imposed by the process of writing. Unfortunately, writing doesn’t scale well to large audiences because people at the back cannot read it, so this natural pacing mechanism is lost.
It is difficult to entertain with a proof: there is nothing particularly funny about it. This particularly matters for a large audience which tends to naturally develop an expectation of being entertained.

Given all these difficulties, it is very tempting to avoid presenting proofs. Avoiding the proof in any serious detail is fairly reasonable in a conference presentation—the time is too short and the people viewing are too heavily overloaded to follow the logic well. The “right” level of detail is often the theorem statement.

Nevertheless, avoidance is not always possible because the proof is one of the more powerful mechanisms we have for doing research.

6/24/2006

Online convex optimization at COLT

At ICML 2003, Marty Zinkevich proposed the online convex optimization setting and showed that a particular gradient descent algorithm has regret O(T^0.5) with respect to the best predictor where T is the number of rounds. This seems to be a nice model for online learning, and there has been some significant follow-up work.
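To make the setting concrete, here is a minimal sketch of projected online gradient descent. It is an illustration under my own assumptions rather than code from the paper: the feasible set is a Euclidean ball, the per-round loss is f_t(x) = ||x - z_t||^2 / 2 for a revealed target z_t, the step size is 1/sqrt(t), and every function name is hypothetical. Regret is measured against the best single point in hindsight, which for this particular loss is the projection of the mean target onto the ball.

```python
import math
import random


def project_to_ball(x, radius):
    """Euclidean projection of x onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def loss(x, z):
    """Assumed convex per-round loss: half the squared distance to the target z."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, z))


def online_gradient_descent(targets, radius=1.0):
    """Play x_t, suffer loss(x_t, z_t), then take a projected gradient step.

    Returns the cumulative loss and the final iterate.
    """
    x = [0.0] * len(targets[0])
    total = 0.0
    for t, z in enumerate(targets, start=1):
        total += loss(x, z)                      # the loss is revealed after playing x
        grad = [a - b for a, b in zip(x, z)]     # gradient of the assumed loss at x
        eta = 1.0 / math.sqrt(t)                 # decaying step size (an assumption)
        x = project_to_ball([a - eta * g for a, g in zip(x, grad)], radius)
    return total, x


def best_fixed_loss(targets, radius=1.0):
    """Cumulative loss of the best fixed point in hindsight (valid for this loss only)."""
    n = len(targets)
    mean = [sum(col) / n for col in zip(*targets)]
    best = project_to_ball(mean, radius)
    return sum(loss(best, z) for z in targets)


if __name__ == "__main__":
    rng = random.Random(0)
    T = 1000
    targets = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for _ in range(T)]
    total, _ = online_gradient_descent(targets)
    regret = total - best_fixed_loss(targets)
    print(f"regret after {T} rounds: {regret:.3f} (sqrt(T) = {math.sqrt(T):.1f})")
```

Running the script prints the regret alongside sqrt(T) for comparison. The squared loss here is much friendlier than the general convex case, so the observed regret should sit well below the O(T^0.5) worst-case guarantee.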

At COLT 2006, Elad Hazan, Adam Kalai, Satyen Kale, and Amit Agarwal presented a modification which takes a Newton step, guaranteeing O(log T) regret when the first and second derivatives are bounded. Then they applied these algorithms to portfolio management at ICML 2006 (with Robert Schapire), yielding some very fun graphs.