Jacob Abernethy and I have found a computationally tractable method for computing an optimal (or near-optimal, depending on the setting) master algorithm for combining expert predictions, addressing this open problem. A draft is here.
The effect of this improvement seems to be about a factor of 2 decrease in the regret (= error rate minus best possible error rate) for the low error rate situation. (At large error rates, there may be no significant difference.)
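To make the regret quantity concrete, here is a minimal sketch that measures it for a stand-in master algorithm. It uses the classic deterministic Weighted Majority rule, not the algorithm from the draft. The function names, the `beta=0.5` penalty, the reading of "best possible error rate" as the best single expert's error rate in hindsight, and the toy data are all assumptions made for illustration.

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Classic deterministic Weighted Majority (a stand-in, not the draft's method).

    expert_preds[t][i] is expert i's 0/1 prediction at round t;
    outcomes[t] is the true 0/1 label. Returns the master's mistake count.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # Weighted vote for each label; ties go to label 1.
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != y:
            mistakes += 1
        # Shrink the weight of every expert that was wrong this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes


def regret(expert_preds, outcomes, master_mistakes):
    """Master's error rate minus the best single expert's error rate (assumed reading)."""
    n_rounds = len(outcomes)
    n_experts = len(expert_preds[0])
    best_expert_mistakes = min(
        sum(1 for preds, y in zip(expert_preds, outcomes) if preds[i] != y)
        for i in range(n_experts)
    )
    return master_mistakes / n_rounds - best_expert_mistakes / n_rounds


# Toy data (hypothetical): 4 rounds, 3 experts; expert 0 is always right.
toy_preds = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1]]
toy_outcomes = [1, 0, 1, 1]
toy_mistakes = weighted_majority(toy_preds, toy_outcomes)
print(toy_mistakes, regret(toy_preds, toy_outcomes, toy_mistakes))  # prints: 1 0.25
```

In this toy run the master is outvoted once before the weights shift toward the perfect expert, so all of its error rate is regret. That is the low error rate regime where the improvement above is claimed to matter most.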
There are some unfinished details still to consider:
- When we remove all of the approximation slack from online learning, is the result a satisfying learning algorithm in practice? I consider online learning to be one of the more compelling methods of analyzing and deriving algorithms, but that expectation must be either met or not by this algorithm.
- Some extra details: The algorithm is optimal given a small amount of side information (k in the draft). What is the best way to remove this side information? The removal is necessary for a practical algorithm. One mechanism may be the k->infinity limit.