Which Assumptions are Reasonable?

One of the most confusing things about understanding learning theory is the vast array of differing assumptions. Some critical thought about which of these assumptions are reasonable for real-world problems may be useful.

Before we even start thinking about assumptions, it’s important to realize that the word has multiple meanings. The meaning used here is “assumption = axiom” (i.e. something you cannot verify).

| Assumption | Reasonable? | Which analysis? | Example/notes |
|---|---|---|---|
| Independent and identically distributed (IID) data | Sometimes | PAC, ERM, prediction bounds, statistics | The KDD Cup 2004 physics dataset is plausibly IID. A number of situations are “almost IID” in the sense that IID analysis yields correct intuitions. Unreasonable in adversarial situations (stock market, war, etc.). The role of this assumption is sketched in the bound below the table. |
| Independently distributed data | More often than IID, but still only sometimes | Online-to-batch conversion | Dropping “identically” can help when a cyclic process generates the data. |
| Finite exchangeability (FEX) | Sometimes reasonable | As for IID | There are a good number of situations where there is a population we wish to classify, we pay someone to classify a random subset, and we then try to learn. |
| Input space uniform on a sphere | No | PAC, active learning | I’ve never observed this in practice. |
| Functional form: “or” of variables, decision list, “and” of variables | Sometimes reasonable | PAC analysis | There are often at least OK functions of this form that make good predictions. |
| No noise | Rarely reasonable | PAC, ERM | Most learning problems appear to be of the form where the correct prediction given the inputs is fundamentally ambiguous. |
| Functional form: monotonic in variables | Often | PAC-style | Many natural problems seem to behave monotonically in their input variables. |
| Functional form: xor | Occasionally | PAC | I was surprised to observe this. |
| Fast mixing | Sometimes | RL | Interactive processes often fail to mix, ever, because entropy always increases. |
| Known optimal state distribution | Sometimes | RL | Sometimes humans know what is going on, and sometimes not. |
| Small approximation error everywhere | Rarely | RL | Approximate policy iteration is known for sometimes behaving oddly. |
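
To make the role of the IID assumption concrete, here is the standard Occam’s-razor/Hoeffding bound for a finite hypothesis class H (the usual textbook form; constants vary slightly across presentations). Given m IID samples from D, with probability at least 1 − δ:

```latex
% Occam/Hoeffding bound: IID sampling is what licenses the jump from
% observed (empirical) error to future (true) error, simultaneously
% for every hypothesis in a finite class H.
\[
\forall h \in H:\qquad
e_D(h) \;\le\; \hat{e}(h) \;+\; \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}
\]
% e_D(h): true error rate of h under the data distribution D
% \hat{e}(h): empirical error rate of h on the m IID samples
```

Without independence, the union-bound-plus-concentration argument behind this inequality does not go through, which is why the first row of the table matters so much.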

If anyone particularly agrees or disagrees with the reasonableness of these assumptions, I’m quite interested.

Families of Learning Theory Statements

The table below gives a very broad viewpoint of learning theory, classifying statements by the direction of the “arrow” relating past and future performance.

| Arrow | Typical statement | Examples |
|---|---|---|
| Past->Past | Some prediction algorithm A does almost as well as any of a set of algorithms. | Weighted Majority (a minimal sketch follows below) |
| Past->Future | Assuming independent samples, past performance predicts future performance. | PAC analysis, ERM analysis |
| Future->Future | Future prediction performance on subproblems implies future prediction performance using algorithm A. | ECOC, Probing |
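
As a concrete instance of a Past->Past statement, here is a minimal sketch of the Weighted Majority algorithm of Littlestone and Warmuth (binary predictions; the variable names and setup are my own illustrative choices):

```python
def weighted_majority(expert_preds, labels, beta=0.5):
    """Minimal Weighted Majority sketch.

    expert_preds: length-T list of length-n lists of {0,1} expert predictions.
    labels: length-T list of {0,1} outcomes.
    Returns the number of mistakes the master algorithm makes.
    """
    n = len(expert_preds[0])
    weights = [1.0] * n  # one weight per expert, initially uniform
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        # Predict with the weighted majority vote of the experts.
        vote1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote1 >= vote0 else 0
        if guess != y:
            mistakes += 1
        # Multiplicatively penalize every expert that erred this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes
```

The Past->Past guarantee is relative, not distributional: with beta = 1/2, the master’s mistakes are bounded by roughly 2.41 (m + log2 n), where m is the mistake count of the best single expert in hindsight, no matter how the data was generated.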

A basic question is: are there other varieties of statements of this type? Avrim noted that there are also “arrows between arrows”: generic methods for transforming between Past->Past statements and Past->Future statements (online-to-batch conversion, mentioned in the assumptions table, is one such method). Are there others?

Is the Goal Understanding or Prediction?

Steve Smale and I have a debate about goals of learning theory.

Steve likes theorems with a dependence on unobservable quantities. For example, if $D$ is a distribution over a space $X \times [0,1]$, you can state a theorem about the error rate that depends on the variance, $E_{(x,y)\sim D}\left(y - E_{y'\sim D|x}[y']\right)^2$.

I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm.
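
To make the contrast concrete, here is a hedged sketch (exact constants vary across statements of these inequalities): a Bernstein-style deviation bound for m IID samples bounded in [0,1] with variance $\sigma^2$ is tighter than the Hoeffding-style bound when $\sigma^2$ is small, but $\sigma^2$ is a property of the unknown D:

```latex
% Bernstein-style bound: sharper than Hoeffding when sigma^2 is small,
% but sigma^2 is unobservable, so the bound cannot be evaluated from
% data alone. With probability at least 1 - delta,
\[
\left|\hat{e} - e\right| \;\le\;
\sqrt{\frac{2\sigma^2 \ln(2/\delta)}{m}} \;+\; \frac{2\ln(2/\delta)}{3m}.
\]
```

The Occam/Hoeffding bound sketched earlier, by contrast, is computable from observed quantities alone, which is what makes it usable inside code.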

Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is Wolfram’s “A New Kind of Science”. Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc.

“Operative” mathematics helps you make predictions about the world. A strong example of operative mathematics is Newtonian mechanics in physics: it’s a great tool to help you predict what is going to happen in the world.

In addition to the “I want to do things” motivation for operative mathematics, I find it less arbitrary. In particular, two reasonable people can each be convinced they understand a topic in ways so different that neither understands the other’s viewpoint. If these understandings are operative, the rest of us on the sidelines can better appreciate which understanding is “best”.

Fast SVMs

There was a presentation at Snowbird about parallelized support vector machines. In many cases, people parallelize by discarding the optimizations used in serial implementations, but that is not what happened here: the parallelization retains those optimizations. Consequently, this seems to be the fastest SVM in existence.

There is a related paper here.

Structured Regret Minimization

Geoff Gordon gave an interesting presentation at the Snowbird learning workshop discussing the use of no-regret algorithms for several robot-related learning problems. There seems to be a draft here. This seems interesting in two ways:

  1. Drawback removal: One of the significant problems with these online algorithms is that they can’t cope with structure very easily. This drawback is addressed for certain structures.
  2. Experiments: One criticism of such algorithms is that they are too “worst case”. Several experiments suggest that protecting yourself against this worst case does not necessarily incur a great loss. (The standard no-regret notion is recalled below.)
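
For reference, “no-regret” here is the standard external-regret notion (a textbook definition): the algorithm’s cumulative loss is eventually no worse, on average, than that of the best fixed action in hindsight, with no distributional assumptions on the loss sequence:

```latex
% External regret over T rounds: actions a_t from a set A, adversarially
% chosen loss functions \ell_t. "No-regret" means average regret vanishes.
\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell_t(a_t) \;-\; \min_{a \in A} \sum_{t=1}^{T} \ell_t(a),
\qquad
\frac{\mathrm{Regret}_T}{T} \;\to\; 0 \ \text{as}\ T \to \infty.
\]
```

This guarantee holds for the worst-case loss sequence, which is exactly the property the experiments above probe.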