One of the central concerns of learning is to understand and to prevent overfitting. Various notions of “function complexity” often arise: VC dimension, Rademacher complexity, comparison classes of experts, and program length are just a few.

The term “complexity” seems somehow misleading to me; these notions never capture something that meets my intuitive idea of complexity. The Bayesian notion clearly captures what’s going on. Functions aren’t “complex”; they’re just “surprising”: we assign them low probability. Most (all?) complexity notions I know boil down to some (generally loose) bound on the prior probability of the function.

In a sense, “complexity” fundamentally arises because probability distributions must sum to one. You can’t believe in all possibilities at the same time, or at least not equally; rather, you have to carefully spread the probability mass over the options you’d like to consider. Large complexity classes mean that beliefs are spread thinly. In its simplest form, this phenomenon gives the log n term (from a uniform prior of 1/n over each of n hypotheses) in classic PAC bounds.
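The log n term can be sketched as a tiny finite-class Occam’s-razor bound calculation (a standard PAC-style bound; the function name and the numbers below are my own illustration):

```python
import math

def occam_bound(n_hypotheses, n_samples, delta):
    """Finite-class Occam bound: with probability >= 1 - delta,
    every hypothesis consistent with the sample has true error at
    most (ln n + ln(1/delta)) / m. The ln n term comes from
    spreading prior mass 1/n over each of the n hypotheses."""
    return (math.log(n_hypotheses) + math.log(1.0 / delta)) / n_samples

# Spreading belief over a million hypotheses instead of a thousand
# only degrades the bound by an additive log factor.
small = occam_bound(10**3, 10000, 0.05)
large = occam_bound(10**6, 10000, 0.05)
```

Note how thinly-spread belief shows up directly: the penalty for considering more hypotheses is logarithmic, exactly the −log(prior) of each hypothesis.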

In fact, one way to think about good learning algorithms is that they are those which take full advantage of their probability mass. In the language of Minimum Description Length, they correspond to “non-defective distributions”.
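One way to make “non-defective” concrete is through the Kraft inequality: codeword lengths l_i define an implicit prior 2^(−l_i) over hypotheses, and the distribution is non-defective exactly when those masses sum to one (a sketch under that standard correspondence; the helper name is mine):

```python
def kraft_sum(code_lengths):
    """Sum of the implicit prior masses 2**(-l) for a prefix code
    with the given codeword lengths (in bits). Kraft's inequality
    says this sum is <= 1; equality means the implied prior is
    non-defective -- no probability mass is wasted on nothing."""
    return sum(2.0 ** -l for l in code_lengths)

# A complete prefix code on four items uses all the mass:
assert kraft_sum([1, 2, 3, 3]) == 1.0    # non-defective
# A defective code assigns some mass to no hypothesis at all:
assert kraft_sum([2, 2, 3]) == 0.625
```

An algorithm whose description lengths leave the Kraft sum strictly below one is, in this view, leaving probability mass (and hence bound tightness) on the table.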

So this raises a question: are there notions of complexity (preferably finite, computable ones) that differ fundamentally from the notions of “prior” or “surprisingness”? Game-theoretic setups would seem to be promising, although much of the work I’m familiar with ties them closely to the notion of a prior as well.