Machine Learning (Theory)

3/2/2005

Prior, “Prior” and Bias

Many different ways of reasoning about learning exist, and many of these suggest that some method of saying "I prefer this predictor to that predictor" is useful and necessary. Examples include Bayesian reasoning, prediction bounds, and online learning. One difficulty which arises is that the manner and meaning of saying "I prefer this predictor to that predictor" differ across these frameworks.

  1. Prior (Bayesian) A prior is a probability distribution over a set of distributions which expresses a belief in the probability that some distribution is the distribution generating the data.
  2. “Prior” (Prediction bounds & online learning) The “prior” is a measure over a set of classifiers which expresses the degree to which you hope the classifier will predict well.
  3. Bias (Regularization, Early termination of neural network training, etc…) The bias is some (often implicitly specified by an algorithm) way of preferring one predictor to another.
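To make the relationship between senses 1 and 3 concrete, here is a minimal sketch (on synthetic data, with an illustrative Gaussian prior; none of the specifics come from the post) showing that the MAP estimate under a Gaussian prior on the weights coincides with ridge regression, where the "bias" is an explicit squared-norm penalty:

```python
# Sketch: the same preference for small weights expressed two ways.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])          # illustrative "true" weights
y = X @ w_true + 0.1 * rng.normal(size=50)

sigma2, tau2 = 0.1 ** 2, 1.0                 # noise variance, prior variance
lam = sigma2 / tau2                          # induced penalty coefficient

# Sense 1 (prior): MAP estimate under w ~ N(0, tau2 * I) with Gaussian noise,
# via the closed-form normal equations.
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Sense 3 (bias): ridge regression, solved as an ordinary least-squares
# problem on the data augmented with sqrt(lam) * I rows.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(3)])
y_aug = np.concatenate([y, np.zeros(3)])
w_ridge, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

# The two preferences pick out the same predictor.
assert np.allclose(w_map, w_ridge)
```

The math is identical either way; what differs, as the post argues, is the interpretation attached to the preference.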

This only scratches the surface—there are yet more subtleties. For example, the meaning of probability itself shifts from one viewpoint to another.

3 Comments to “Prior, “Prior” and Bias”
  1. Drew Bagnell says:

    Prior (Bayesian) A prior is a probability distribution over a set of distributions which expresses a belief in the probability that some distribution is the distribution generating the data.

    You’re killing me, John. That’s certainly not the way a Bayesian would put it. A Bayesian might say: I have a set of beliefs about the classification $y$ given inputs $x$, if I knew $h$. I also have some beliefs about which $h$’s I find most plausible before seeing any data.
    There is no “generating data”; there is only the data you see and your (coherent) beliefs about it. I assume your last sentence was going there (or somewhere near there…), but I can’t quite parse it…. I think there is only a slight difference in reality between 1 and 2, and it’s all about what probability means and what the metric of goodness is.

  2. Are you dying due to the terminology or the facts?

    I believe the facts are correct, modulo issues of what we mean by “probability”.

  3. Drew Bagnell says:

    We will agree the math is the same. But your “facts” are explicitly an interpretation of what’s going on and not the math. (So are mine, of course.) Saying it your way is convoluted and invokes non-Bayesian notions of probability.
