I don’t consider myself a “Bayesian”, but I do try hard to understand why Bayesian learning works. For the purposes of this post, Bayesian learning is a simple process of:
- Specify a prior over world models.
- Integrate using Bayes' law with respect to all observed information to compute a posterior over world models.
- Predict according to the posterior (a minimal coded sketch of these steps follows).
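To make these three steps concrete, here is a minimal sketch in Python for a toy coin-flip world, where each “world model” is a candidate bias for the coin. The grid of biases and the observation sequence are invented purely for illustration:

```python
import math

# 1. Specify a prior over world models (here, uniform over a grid of biases).
models = [i / 10 for i in range(11)]           # candidate coin biases 0.0 .. 1.0
prior = {p: 1.0 / len(models) for p in models}

# 2. Apply Bayes' law to the observed information to compute a posterior.
observations = [1, 1, 0, 1]                    # 1 = heads, 0 = tails

def likelihood(p, obs):
    return math.prod(p if o == 1 else 1 - p for o in obs)

unnormalized = {p: prior[p] * likelihood(p, observations) for p in models}
z = sum(unnormalized.values())                 # the evidence P(observations)
posterior = {p: w / z for p, w in unnormalized.items()}

# 3. Predict according to the posterior (posterior-mean probability of heads).
print(sum(p * w for p, w in posterior.items()))
```

Nothing here is specific to coins: any finite set of world models admits the same enumerate-and-normalize computation.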
Bayesian learning has many advantages over other learning programs:
- **Interpolation:** Bayesian learning methods interpolate all the way to pure engineering. When faced with any learning problem, there is a choice of how much time and effort a human vs. a computer puts in. (For example, the Mars rover pathfinding algorithms are almost entirely engineered.) When creating an engineered system, you build a model of the world and then find a good controller in that model. Bayesian methods interpolate to this extreme because the Bayesian prior can be a delta function on one model of the world (see the sketch after this list). What this means is that the recipe of “think harder” (about specifying a prior over world models) and “compute harder” (to calculate a posterior) will eventually succeed. Many other machine learning approaches have no such guarantee.
- **Language:** Bayesian and near-Bayesian methods have an associated language for specifying priors and posteriors. This is significantly helpful when working on the “think harder” part of a solution.
- **Intuitions:** Bayesian learning involves specifying a prior and integration, two activities which seem to be universally useful. (see intuitions)
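The delta-function endpoint mentioned under Interpolation is easy to see in the same toy coin-flip setup (again, entirely invented for illustration): when the prior puts all its mass on one engineered model, no amount of data moves the posterior, and prediction reduces to running that single model.

```python
import math

# A delta-function prior: all mass on one engineered model (bias 0.7).
models = [i / 10 for i in range(11)]
prior = {p: 1.0 if p == 0.7 else 0.0 for p in models}

observations = [1, 1, 0, 1]                    # 1 = heads, 0 = tails

def likelihood(p, obs):
    return math.prod(p if o == 1 else 1 - p for o in obs)

unnormalized = {p: prior[p] * likelihood(p, observations) for p in models}
z = sum(unnormalized.values())
posterior = {p: w / z for p, w in unnormalized.items()}

# Models with zero prior mass keep zero posterior mass, so the posterior
# is still the delta on 0.7: Bayes' law never overrides a delta prior.
print(posterior[0.7])                          # 1.0
```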
With all of these advantages, Bayesian learning is a strong program. However, there are also some very significant disadvantages.
- **Information-theoretically infeasible:** It turns out that specifying a prior is extremely difficult. Roughly speaking, we must specify a real number for every setting of the world model parameters: a world model with just 50 binary parameters already has 2^50 (about 10^15) configurations, each needing its own prior probability. Many people well-versed in Bayesian learning don’t notice this difficulty for two reasons:
  - They know languages allowing more compact specification of priors. Acquiring this knowledge takes some significant effort.
  - They lie. They don’t specify their actual prior, but rather one which is convenient. (This shouldn’t be taken too badly, because it often works.)
- **Computationally infeasible:** Let’s suppose I could accurately specify a prior over every air molecule in a room. Even then, computing a posterior may be extremely difficult. This difficulty implies that computational approximation is required (a sampling sketch follows this list).
- **Unautomatic:** The “think harder” part of the Bayesian research program is (in some sense) a “Bayesian employment” act. It guarantees that as long as new learning problems exist, there will be a need for Bayesian engineers to solve them. (Zoubin likes to counter that a superprior over all priors can be employed for automation, but this seems to add to the other disadvantages.)
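To illustrate the computational point, here is a minimal sketch of approximating a posterior by sampling instead of exact integration. The setup is assumed for illustration (one unknown parameter, a standard normal prior, unit-variance Gaussian observations), and the sampler is a plain random-walk Metropolis algorithm:

```python
import math
import random

# Toy setup (invented): one unknown parameter theta, a standard normal
# prior, and unit-variance Gaussian observations.
data = [1.2, 0.7, 1.5, 0.9, 1.1]

def log_posterior(theta):
    log_prior = -0.5 * theta ** 2                         # N(0, 1) prior
    log_lik = sum(-0.5 * (x - theta) ** 2 for x in data)  # unit-variance Gaussian
    return log_prior + log_lik                            # up to a constant

# Random-walk Metropolis: propose a nearby theta, accept with probability
# min(1, posterior ratio). The unknown normalizer cancels in the ratio.
theta, samples = 0.0, []
for _ in range(10_000):
    proposal = theta + random.gauss(0.0, 0.5)
    delta = log_posterior(proposal) - log_posterior(theta)
    if delta >= 0 or random.random() < math.exp(delta):
        theta = proposal
    samples.append(theta)

# Predict with the approximate posterior mean (discarding burn-in).
kept = samples[2_000:]
print(sum(kept) / len(kept))
```

The point is not this particular sampler: any of the standard approximation families (MCMC, variational methods, particle filters) plays the same role. The point is that prediction had to route through an approximation at all.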
Overall, if a learning problem must be solved, a Bayesian should probably be working on it, and has a good chance of solving it.
I wish I knew whether or not the drawbacks can be convincingly addressed. My impression so far is “not always”.