One way to organize learning theory is by assumption (in the assumption = axiom sense), from no assumptions to many assumptions. As you travel down this list, the statements become stronger, but the scope of applicability decreases.
- No assumptions
  - Online learning: There exists a meta prediction algorithm which competes well with the best element of any set of prediction algorithms.
  - Universal learning: Using a “bias” of 2^{-description length of Turing machine} in learning is equivalent, up to a constant, to using any other computable bias.
  - Reductions: The ability to predict well on classification problems is equivalent to the ability to predict well on many other learning problems.
- Independent and Identically Distributed (IID) data
  - Performance prediction: Based upon past performance, you can predict future performance.
  - Uniform convergence: Performance prediction works even after choosing a classifier, based on the data, from a large set of classifiers.
- IID data with partial constraints on the data source
  - PAC learning: There exist fast algorithms for learning when all examples agree with some function in a function class (such as monomials, decision lists, etc.).
  - Weak Bayes: The Bayes law learning algorithm will eventually converge to the right solution, as long as the right solution has positive prior probability.
- Strong constraints on the data source
  - Bayes learning: When the data source is actually drawn from the prior, using Bayes’ law is optimal.
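The online learning entry can be made concrete with the exponential weights (Hedge) meta-algorithm: maintain one weight per prediction algorithm and down-weight each in proportion to its loss. A minimal sketch, assuming predictions and outcomes in [0, 1] with absolute loss and an illustrative learning rate:

```python
import math

def exponential_weights(expert_preds, outcomes, eta=0.5):
    """Hedge meta-algorithm: predict with a weighted average of experts,
    then exponentially down-weight each expert by its loss this round.
    Regret against the best single expert grows only logarithmically
    in the number of experts."""
    n = len(expert_preds[0])          # number of experts
    weights = [1.0] * n
    total_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        z = sum(weights)
        forecast = sum(w * p for w, p in zip(weights, preds)) / z
        total_loss += abs(forecast - y)   # absolute loss in [0, 1]
        weights = [w * math.exp(-eta * abs(p - y))
                   for w, p in zip(weights, preds)]
    return total_loss, weights

# toy run: expert 0 always predicts the true bit, expert 1 always says 0.5;
# the meta-algorithm's loss tracks expert 0, whose weight comes to dominate
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]
expert_preds = [[y, 0.5] for y in outcomes]
loss, w = exponential_weights(expert_preds, outcomes)
```

The point of the "no assumptions" label: nothing is assumed about how `outcomes` is generated; the guarantee is purely relative to the best expert in hindsight.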
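The performance prediction entry follows from a Hoeffding bound: for a single classifier chosen before seeing n IID test examples, its future error rate lies within sqrt(ln(2/δ) / (2n)) of the observed error rate, with probability at least 1 − δ. A sketch with illustrative numbers:

```python
import math

def hoeffding_interval(observed_error, n, delta=0.05):
    """Two-sided Hoeffding confidence interval for one fixed classifier:
    with probability >= 1 - delta over the draw of n IID test examples,
    the true error rate lies within eps of the observed error rate."""
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, observed_error - eps), min(1.0, observed_error + eps)

# 12% error on 1000 held-out examples gives roughly a +/- 4.3% interval
lo, hi = hoeffding_interval(observed_error=0.12, n=1000)
```

Uniform convergence extends this by a union bound over a whole set of classifiers, which is why the interval widens with the (log of the) number of classifiers considered.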
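For the PAC learning entry, the standard example is learning a conjunction of literals by elimination: start with every literal and delete any literal contradicted by a positive example. A sketch, assuming boolean features and noise-free labels consistent with some conjunction:

```python
def learn_conjunction(examples):
    """PAC-style elimination algorithm for conjunctions of literals.
    A literal (i, v) means "feature i must equal v"; every literal
    falsified by a positive example is deleted.  Assumes labels are
    noise-free and realizable by some conjunction."""
    n = len(examples[0][0])
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    for x, y in examples:
        if y == 1:
            literals -= {(i, 1 - x[i]) for i in range(n)}

    def predict(x):
        return int(all(x[i] == v for i, v in literals))
    return predict

# hypothetical target: x0 AND NOT x2
examples = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(examples)
```

The algorithm runs in time linear in the data, which is the "fast" part of the PAC claim; the sample complexity analysis is what turns consistency on the sample into a guarantee on future examples.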
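The two Bayesian entries can be illustrated with a coin whose heads-probability is one of a few hypotheses. If the true bias has positive prior weight, the posterior concentrates on it as flips accumulate (the weak Bayes claim); when the bias really is drawn from the prior, this update is optimal (the strong claim). A toy sketch with hypothetical hypotheses and flips:

```python
def bayes_update(priors, likelihoods, observations):
    """Posterior over a finite hypothesis set after IID coin flips,
    computed by Bayes' law.  priors[h] is the prior weight of h;
    likelihoods[h] is P(heads | h); each observation is 1 or 0."""
    post = dict(priors)
    for obs in observations:
        for h in post:
            p = likelihoods[h]
            post[h] *= p if obs == 1 else (1.0 - p)
        z = sum(post.values())            # renormalize each round
        post = {h: w / z for h, w in post.items()}
    return post

# three hypotheses for the coin; the flips below show 8 heads in 10
priors = {"fair": 1 / 3, "biased": 1 / 3, "tails": 1 / 3}
likelihoods = {"fair": 0.5, "biased": 0.8, "tails": 0.2}
flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
posterior = bayes_update(priors, likelihoods, flips)
```

After ten flips the "biased" hypothesis already carries most of the posterior mass, which is the eventual-convergence behavior the weak Bayes entry describes.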
This doesn’t include all forms of learning theory, because I do not know them all. If there are other bits you know of, please comment.