The Journal of Machine Learning Gossip has some fine satire about learning research. In particular, the guides are amusing and remarkably true.
As in all things, it’s easy to criticize the way things are and harder to make them better.
Machine learning and learning theory research
One way to organize learning theory is by assumption (in the assumption = axiom sense), from no assumptions to many assumptions. As you travel down this list, the statements become stronger, but the scope of applicability decreases.
This doesn’t include all forms of learning theory, because I do not know them all. If there are other bits you know of, please comment.
A loss function is some function which, for any example, takes a prediction and the correct prediction, and determines how much loss is incurred. (People sometimes attempt to optimize functions of more than one example such as “area under the ROC curve” or “harmonic mean of precision and recall”.) Typically we try to find predictors that minimize loss.
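To make the definition concrete, here is a minimal sketch of per-example loss functions and the average loss of a set of predictions. The function names and the toy data are illustrative assumptions, not taken from any particular library.

```python
def zero_one_loss(prediction, truth):
    """Loss of 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if prediction == truth else 1.0


def squared_loss(prediction, truth):
    """Squared difference between a real-valued prediction and the truth."""
    return (prediction - truth) ** 2


def average_loss(loss, predictions, truths):
    """Average a per-example loss over a set of examples."""
    return sum(loss(p, t) for p, t in zip(predictions, truths)) / len(truths)


# Hypothetical toy data: four binary predictions, one of them wrong.
predictions = [1, 0, 1, 1]
truths = [1, 0, 0, 1]

print(average_loss(zero_one_loss, predictions, truths))
print(average_loss(squared_loss, [0.5, 0.0], [1.0, 0.0]))
```

Both losses here decompose over examples, which is what the definition requires. Quantities like area under the ROC curve do not fit this signature, because they depend on several examples at once.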
There seems to be a strong dichotomy between two views of what “loss” means in learning.
I don’t fully understand the second viewpoint. It seems (to some extent) like looking where the light is rather than where your keys fell on the ground. Many of these losses-of-convenience also seem to behave unlike real-world problems. For example, in this contest somebody would have been the winner except that, on a single example, they assigned a very low probability to the outcome that actually occurred. Under log loss, their loss became very high. This does not seem to correspond to the intuitive notion of what the loss should be on the problem.
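The log loss effect described above can be illustrated with a small sketch. The two contestants and all of their numbers are made up for illustration and are not the actual contest data. Each prediction is represented by the probability the contestant assigned to the outcome that actually occurred.

```python
import math


def log_loss(prob_of_truth):
    """Log loss given the probability assigned to the true outcome."""
    return -math.log(prob_of_truth)


def zero_one_loss(prob_of_truth):
    """An error is counted when the true outcome got probability below 0.5."""
    return 1.0 if prob_of_truth < 0.5 else 0.0


# Hypothetical contestant A: confident and right 99 times, then one
# example where the true outcome was given probability 1e-9.
contestant_a = [0.99] * 99 + [1e-9]

# Hypothetical contestant B: less confident, and mildly wrong 10 times.
contestant_b = [0.9] * 90 + [0.4] * 10

errors_a = sum(zero_one_loss(p) for p in contestant_a)
errors_b = sum(zero_one_loss(p) for p in contestant_b)
log_loss_a = sum(log_loss(p) for p in contestant_a)
log_loss_b = sum(log_loss(p) for p in contestant_b)

print("errors:  ", errors_a, errors_b)
print("log loss:", log_loss_a, log_loss_b)
```

In this made-up setting, contestant A makes one error against contestant B's ten, yet A's total log loss is larger, and nearly all of it comes from the single overconfident example.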
“Assumption” is another word to be careful with in machine learning because it is used in several ways.
One difficulty with any use of the word “assumption” is that you often encounter “if assumption then conclusion, so if not assumption then not conclusion”. This is incorrect logic. For example, with variant (1), someone might say “the assumption of my prior is not met, so the algorithm will not learn”. Or, with variant (3), “the data is not IID, so my learning algorithm designed for IID data will not work”. In each of these cases “will” must be replaced with “may” for correctness.
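The logical point can be checked mechanically. The sketch below enumerates all truth assignments and looks for cases where “if assumption then conclusion” holds but “if not assumption then not conclusion” fails. The helper name `implies` is just an illustrative choice.

```python
from itertools import product


def implies(a, b):
    """Material implication: 'if a then b'."""
    return (not a) or b


# Collect every truth assignment where the original statement holds
# but the "so if not assumption then not conclusion" step fails.
counterexamples = [
    (assumption, conclusion)
    for assumption, conclusion in product([False, True], repeat=2)
    if implies(assumption, conclusion)
    and not implies(not assumption, not conclusion)
]

print(counterexamples)  # [(False, True)]
```

The single counterexample is the case where the assumption is false and the conclusion is true anyway: the data is not IID, and the algorithm still works. That is why “will not” has to be weakened to “may not”.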