It’s conference season, and the smell of budding papers is in the air.
IJCAI 2005, January 21
COLT 2005, February 2
KDD 2005, February 18
ICML 2005, March 8
UAI 2005, March 16
AAAI 2005, March 18
A loss function is a function which, for any example, takes a prediction and the correct outcome and determines how much loss is incurred. (People sometimes attempt to optimize functions of more than one example, such as “area under the ROC curve” or “harmonic mean of precision and recall”.) Typically we try to find predictors that minimize loss.
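To make the definition concrete, here is a minimal sketch of two common per-example losses; the function names and numbers are my own illustration, not part of the original discussion.

```python
import math

def squared_loss(prediction: float, outcome: float) -> float:
    """Loss incurred on a single example under squared error."""
    return (prediction - outcome) ** 2

def log_loss(prob_of_one: float, outcome: int) -> float:
    """Log loss for a predicted probability of outcome 1 against a binary outcome."""
    p = prob_of_one if outcome == 1 else 1.0 - prob_of_one
    return -math.log(p)

# A predictor is typically judged by its total (or average) loss over examples.
examples = [(0.9, 1), (0.2, 0), (0.7, 1)]
average = sum(log_loss(p, y) for p, y in examples) / len(examples)
print(f"average log loss: {average:.3f}")
```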
There seems to be a strong dichotomy between two views of what “loss” means in learning: loss as something determined by the real-world problem itself, and loss as something chosen for the convenience of analysis or optimization.
I don’t fully understand the second viewpoint. It seems (to some extent) like looking where the light is rather than where your keys fell on the ground. Many of these losses-of-convenience also seem to behave unlike real-world problems. For example, in this contest, somebody would have been the winner except that they happened to predict one example incorrectly with very low probability. Under log loss, their loss became very high. This does not seem to correspond to the intuitive notion of what the loss should be on the problem.
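A small numeric sketch (the numbers are invented, not the actual contest data) of how a single confident mistake can dominate log loss:

```python
import math

# Hypothetical numbers: 999 examples where the true outcome was given probability
# 0.99, plus one example where the true outcome was given probability 0.000001.
loss_good = 999 * -math.log(0.99)  # ~10.0 total across 999 good predictions
loss_bad = -math.log(1e-6)         # ~13.8 from the single confident mistake

print(f"999 good predictions: {loss_good:.1f}")
print(f"one confident mistake: {loss_bad:.1f}")
# The one mistake contributes more log loss than the other 999 examples combined.
```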
“Assumption” is another word to be careful with in machine learning because it is used in several ways.
One difficulty with any use of the word “assumption” is that you often encounter reasoning of the form “if assumption then conclusion, so if not assumption then not conclusion”. This is incorrect logic (it denies the antecedent). For example, with the prior sense of the word: “the assumption of my prior is not met, so the algorithm will not learn”. Or, with the IID sense: “the data is not IID, so my learning algorithm designed for IID data will not work”. In each of these cases, “will” must be replaced with “may” for correctness.
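The fallacy can be checked by brute force over truth assignments; this tiny sketch (my own illustration) finds the case where the assumption fails but the conclusion still holds:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

# Look for a counterexample: "A implies B" holds, yet "not A implies not B" fails.
for a, b in product([False, True], repeat=2):
    if implies(a, b) and not implies(not a, not b):
        print(f"counterexample: A={a}, B={b}")

# Prints: counterexample: A=False, B=True
# The assumption (A) can fail while the conclusion (B) still holds.
```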
Let’s define a learning problem as making predictions given past data. There are several ways to attack a learning problem, each of which seems equivalent to solving it.
Each of these viewpoints seems to be “right”, and each seems to have some utility in its context. It also seems important to realize that these are just different versions of the same problem. One consequence of this observation is that “wrapper methods”, which try to automatically find a subset of features to feed into a learning algorithm in order to improve learning performance, are simply trying to repair weaknesses in the learning algorithm.
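For concreteness, here is a minimal sketch of a wrapper method: greedy forward feature selection that treats the learner as a black box. This is my own construction (assuming scikit-learn and a synthetic dataset), not a method from the original discussion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # only features 0 and 2 matter

selected = []    # feature subset found so far
best_score = 0.0
while True:
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    if not candidates:
        break
    # Score each candidate subset by cross-validated accuracy of the wrapped learner.
    scores = {f: cross_val_score(LogisticRegression(), X[:, selected + [f]], y, cv=3).mean()
              for f in candidates}
    best_feature = max(scores, key=scores.get)
    if scores[best_feature] <= best_score:
        break  # no candidate improves the wrapped learner; stop
    selected.append(best_feature)
    best_score = scores[best_feature]

print(f"selected features: {selected}, cross-validated accuracy: {best_score:.2f}")
```

Note that the wrapper never looks inside the learner; if the learner already handled irrelevant features well on its own, the wrapper would have nothing to repair.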
Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used.
To avoid confusion, you should be careful to understand what other people mean by this word. It is helpful to always be explicit about which variables are randomized and which are constant whenever probability is used, because Bayesian and frequentist probabilities commonly switch these roles.
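A small sketch of the role switch, using an invented coin-flip example: in the Bayesian computation the coin’s bias is the random quantity and the data is held fixed, while in the frequentist computation the bias is a fixed constant and the data is the random quantity.

```python
import math

heads, flips = 7, 10  # the observed data

# Bayesian: the bias is random; condition on the fixed data. With a uniform
# Beta(1, 1) prior, the posterior over the bias is Beta(1 + heads, 1 + tails).
a, b = 1 + heads, 1 + (flips - heads)
posterior_mean = a / (a + b)

# Frequentist: the bias is a fixed unknown; the data is random. A 95% confidence
# interval (normal approximation) covers the true bias in ~95% of repeated draws.
p_hat = heads / flips
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / flips)

print(f"posterior mean of the (random) bias: {posterior_mean:.2f}")
print(f"95% CI for the (fixed) bias: [{p_hat - half_width:.2f}, {p_hat + half_width:.2f}]")
```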