Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used.
- Bayesian: The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (e.g. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y. The probability is then Y/(Y+1). Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge, and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively.
- Frequentist: The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in n experiments, then it has probability about k/n. Frequentist probabilities can be used to measure how sure you are about something. They may be appropriate in a learning context for measuring confidence in various predictors. The frequentist notion of probability is common in physics, other sciences, and computer science theory.
- Estimated: The estimated notion of probability is measured by running some learning algorithm which predicts the probability of events rather than the events themselves. I tend to dislike this use of the word because it confuses the world with the model of the world.
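As a rough illustration of the first two notions, here is a minimal Python sketch. The function names (`probability_from_odds`, `empirical_frequency`) and the simulated event are hypothetical choices made for this example, not part of any library. The first function applies the Y/(Y+1) conversion from the switching odds; the second counts how often an event occurs in n repeated trials and returns k/n.

```python
import random


def probability_from_odds(y):
    """Degree of belief implied by switching sides at Y-to-1 odds: Y / (Y + 1)."""
    if y < 0:
        raise ValueError("odds must be non-negative")
    return y / (y + 1)


def empirical_frequency(event, n, rng):
    """Rate of occurrence: run the experiment n times and return k / n."""
    k = sum(1 for _ in range(n) if event(rng))
    return k / n


# Bayesian reading: a bettor who switches from 'for' to 'against' at 3-to-1 odds.
belief = probability_from_odds(3.0)

# Frequentist reading: repeat an experiment many times and count occurrences.
# The event here is simulated (an assumption for illustration): it occurs
# with a fixed underlying rate of 0.75 on each trial.
rng = random.Random(0)
frequency = empirical_frequency(lambda r: r.random() < 0.75, 10_000, rng)

print(f"degree of belief from odds: {belief:.3f}")
print(f"observed rate k/n:          {frequency:.3f}")
```

The two numbers can come out close to each other, but they answer different questions: the first summarizes what someone is willing to bet, the second summarizes what happened across repeated experiments.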
To avoid confusion, you should be careful to understand what other people mean by this word. It is helpful to always be explicit about which variables are randomized and which are constant whenever probability is used, because Bayesian and Frequentist probabilities commonly swap these roles.
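To make the point about randomized versus constant variables concrete, here is a small sketch using a biased coin. Everything specific in it is an assumption for illustration: the coin, the bias value 0.3, the counts, and the uniform prior on the bias (which gives a Beta(k+1, n-k+1) posterior). In the frequentist half the bias `theta` is held constant and the data is redrawn many times; in the Bayesian half the observed data is held constant and the bias is the quantity treated as uncertain.

```python
import random

rng = random.Random(1)


def sample_heads(theta, n, rng):
    """Flip a coin with fixed bias theta n times; return the number of heads."""
    return sum(1 for _ in range(n) if rng.random() < theta)


# Frequentist view: theta is a constant, the data is random.
# Redraw the data many times and look at how the estimate k/n varies.
theta = 0.3
n = 50
estimates = [sample_heads(theta, n, rng) / n for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)

# Bayesian view: the observed data (k heads in n flips) is a constant,
# and theta is uncertain. With a uniform prior (an assumption here),
# the posterior over theta is Beta(k + 1, n - k + 1).
k = 15
posterior_samples = [rng.betavariate(k + 1, n - k + 1) for _ in range(2000)]
mean_posterior = sum(posterior_samples) / len(posterior_samples)

print(f"frequentist: average of k/n over redrawn data = {mean_estimate:.3f}")
print(f"bayesian:    posterior mean of theta given data = {mean_posterior:.3f}")
```

Both halves produce a spread of values near 0.3, but the spread is over different things: over possible datasets in the first case, and over possible values of the bias in the second.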