# Machine Learning (Theory)

## 9/7/2006

### Objective and subjective interpretations of probability

Tags: Bayesian,Machine Learning Sanjoy@ 11:16 pm

An amusing tidbit (reproduced without permission) from Herman Chernoff’s delightful monograph, “Sequential analysis and optimal design”:

The use of randomization raises a philosophical question which is articulated by the following probably apocryphal anecdote.

The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven, and the last two into the cool oven. The statistician, horrified, explained how he should randomize to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two pieces into the hot oven, the next two into the medium oven, and the last two into the cool oven. “Obviously, we can’t do that,” said the metallurgist. “On the contrary, you have to do that,” said the statistician.

What are arguments for and against this design? In a “larger” design or sample, the effect of a reasonable randomization scheme could be such that this obvious difficulty would almost certainly not happen. Assuming that the original strength of the bar and the heat treatment did not “interact” in a complicated nonlinear way, the randomization would virtually cancel out any effect due to a strength gradient or other erratic phenomena, and computing estimates as though these did not exist would lead to no real error. In this small problem, the effect may not be cancelled out, but the statistician still has a right to close his eyes to the design actually selected if he is satisfied with “playing fair”. That is, if he instructs an agent to select the design and he analyzes the results, assuming there are no gradients, his conclusions will be unbiased in the sense that a tendency to overestimate is balanced on the average by a tendency to underestimate the desired quantities. However, this tendency may be substantial as measured by the variability of the estimates which will be affected by substantial gradients. On the other hand, following the natural inclination to reject an obviously unsatisfactory design resulting from randomization puts the statistician in the position of not “playing fair”. What is worse for an objective statistician, he has no way of evaluating in advance how good his procedure is if he can change the rules in the middle of the experiment.

The Bayesian statistician, who uses subjective probability and must consider all information, is unsatisfied to simply play fair. When randomization leads to the original unsatisfactory design, he is aware of this information and unwilling to accept the design. In general, the religious Bayesian states that no good and only harm can come from randomized experiments. In principle, he is opposed even to random sampling in opinion polling. However, this principle puts him in untenable computational positions, and a pragmatic Bayesian will often ignore what seems useless design information if there are no obvious quirks in a randomly selected sample.