Breaking Abstractions

Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break.

  1. Real numbers aren’t. Most real numbers cannot be represented on any machine. One implication is that many real-number-based algorithms run into difficulties when implemented with floating point numbers.
  2. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for a computation exceeds some limit.
  3. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior from the result of an equilibrium computation. Often, it doesn’t work.
  4. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint, the lack of knowledge typically precludes actually knowing the fundamental randomization. From the Bayesian viewpoint, precisely specifying our lack of knowledge is extremely difficult and typically not done.
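
Point 1 is easy to see concretely. Here is a minimal Python sketch of two standard failure modes, assuming only built-in floats (IEEE-754 doubles on essentially all current machines):

```python
# Associativity of addition, which holds for real numbers, fails in
# floating point: each sum is rounded to the nearest representable value.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False

# Absorption: near 1e16 the gap between adjacent doubles exceeds 1,
# so adding 1.0 has no effect at all.
x = 1e16
print((x + 1.0) - x)  # 0.0, not 1.0
```

Any algorithm whose correctness proof relies on exact real arithmetic (associativity, cancellation, comparisons against zero) inherits these breaks when run on floats.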

So, what should we do when we learn that our basic tools can break? The answer, of course, is to keep using them until something better comes along. However, the uncomfortable knowledge that our tools can break is necessary in a few ways:

  1. When considering a new abstraction, the existence of a break does not imply that it is a useless abstraction. (Just as the existence of the breaks above does not imply that they are useless abstractions.)
  2. When using an abstraction in some new way, we must generally consider “is this a reasonable use?”
  3. Some tools break more often than other tools. We should actively consider the “rate of breakage” when deciding amongst tools.

One Reply to “Breaking Abstractions”

  1. John is absolutely right in what he means, but I’d divide the “what should we do” section into two parts –
    1) how should we use existing abstractions in practice – John’s number (2).
    and
    2) what abstractions are interesting (some of the answers are John’s (1) and (3))
    Big-O notation was created to make complexity independent of specific mechanisms and, possibly, to make keeping track of things on paper easy. The “polynomially computable” abstraction is an even stronger case – for a (really) theoretical cryptographer or machine learner, x^2 and x^2000 are very similar. The choice was made simply because if p and q are polynomials, so are p+q, p*q, and p(q). If we’re going to apply theory, we have to know what the relation between the objects in the theory and the world we’re working in is. If we’re doing theoretical work, we may wish it to relate to the real world, or we may enjoy it for other reasons.
    The optimal search algorithm mentioned in the note isn’t ridiculous; it just doesn’t help answer questions like “what GA methodology is applicable to problem A?” It helps (or may help) answer questions like “are quantum computers essentially different from regular ones?”
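
The commenter’s x^2 versus x^2000 point can be made concrete in a few lines of Python; the problem size here is made up purely for illustration:

```python
# Both n**2 and n**2000 are "polynomial time" under the abstraction,
# yet at even a tiny problem size they are worlds apart in practice.
n = 10

quadratic = n ** 2      # 100 steps: trivially feasible
huge_poly = n ** 2000   # a step count with 2001 digits: hopeless

print(quadratic)             # 100
print(len(str(huge_poly)))   # 2001

# What the abstraction buys is closure: if p and q are polynomials,
# so are p + q, p * q, and the composition p(q), which keeps
# reductions and running-time analyses simple on paper.
```

This is exactly the trade the commenter describes: the class is chosen for its algebraic convenience, not because all its members are equally practical.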

Comments are closed.