In my experience, there are two different groups of people who believe the same thing: the mathematics encountered in typical machine learning conference papers is often of questionable value.
The two groups who agree on this are applied machine learning people who have given up on math, and mature theoreticians who understand the limits of theory.
Partly, this is just a statement about where we are with respect to machine learning. In particular, we have no mechanism capable of generating a prescription for how to solve all learning problems. In the absence of such certainty, people try to come up with formalisms that partially describe and motivate how and why they do things. This is natural and healthy—we might hope that it will eventually lead to just such a mechanism.
But, part of this is simply an emphasis on complexity over clarity. A very natural and simple theoretical statement is often obscured by complexifications. Common sources of complexification include:
- Generalization: By trying to make a statement that applies in the most general possible setting, your theorem becomes excessively hard to read.
- Specialization: Your theorem relies upon so many assumptions that it is hard for a reader to hold them all in their head.
- Obscuration: Your theorem relies upon cumbersome notation full of sub-sub-superscripts, badly named variables, etc.
There are several reasons why complexification occurs.
- Excessive generalization often happens when authors have an idea and want to completely exploit it. So, various bells and whistles are added until the core idea is obscured.
- Excessive specialization often happens when authors have some algorithm they really want to prove works. So, they start making one assumption after another until the proof goes through.
- Obscuration is far more subtle than it sounds. Some of the worst obscurations come from using an old standard notation which has simply been pushed too far.
After doing research for a while, you realize that these complexifications are counterproductive. Type (1) complexifications make it doubly hard for others to do follow-on work: your paper is hard to read, and you have eliminated the possibility of natural follow-on extensions. Type (2) complexifications look like “the tail wags the dog”: the math isn’t really working until it guides the algorithm design. Figuring out how to remove the assumptions often results in a better algorithm. Type (3) complexifications are an error. Fooling around to find a better notation is one of the ways that we sometimes make new discoveries.
I’ve saved the worst reason for last: the reviewing process emphasizes precision over accuracy. Imagine shooting a math gun at a machine learning target. A high precision math gun will very carefully guide the bullets to strike a fixed location, even though that location may have little to do with the target. An accurate math gun will point at the correct target. A precision/accuracy tradeoff is often encountered: we don’t know how to think about the actual machine learning problem, so instead we very precisely think about another, not-quite-right problem. A reviewer almost invariably prefers the more precise (but less accurate) paper, because precision is the easy thing to check and think about.
There seems to be no easy fix for this—people naturally prefer to value the things they can measure. The hard fix for this is more time spent by everyone thinking about what the real machine learning problems are.
The solution for the precision-accuracy problem may be to force the authors to instantiate their theorem in the most reasonable setting that obeys all their assumptions. This will allow the reader to check if the theorem is interesting both in terms of where it might apply and what it implies.
I think you miss an important reason for making math complex: If the proofs are too easy then the reviewers think that the results cannot be very significant.
I found this earlier this year with my rejected COLT paper. My reviews all said something like “Beautiful paper, very well written, interesting results about an important problem but…” and then went on to say that it might have already been done (but they didn’t know where or by whom), and that even if it hadn’t, the results just didn’t seem difficult enough to prove. I think that if I’d proven the same results in a much more difficult way then the paper would have been accepted!
Anyway, I got the paper into ALT, but only just. The feedback from the ALT reviewers was pretty much the same as the COLT reviews: insufficiently difficult proofs to arrive at some pretty interesting results.
This is what I was getting at in the second-to-last paragraph: reviewers often expect and want to see a demonstration of precision.
Instantiation is often handy, but I do not think it is the cure-all. The problem is that instantiation invites over-specialization in a theorem statement, since you only need to demonstrate one setting in which the theorem applies.
I couldn’t agree more. We should focus more on the value of the idea rather than the obviousness of the idea.
Complicated theory? Please name complicated formulas (maybe from functional analysis) that generated big commercial interest in the past. And, in turn, please mention a big commercial success that is based on “simple” machine learning, maybe based on the identification of some more or less obvious features… Put another way: is commercial success an indicator of the quality of theory?