How is Compressed Sensing going to change Machine Learning?

Compressed Sensing (CS) is a new framework developed by Emmanuel Candes, Terry Tao and David Donoho. To summarize: if you acquire a signal in a basis that is incoherent with the basis in which the signal is known to be sparse, it is very likely that you will be able to reconstruct the signal from these incoherent projections.

Terry Tao, the recent Fields medalist, does a very nice job of explaining the framework here. He goes further into the theory in this post, where he discusses the central issue of the Uniform Uncertainty Principle (UUP). It so happens that random projections are, on average, incoherent in the UUP sense with most known bases (sines, polynomials, splines, wavelets, curvelets, …) and are therefore an ideal choice of measurement basis for Compressed Sensing. [For more in-depth information on the subject, the Rice group has done a very good job of providing a central library of papers relevant to this growing subject: http://www.dsp.ece.rice.edu/cs/]
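To make the recovery idea concrete, here is a minimal sketch of my own (not code from any of the papers mentioned here): a K-sparse signal is measured with a random Gaussian matrix and recovered by Orthogonal Matching Pursuit, one of several standard sparse-recovery algorithms. The dimensions N, M, K and the use of scikit-learn are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, M, K = 512, 64, 8                # ambient dimension, measurements, sparsity (illustrative)

# A K-sparse signal in the canonical basis (stand-in for a signal that is
# sparse in some known basis).
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.normal(size=K)

# Random Gaussian measurement matrix: with high probability it is incoherent
# with the sparsity basis, which is what CS asks for.
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
y = Phi @ x                         # the M incoherent measurements

# Recover the sparse signal from y alone; OMP is used here, l1 minimization
# (basis pursuit) is the other standard choice.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False)
omp.fit(Phi, y)
x_hat = omp.coef_

print("relative recovery error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

With M well below N, the recovered signal matches the original up to numerical precision, which is the point of the framework: far fewer measurements than the ambient dimension suffice when the signal is sparse.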

The Machine Learning community has looked at Random Projections before, for instance:

  • Experiments with Random Projections for Machine Learning by Fradkin and Madigan (KDD-03)
  • Face Recognition Experiments with Random Projection by Goel, Bebis and Nefian
  • Dimensionality reduction by random mapping: Fast similarity computation for clustering by S. Kaski (Proceedings of IEEE International Joint Conference on Neural Networks, 1998.)
While these approaches seem to give results somewhat comparable to PCA, the number of contributions on the subject does not seem overwhelming. Maybe one of the reasons is that, in most of the papers cited above, the main theoretical justification for using random projections is the Johnson-Lindenstrauss (JL) lemma. As a consequence, most of the random matrices used in these publications come from the database world and not from the newer framework of Compressed Sensing (a list of these matrices and their properties can be found in the middle of this page). The parallel reliance on random projections in the JL lemma and in the Compressed Sensing setting was explained by Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin in the paper entitled: A simple proof of the restricted isometry property for random matrices. However, the most interesting fallout from this comparison between JL and CS comes in the form of the contribution by Richard Baraniuk and Michael Wakin in Random projections of smooth manifolds. I’ll let the abstract speak for itself:

    We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. […] As our main theoretical contribution, we establish a sufficient number M of random projections to guarantee that, with high probability, all pairwise Euclidean and geodesic distances between points on M are well-preserved under the mapping Φ. Our results bear strong resemblance to the emerging theory of Compressed Sensing (CS), in which sparse signals can be recovered from small numbers of random linear measurements. As in CS, the random measurements we propose can be used to recover the original data in R^N. Moreover, like the fundamental bound in CS, our requisite M is linear in the “information level” K and logarithmic in the ambient dimension N; we also identify a logarithmic dependence on the volume and curvature of the manifold. In addition to recovering faithful approximations to manifold-modeled signals, however, the random projections we propose can also be used to discern key properties about the manifold. We discuss connections and contrasts with existing techniques in manifold learning, a setting where dimensionality reducing mappings are typically nonlinear and constructed adaptively from a set of sampled training data.

It looks as though, as a result, Universal Dimensionality Reduction can be achieved by properly chosen random projections. In the case of data living on a low-dimensional manifold, the JL lemma requires a number of random projections that grows (logarithmically) with the number of points sampled from that manifold, whereas the CS-style result shows that the number of random projections depends only on the characteristics of the manifold itself (its dimension, volume, and curvature).
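As a quick numerical illustration of the kind of pairwise-distance preservation both results rely on, here is a small sketch of my own (the dimensions and the Gaussian test data are arbitrary choices, not an experiment from the paper):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n_points, N, m = 200, 2000, 100      # samples, ambient dimension, projected dimension (illustrative)

X = rng.normal(size=(n_points, N))   # test points in the high-dimensional space
Phi = rng.normal(size=(N, m)) / np.sqrt(m)
Y = X @ Phi                          # random linear projection down to R^m

# Ratio of projected to original pairwise Euclidean distances: values close
# to 1 mean distances are well preserved by the random projection.
ratios = pdist(Y) / pdist(X)
print("distance ratios: min %.3f, max %.3f" % (ratios.min(), ratios.max()))
```

The ratios stay within a modest band around 1 even though the dimension drops by a factor of twenty, which is the flavor of guarantee that both the JL lemma and the manifold result provide.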

The results highlighted by Wakin and Baraniuk are very compelling, but there is another appealing reason to use random projections: robustness. While trying to mimic nature in the learning process, one cannot help but be amazed at the reliability of biological systems. Yet even the researchers who model these processes do not always realize or point out that this robustness is in part due to random projections. Case in point: the excellent work of Thomas Serre, Aude Oliva and Tomaso Poggio, culminating in a paper describing a biologically inspired model of the brain that shows its ability to process information in a feedforward fashion. The modeling is notable because, in this area of science, there is a central debate as to whether the brain works in a one-way, feedforward process or with many feedback loops. In that paper, the feature dimension reduction step (which is what this process amounts to) uses random projections, as I pointed out recently.

Because of this intrinsic dimensionality reduction capability, Mike Wakin has also shown efficient nearest neighbor searches using only a few random projections (see figure 3 of this paper); a toy illustration follows below. I could go on, but the point is that since CS is a revolution in the world of signal/feature acquisition and processing (see the analog-to-information (A2I) site), one cannot help but wonder aloud how this will affect Machine Learning in general.
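Here is that toy check of the nearest-neighbor point, a sketch of my own rather than Wakin’s experiment: the database size, dimensions, and noise level are arbitrary, and the query is deliberately a noisy copy of one database point so the true nearest neighbor is unambiguous.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, m = 1000, 1024, 30               # database size, ambient dimension, projections (illustrative)

X = rng.normal(size=(n, N))            # database of points
q = X[123] + 0.1 * rng.normal(size=N)  # query: a noisy copy of point 123

Phi = rng.normal(size=(N, m)) / np.sqrt(m)
Xp, qp = X @ Phi, q @ Phi              # project the database and the query

# Nearest neighbor computed in the full space vs. in the projected space.
nn_full = int(np.argmin(np.linalg.norm(X - q, axis=1)))
nn_proj = int(np.argmin(np.linalg.norm(Xp - qp, axis=1)))
print("nearest neighbor in R^N:", nn_full, "| after random projection:", nn_proj)
```

The search over 30 random projections returns the same neighbor as the search over all 1024 coordinates, at a fraction of the cost.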

Mad (Neuro)science

One of the questions facing machine learning as a field is “Can we produce a generalized learning system that can solve a wide array of standard learning problems?” The answer is trivial: “yes, just have children”.

Of course, that wasn’t really the question. The refined question is “Are there simple-to-implement generalized learning systems that can solve a wide array of standard learning problems?” The answer to this is less clear. The ability of animals (and people) to learn might be due to megabytes encoded in the DNA. If this algorithmic complexity is necessary to solve machine learning, the field faces a daunting task in replicating it on a computer.

This observation suggests a possibility: if you can show that only a few bits of DNA are needed for learning in animals, then this provides evidence that machine learning (as a field) has a hope of big success with relatively little effort.

It is well known that specific portions of the brain have specific functionality across individuals. There are two ways this observation can be explained:

1. Maybe the specific functionality areas are encoded in the DNA.
2. Maybe the specific functionality areas arise from the learning process of the brain. This is the answer that machine learning would like to hear because it agrees with the hypothesis that a simple general learning system exists.

It’s important to realize that these choices actually specify a spectrum rather than a dichotomy. There are surely some problem-specific learning hacks in the brain and there is surely some generalized learning ability. The question is: to what degree is learning encoded by genetic heritage vs. personal experience?

It is anecdotally well known that people (especially children) can recover from fairly severe brain damage, but of course we would prefer to avoid anecdotal evidence.

There are also neuroscience experiments addressing this question. This paper by Jitendra Sharma, Alessandra Angelucci, and Mriganka Sur provides some evidence. In a nutshell, they rewire the optic nerve of ferrets into the auditory region of the brain. They observe that structures similar to the vision-specific region of the brain arise in the auditory region after rewiring (although the new regions may be less capable).

There are doubtless many other experiments addressing this question, but my knowledge of Neuroscience is lacking. (Thanks to Maneesh for pointing this one out.)