This hashing trick is implemented in Vowpal Wabbit as the core representation, as well as in Torch and in Stream (part of Elefant). It is particularly useful for online learning algorithms that consume large amounts of data.
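To make the trick concrete, here is a minimal sketch in Python (illustrative only, not Vowpal Wabbit's actual implementation; the hash function, bit width, and learning rate are arbitrary choices): feature names are hashed directly into a fixed-size weight vector, so no dictionary from names to indices is ever built, and memory use is bounded before any data arrives.

```python
# Minimal sketch of the hashing trick for online linear learning.
# Illustrative only: the hash function, bit width, and learning rate
# are arbitrary choices, not Vowpal Wabbit's actual implementation.
import zlib

NUM_BITS = 18                      # 2**18 weights, fixed before any data is seen
NUM_WEIGHTS = 1 << NUM_BITS
weights = [0.0] * NUM_WEIGHTS

def hash_index(feature_name: str) -> int:
    """Map a feature name straight to a slot in the weight vector."""
    return zlib.crc32(feature_name.encode()) & (NUM_WEIGHTS - 1)

def predict(features: dict[str, float]) -> float:
    """Linear prediction over hashed features."""
    return sum(weights[hash_index(name)] * value
               for name, value in features.items())

def sgd_update(features: dict[str, float], label: float, lr: float = 0.1) -> None:
    """One online gradient step for squared loss on a single example."""
    error = predict(features) - label
    for name, value in features.items():
        weights[hash_index(name)] -= lr * error * value
```

Because the weight vector never grows, an online learner can stream through arbitrarily many examples and features in constant memory.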
Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, and S.V.N. Vishwanathan, Hash Kernels for Structured Data, AISTATS 2009 and JMLR 2009.
This paper defines a 'hash kernel', shows that it is handy for several problems, and observes that hashing can work well asymptotically because of internal feature redundancy.
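Paraphrasing the construction (a sketch of the idea, not the paper's exact statement): a hash function h maps feature indices into {1, ..., m}, colliding coordinates are summed, and the hash kernel is the inner product in the hashed space.

```latex
\phi_j(x) = \sum_{i \,:\, h(i) = j} x_i ,
\qquad
\bar{k}(x, x') = \langle \phi(x), \phi(x') \rangle .
```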
Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg, Feature Hashing for Large Scale Multitask Learning, ICML 2009. Theorem 3 in this paper was wrong as originally published, but has since been fixed.
Here, we provide a large-scale application of the hashing trick to mass personalized spam filtering, define an unbiased hash, and prove that the unbiased hash is length-preserving when the magnitudes of individual elements are not large. This implies that hashing can be thought of as a sparsity-preserving projection.
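A sketch of the unbiased construction (paraphrased, with arbitrary hash choices): a second hash xi(i) in {-1, +1} multiplies each feature, so collision cross-terms cancel in expectation and E[&lt;phi(x), phi(x')&gt;] = &lt;x, x'&gt; over the random hashes.

```python
# Sketch of the unbiased (signed) hash: a sign hash xi cancels collision
# cross-terms, so the hashed inner product is unbiased for the original one.
# Hash choices here are arbitrary, for illustration only.
import zlib

M = 1 << 12   # number of hash buckets (illustrative)

def h(i: int) -> int:
    """Bucket hash: feature index -> bucket in {0, ..., M-1}."""
    return zlib.crc32(b"bucket" + str(i).encode()) % M

def xi(i: int) -> int:
    """Sign hash: feature index -> +1 or -1."""
    return 1 if zlib.crc32(b"sign" + str(i).encode()) & 1 else -1

def phi(x: dict[int, float]) -> dict[int, float]:
    """Signed hashed feature map: bucket j accumulates xi(i) * x_i."""
    out: dict[int, float] = {}
    for i, v in x.items():
        j = h(i)
        out[j] = out.get(j, 0.0) + xi(i) * v
    return out

def dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Sparse inner product."""
    return sum(v * b.get(j, 0.0) for j, v in a.items())
```

Without the sign hash, colliding features add systematically and bias inner products upward; with it, each collision term has zero mean, which is what makes the hash unbiased.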
See also Small Statistical Models by Random Feature Mixing by Kuzman Ganchev and Mark Dredze, which shows this approach can work for NLP applications, and Extremely Fast Text Feature Extraction for Classification and Indexing by George Forman and Evan Kirshenbaum, which shows that a very fast combined parser/hasher works well on a variety of problems.
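As a rough illustration of the combined parser/hasher idea (a sketch in the same spirit, not the authors' implementation): each token is hashed byte-by-byte while the text is scanned, so token strings and a vocabulary dictionary are never materialized.

```python
# Sketch of a combined tokenizer/hasher: fold bytes into a rolling hash
# while scanning, emitting a bucket count at each token boundary. The
# rolling-hash constant and bit width are arbitrary illustrative choices.
def hashed_bag_of_words(text: bytes, num_bits: int = 18) -> dict[int, int]:
    mask = (1 << num_bits) - 1
    counts: dict[int, int] = {}
    state, in_token = 0, False
    for b in text.lower():                    # bytes.lower() lowercases ASCII
        if 48 <= b <= 57 or 97 <= b <= 122:   # ASCII digit or lowercase letter
            state = (state * 31 + b) & 0xFFFFFFFF
            in_token = True
        elif in_token:                        # token boundary: emit a count
            counts[state & mask] = counts.get(state & mask, 0) + 1
            state, in_token = 0, False
    if in_token:                              # flush a trailing token
        counts[state & mask] = counts.get(state & mask, 0) + 1
    return counts

# Example: hashed_bag_of_words(b"The quick brown fox") yields counts for
# the four tokens (modulo collisions), with no intermediate token strings.
```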