Cover Tree for Nearest Neighbor calculations

Alina Beygelzimer, Sham Kakade, and John Langford, Cover Trees for Nearest Neighbor, ICML 2006. Video
A longer version and experimental results addendum
Thomas Kollar found a small bug in the insert algorithm description. This doesn't appear in the code because the code uses a batch insert implementation.

A Cover Tree is a datastructure helpful in calculating the nearest neighbor of points given only a metric. A cover tree is particularly motivating for a confluence of reasons:

  1. The running time of a nearest neighbor query is only O(log(n)) given a fixed intrinsic dimensionality. (like KR2002 and KL04)
  2. The space usage and query time are O(n) under no assumptions. (like the naive approach, sb(s), and ball trees)
  3. It's remarkably fast in practice.
code (v4) (Under LGPL/GPL license), templated code (v2), datasets, and sparse datasets (This is version 2, the templated version with both vector points and sparse vector points defined.) A cover tree code faq. v4 fixes a subtle bug that Ryan Curtin found and has a more robust zero-distance check.
Gordon Rios notes a few details on porting to a Mac.

Application

Dinoj Surendran has added the cover tree to isomap yielding speedups, as expected. The code.

Performance Tests

Speedup of a cover tree over an optimized brute force approach for querying 1,2,3,5, and 10 (euclidean) nearest neighbors on a collection of datasets.
Speedup of a cover tree over the sb(S) datastructure for querying 1,2 (euclidean) nearest neighbors
Speedup of a cover tree over the sb(S) datastructure for querying 1,2 (string) nearest neighbors