Cover Tree for Nearest Neighbor calculations
Alina Beygelzimer, Sham Kakade, and John Langford, Cover Trees for Nearest Neighbor, ICML 2006. Video
A longer version and experimental results
Thomas Kollar found a small
bug in the insert algorithm description. This doesn't appear in
the code because the code uses a batch insert implementation.
A Cover Tree is a datastructure helpful in calculating the nearest
neighbor of points given only a metric. A cover tree is particularly
motivating for a confluence of reasons:
code (v4) (Under LGPL/GPL license), templated code (v2), datasets,
and sparse datasets (This is version 2, the templated version with both vector points and sparse vector points defined.) A cover tree code faq. v4 fixes a subtle bug that Ryan Curtin found and has a more robust zero-distance check.
- The running time of a nearest neighbor query is only O(log(n)) given a fixed intrinsic dimensionality. (like KR2002 and KL04)
- The space usage and query time are O(n) under no assumptions. (like the naive approach, sb(s), and ball trees)
- It's remarkably fast in practice.
Gordon Rios notes a few details on porting to a Mac.
ApplicationDinoj Surendran has
added the cover tree to isomap yielding speedups, as
expected. The code.
|Speedup of a cover tree over an optimized brute force approach for querying 1,2,3,5, and 10 (euclidean) nearest neighbors on a collection of datasets. |
|Speedup of a cover tree over the sb(S) datastructure for querying 1,2 (euclidean) nearest neighbors|
|Speedup of a cover tree over the sb(S) datastructure for querying 1,2 (string) nearest neighbors