Cover Tree code FAQ

The intent of this document is to cover several nonintuitive details associated with using the cover tree.

The point of the cover tree code is speed. Because the point is speed, the interface to the code differs in two ways from what might be expected.

1. Batch operations. The cover tree code does batch tree construction and batch query answering. The return format for a batch query is a vector with on entry per query where each entry is a vector with the first entry equal to the query and the other entries equal to the results.
2. Short circuit distance. The distance function has 3 arguments rather than 2. The third argument is an upper bound on the distance the cover tree cares about. Let distance(q,p,f) must satisfy the triangle inequality and symmetry over q and p. If the distance is d(q,p) > f, then distance(q,p,f) need only return some value greater than f. It must be strictly greater.

Points

Q: Why doesn't your definition of a point include a label?

A: Because it isn't needed. The definition of a point is a computer address, which is already a unique identifier.

Q: But what if I need to refer to the ith point?

A: Use pointer arithmetic.

Q: But what if I want to have my own labeling system?

A: You'll need to define your own notion of point, which is pretty easy. This includes: defining how you parse a point, defining how you compute the distance between points, and (possibly) defining a function for printing a point. Look at vector.cc and sparse_vector.cc for examples.

Cover Tree Calls

Q: Why does the cover tree return the same point twice?

A: You are probably using the same dataset for answers and queries. The first result in any return is the query point and other results are from the answers. If the same dataset is used for both queries and answers, a point will appear twice.

Q: What happens if I query for k > the number of answers points?

A: Don't do that. Currently, the result is invalid.

Q: Why do I get more than k result points back sometimes?

A: Because you had a tie for the distance to the kth position.

Q: How do I insert and remove single points?

A: That's going to be difficult. There isn't currently code to do this, although it clearly can be written.

Q: Does the cover tree code actually implement the cover tree datastructure as in the paper?

A: Not quite. You'll need to set base = 2.0 to get a valid tree. The code uses base=1.3 because it's works better in practice. This parameter controls a softening of the seperation invariant. All of the other optimizations (there are several) do not affect the correctness.

Q: Are there any known bugs?

A: There is one performance bug: if many points have distance zero, your performance may be quadratic. There are no known correctness bugs.

Q: Are you sure?

A: Triple check your code. Did you use 'abs' instead of 'fabs'? Did you get the semantics of 'upper_bound' right? Did you call the cover tree with k > the number of points? If that fails, email John Langford with the simplest example you can find exhibiting the bug.