## Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection

** Jie Chen, Haw-ren Fang, Yousef Saad**; 10(69):1989−2012, 2009.

### Abstract

Nearest neighbor graphs are widely used in data mining and machine
learning. A brute-force method to compute the exact *k*NN graph
takes Θ(*dn*^{2}) time for *n* data points in
the *d* dimensional
Euclidean space. We propose two divide and conquer methods for
computing an approximate *k*NN graph in Θ(*dn ^{t}*) time for high
dimensional data (large

*d*). The exponent

*t*∈ (1,2) is an increasing function of an internal parameter α which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of

*t*. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide and conquer process. The combination of these techniques is shown to yield quite effective algorithms for building

*k*NN graphs.

© JMLR 2009. (edit, beta) |