New Algorithms for Efficient High-Dimensional Nonparametric Classification

Ting Liu; Andrew W. Moore; Alexander Gray

This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k-nearest-neighbor (k-NN) classifiers. We attempt to exploit the fact that even when we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query; we merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how far that leeway can be exploited. The approach is applicable to many algorithms in nonparametric statistics, memory-based learning, and kernel-based learning, but for clarity this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that, on real-world data sets, give accelerations from 2-fold to 100-fold compared with highly optimized traditional ball-tree-based k-NN. These results include data sets with up to 10⁶ dimensions and 10⁵ records, and demonstrate non-trivial speed-ups while giving exact answers.
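
The leeway described above can be illustrated with a minimal sketch. It is not the paper's ball-tree algorithm: it uses plain linear scans, and the function names (`knn_brute_force`, `knn_by_counting`) are hypothetical. It assumes binary labels (1 = positive, 0 = negative) and no tied distances. The point is only that an exact k-NN majority vote can be decided by counting how many negative points fall inside a radius, with an early exit, rather than by identifying the k nearest neighbors themselves.

```python
import math


def knn_brute_force(query, points, labels, k):
    """Baseline: explicitly find the k nearest neighbors, then vote."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(query, points[i]))
    positives = sum(labels[i] for i in order[:k])
    return positives > k // 2  # True means "classified positive"


def knn_by_counting(query, points, labels, k):
    """Same answer, obtained by counting instead of neighbor finding.

    Assumption (illustrative only): distances are distinct, so there are
    no ties to break.
    """
    need = k // 2 + 1  # positives required for a strict majority
    # Distances from the query to the positive points only.
    pos_d = sorted(math.dist(query, p)
                   for p, y in zip(points, labels) if y == 1)
    if len(pos_d) < need:
        return False  # too few positives to ever win the vote
    radius = pos_d[need - 1]  # distance to the need-th nearest positive
    budget = k - need  # negatives allowed to lie closer than `radius`
    closer = 0
    for p, y in zip(points, labels):
        if y == 0 and math.dist(query, p) < radius:
            closer += 1
            if closer > budget:
                # Enough negatives are closer: positives cannot win.
                return False  # early exit, no full neighbor list needed
    return True
```

The second function never builds the list of k nearest neighbors. It only asks whether more than `budget` negative points lie inside a ball around the query, which is the kind of question that a spatial data structure could answer by pruning whole regions at once.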

Abstract