Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

Ping Li; Trevor J. Hastie; Kenneth W. Church

Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

Ping Li, Trevor J. Hastie, Kenneth W. Church; 8(83):2497−2532, 2007.

Abstract

For dimension reduction in the l₁ norm, the method of Cauchy random projections multiplies the original data matrix A ∈ ℝ^n×D with a random matrix R ∈ ℝ^D×k (k≪D) whose entries are i.i.d. samples of the standard Cauchy C(0,1). Because of the impossibility result, one can not hope to recover the pairwise l₁ distances in A from B=A×R∈ ℝ^n×k, using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computations, information retrieval, learning, and data mining.

We study three types of nonlinear estimators: the sample median estimators, the geometric mean estimators, and the maximum likelihood estimators (MLE). We derive tail bounds for the geometric mean estimators and establish that k = O(log n / ε²) suffices with the constants explicitly given. Asymptotically (as k→∞), both the sample median and the geometric mean estimators are about 80% efficient compared to the MLE. We analyze the moments of the MLE and propose approximating its distribution of by an inverse Gaussian.