## Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

**Ping Li, Trevor J. Hastie, Kenneth W. Church**; 8(83):2497−2532, 2007.

### Abstract

For dimension reduction in the *l*_{1} norm, the method of
*Cauchy random projections* multiplies
the original data matrix **A** ∈ ℝ^{n×D}
with a random matrix **R** ∈ ℝ^{D×k}
(*k*≪*D*) whose entries are i.i.d. samples
of the standard Cauchy *C*(0,1). Because of an impossibility result, one cannot
hope to recover the pairwise *l*_{1} distances in **A**
from **B** = **A**×**R** ∈ ℝ^{n×k}
using linear estimators without incurring large
errors. However, nonlinear estimators are still useful for certain
applications in data stream computations, information
retrieval, learning, and data mining.
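The projection step described above can be sketched as follows; the matrix sizes here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, k = 50, 10_000, 100  # illustrative sizes (k << D)

A = rng.standard_normal((n, D))  # original data: n points in R^D
R = rng.standard_cauchy((D, k))  # i.i.d. standard Cauchy C(0,1) entries
B = A @ R                        # projected data: n points in R^k

# By the 1-stability of the Cauchy distribution, each coordinate of
# B[i] - B[j] is distributed as C(0, d), where d = ||A[i] - A[j]||_1.
assert B.shape == (n, k)
```

The 1-stability property is what ties the projected coordinates to the original *l*_{1} distances, and it is why nonlinear estimators (median, geometric mean, MLE) of the Cauchy scale parameter recover those distances.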

We study three types of nonlinear estimators: the
*sample median* estimators, the *geometric mean*
estimators, and the *maximum likelihood* estimators (MLE). We derive tail bounds
for the *geometric mean* estimators and establish that
*k* = *O*(log *n* / ε^{2}) suffices with the constants
explicitly given. Asymptotically (as *k*→∞), both the
*sample median* and the *geometric mean* estimators are about
80% efficient compared to the MLE. We analyze the moments of the MLE
and propose approximating its distribution by an
inverse Gaussian.

© JMLR 2007.