## Submatrix localization via message passing

*Bruce Hajek, Yihong Wu, Jiaming Xu*; 18(186):1−52, 2018.

### Abstract

The principal submatrix localization problem deals with
recovering a $K\times K$ principal submatrix of elevated mean
$\mu$ in a large $n\times n$ symmetric matrix subject to
additive standard Gaussian noise, or more generally, mean zero,
variance one, subgaussian noise. This problem serves as a
prototypical example for community detection, in which the
community corresponds to the support of the submatrix. The main
result of this paper is that in the regime $\Omega(\sqrt{n}) \leq
K \leq o(n)$, the support of the submatrix can be weakly
recovered (with $o(K)$ misclassification errors on average) by
an optimized message passing algorithm if $\lambda =
\mu^2K^2/n$, the signal-to-noise ratio, exceeds $1/e$. This
extends a result by Deshpande and Montanari previously obtained
for $K=\Theta(\sqrt{n})$ and $\mu=\Theta(1).$ In addition, the
algorithm can be combined with a voting procedure to achieve the
information-theoretic limit of exact recovery with sharp
constants for all $K \geq \frac{n}{\log n} (\frac{1}{8e} +
o(1))$. The total running time of the algorithm is $O(n^2\log
n)$. Another version of the submatrix localization problem,
known as noisy biclustering, aims to recover a $K_1\times K_2$
submatrix of elevated mean $\mu$ in a large $n_1\times n_2$
Gaussian matrix. The optimized message passing algorithm and its
analysis are adapted to the bicluster problem assuming
$\Omega(\sqrt{n_i}) \leq K_i \leq o(n_i)$ and $K_1\asymp K_2.$ A
sharp information-theoretic condition for the weak recovery of
both clusters is also identified.

[abs][pdf][bib]