Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection

Abhimanyu Das; David Kempe

We introduce the submodularity ratio as a measure of how “close” to submodular a set function $f$ is. We show that when $f$ has submodularity ratio $\gamma$, the greedy algorithm for maximizing $f$ provides a $(1-e^{-\gamma})$-approximation. Furthermore, when $\gamma$ is bounded away from 0, the greedy algorithm for minimum submodular cover also provides essentially an $O(\log n)$ approximation for a universe of $n$ elements. As a main application of this framework, we study the problem of selecting a subset of $k$ random variables from a large set, in order to obtain the best linear prediction of another variable of interest. We analyze the performance of widely used greedy heuristics; in particular, by showing that the submodularity ratio is lower-bounded by the smallest $2k$-sparse eigenvalue of the covariance matrix, we obtain the strongest known approximation guarantees for the Forward Regression and Orthogonal Matching Pursuit algorithms. As a second application, we analyze greedy algorithms for the dictionary selection problem, and significantly improve the previously known guarantees. Our theoretical analysis is complemented by experiments on real-world and synthetic data sets; in particular, we focus on an analysis of how tight various spectral parameters and the submodularity ratio are in terms of predicting the performance of the greedy algorithms.

Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection

Abstract