A General Framework for Fast Stagewise Algorithms
Ryan J. Tibshirani; 16(Dec):2543−2588, 2015.
Forward stagewise regression follows a very simple strategy for constructing a sequence of sparse regression estimates: it starts with all coefficients equal to zero, and iteratively updates the coefficient (by a small amount $\epsilon$) of the variable that achieves the maximal absolute inner product with the current residual. This procedure has an interesting connection to the lasso: under some conditions, it is known that the sequence of forward stagewise estimates exactly coincides with the lasso path, as the step size $\epsilon$ goes to zero. Furthermore, essentially the same equivalence holds outside of least squares regression, with the minimization of a differentiable convex loss function subject to an $\ell_1$ norm constraint (the stagewise algorithm now updates the coefficient corresponding to the maximal absolute component of the gradient).
Even when they do not match their $\ell_1$-constrained analogues, stagewise estimates provide a useful approximation, and are computationally appealing. Their success in sparse modeling motivates the question: can a simple, effective strategy like forward stagewise be applied more broadly in other regularization settings, beyond the $\ell_1$ norm and sparsity? The current paper is an attempt to do just this. We present a general framework for stagewise estimation, which yields fast algorithms for problems such as group- structured learning, matrix completion, image denoising, and more.