## Stagewise Lasso

** Peng Zhao, Bin Yu**; 8(Dec):2701--2726, 2007.

### Abstract

Many statistical machine learning algorithms minimize either an empirical loss function as in AdaBoost, or a penalized empirical loss as in Lasso or SVM. A single regularization tuning parameter controls the trade-off between fidelity to the data and generalizability, or equivalently between bias and variance. When this tuning parameter changes, a regularization "path" of solutions to the minimization problem is generated, and the whole path is needed to select a tuning parameter to optimize the prediction or interpretation performance. Algorithms such as homotopy-Lasso or LARS-Lasso and Forward Stagewise Fitting (FSF) (aka e-Boosting) are of great interest because of their resulted sparse models for interpretation in addition to prediction.

In this paper, we propose the BLasso algorithm that
ties the FSF (e-Boosting) algorithm with the Lasso method
that minimizes the *L*_{1} penalized *L*_{2} loss.
BLasso is derived
as a coordinate descent method with a fixed stepsize applied to the
general Lasso loss function (*L*_{1} penalized convex loss). It
consists of both a forward step and a backward step. The forward
step is similar to e-Boosting or FSF, but the
backward step is new and revises the FSF (or e-Boosting) path to approximate the
Lasso path. In the cases of a finite number of base learners and a
bounded Hessian of the loss function,
the BLasso path is shown to converge to the Lasso path when
the stepsize goes to zero. For
cases with a larger number of base learners than the sample size
and when the true model is sparse, our simulations indicate
that the BLasso model estimates
are sparser than those from FSF with comparable
or slightly better prediction performance, and that
the the discrete stepsize of BLasso and FSF has
an additional regularization effect in terms
of prediction and sparsity.
Moreover, we introduce the Generalized BLasso
algorithm
to minimize a general convex loss penalized by a general convex
function. Since the (Generalized) BLasso relies only on differences not derivatives,
we conclude that it provides a class of
simple and easy-to-implement algorithms
for tracing the regularization or solution paths of penalized minimization problems.

[abs][pdf]