Use of the Zero-Norm with Linear Models and Kernel Methods
Jason Weston, André Elisseeff, Bernhard Schölkopf, Mike Tipping
We explore the use of the so-called zero-norm of the
parameters of linear models in learning. Minimization of such a
quantity has many uses in a machine learning context: for
variable or feature selection, minimizing training error and
ensuring sparsity in solutions. We derive a simple but practical
method for achieving these goals and discuss its relationship to
existing techniques of minimizing the zero-norm.
The method amounts to a simple modification
of the vanilla SVM, namely
an iterative multiplicative rescaling of the training data.
The applications we investigate to aid our discussion include
variable and feature selection on biological microarray data
and multicategory classification.
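
The iterative multiplicative rescaling described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that each round trains a linear SVM on the rescaled data and then multiplies every feature's scale by the magnitude of its learned weight, so that the scales of uninformative features shrink toward zero. The function names (train_linear_svm, zero_norm_rescaling), the toy data, the hyperparameters, and the selection threshold are all hypothetical, and a small full-batch subgradient solver stands in for a real SVM package.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Stand-in linear SVM: full-batch subgradient descent on the
    L2-regularised hinge loss. Any standard SVM solver could be used instead."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # point violates the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def zero_norm_rescaling(X, y, n_iter=10):
    """Assumed form of the rescaling loop: z <- z * |w| after each SVM fit."""
    d = len(X[0])
    z = [1.0] * d  # per-feature scaling factors, all features start equal
    for _ in range(n_iter):
        # Rescale every training point feature-wise by the current factors.
        X_scaled = [[xj * zj for xj, zj in zip(xi, z)] for xi in X]
        w, _ = train_linear_svm(X_scaled, y)
        # Multiplicative update: features with small weights shrink further.
        z = [zj * abs(wj) for zj, wj in zip(z, w)]
    return z


# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
X = [[2.0, 0.1, -0.1], [1.5, -0.2, 0.1], [-2.0, 0.1, 0.1], [-1.5, -0.1, -0.2]]
y = [1, 1, -1, -1]

z = zero_norm_rescaling(X, y)
# Keep features whose scale has not collapsed relative to the largest one
# (the 1e-3 threshold is an arbitrary illustrative choice).
selected = [j for j, zj in enumerate(z) if zj > 1e-3 * max(z)]
print("scaling factors:", z)
print("selected features:", selected)
```

On this toy data the scaling factors of the two noise features collapse while the factor of the informative feature stays bounded away from zero, which is the sparsity-inducing behaviour the abstract alludes to. A real application would replace the stand-in solver with a properly tuned SVM and add a convergence check on z.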