

1. Introduction

Support vector machines (SVMs) [28,4,3,16,14] are powerful tools for data classification. Classification is achieved by a linear or nonlinear separating surface in the input space of the dataset. In this work we propose a very fast and simple algorithm, based on an implicit Lagrangian formulation [21] of the dual of a simple reformulation of the standard quadratic program of a linear support vector machine. This leads to the minimization of an unconstrained differentiable convex function in an $m$-dimensional space, where $m$ is the number of points to be classified in a given $n$-dimensional input space. The necessary optimality condition for this unconstrained minimization problem can be transformed into a very simple symmetric positive definite complementarity problem (12). A linearly convergent iterative Lagrangian support vector machine (LSVM) Algorithm 1 is given for solving (12). LSVM requires the inversion at the outset of a single matrix of order $(n+1)$, the dimensionality of the original input space plus one. The algorithm can accurately solve problems with millions of points and requires only standard native MATLAB commands without any optimization tools such as linear or quadratic programming solvers.
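
To fix ideas, the sketch below illustrates a generic projection iteration of the kind that solves a symmetric positive definite linear complementarity problem $0 \le u$, $Qu-b \ge 0$, $u'(Qu-b)=0$. It is offered only as an illustration with placeholder inputs Q, b, omega, tol and itmax; the actual LSVM update, and the specific matrix and right-hand side it uses, are given as Algorithm 1 in Section 3.

% Illustrative sketch only (not LSVM Algorithm 1): a projected fixed-point
% iteration for the complementarity problem u >= 0, Q*u - b >= 0,
% u'*(Q*u - b) = 0, with Q symmetric positive definite. It converges
% linearly for step sizes 0 < omega < 2/norm(Q).
function u = lcp_projection_sketch(Q, b, omega, tol, itmax)
u = zeros(size(b));                        % start at the origin
for it = 1:itmax
    unew = max(u - omega*(Q*u - b), 0);    % gradient step, then project onto u >= 0
    if norm(unew - u) < tol                % stop once the iterates settle
        u = unew;
        return;
    end
    u = unew;
end
end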

As was the case in our recent active set support vector machine (ASVM) approach [18], the following two simple changes were made to the standard linear SVM: (i) The margin (distance) between the parallel bounding planes was maximized with respect to both orientation ($w$) and location relative to the origin ($\gamma$). Such a change was also carried out in our successive overrelaxation (SOR) approach [17] as well as in the smooth support vector machine (SSVM) approach of [14]. See equation (7) and Figure 1. (ii) The error in the soft margin ($y$) was minimized using the 2-norm squared instead of the conventional 1-norm. See equation (7). These straightforward but important changes lead to a considerably simpler positive definite dual problem with nonnegativity constraints only. See equation (8). Although our use of a quadratic penalty for the soft margin as indicated in (ii) above is nonstandard, it has been used effectively in previous work [14,13,18]. In theory, this change could lead to a less robust classifier in the presence of outliers. However, this modification to the SVM does not seem to have a detrimental effect, as can be seen in the experimental results of Section 5.
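
For concreteness, writing $A$ for the matrix whose rows are the training points and $D$ for the diagonal matrix of $\pm 1$ class labels (notation made precise below and in Section 2), a reformulation embodying (i) and (ii) has the general shape

\begin{displaymath}
\min_{(w,\gamma,y)}\;\frac{\nu}{2}y'y+\frac{1}{2}(w'w+\gamma^{2})
\quad \mbox{subject to} \quad D(Aw-e\gamma)+y \ge e,
\end{displaymath}

where the $\gamma^{2}$ term in the objective reflects change (i) and the squared 2-norm $y'y$ reflects change (ii); the precise formulation used here is given as equation (7) in Section 2.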

In Section 2 of the paper we begin with the standard SVM formulation and its dual, and then give our formulation and its dual. We note that computational evidence given in our previous work [18] shows that this alternative formulation does not compromise generalization ability. Section 3 gives our simple iterative Lagrangian support vector machine (LSVM) Algorithm 1 and establishes its global linear convergence. LSVM, stated in 11 lines of MATLAB Code 2 below, solves once at the outset a single system of $n+1$ equations in $n+1$ variables defined by a symmetric positive definite matrix. It then uses a linearly convergent iterative method to solve the problem. In Section 4 LSVM is extended to positive semidefinite nonlinear kernels. Section 5 describes our numerical results, which show that LSVM gives comparable or better testing results than those of other SVM algorithms, and in some cases is dramatically faster than a standard quadratic programming SVM solver.

We now describe our notation and give some background material. All vectors will be column vectors unless transposed to a row vector by a prime $'$. For a vector $x$ in the $n$-dimensional real space $R^n$, $x_+$ denotes the vector in $R^n$ with all of its negative components set to zero. This corresponds to projecting $x$ onto the nonnegative orthant. The base of the natural logarithms will be denoted by $\varepsilon$, and for a vector $y \in R^m$, $\varepsilon^{-y}$ will denote a vector in $R^m$ with components $\varepsilon^{-y_i},\;i=1,\ldots,m$. The notation $A \in R^{m \times n}$ will signify a real $m \times n$ matrix. For such a matrix, $A'$ will denote the transpose of $A$, $A_i$ will denote the $i$-th row of $A$ and $A_{\cdot j}$ will denote the $j$-th column of $A$. A vector of ones or zeros in a real space of arbitrary dimension will be denoted by $e$ or $0$, respectively. The identity matrix of arbitrary dimension will be denoted by $I$. For two vectors $x$ and $y$ in $R^n$, $x\perp y$ denotes orthogonality, that is, $x'y=0$. We use $:=$ to denote definition. The 2-norm of a vector $x$ and of a matrix $Q$ will be denoted by $\Vert x\Vert$ and $\Vert Q\Vert$, respectively. A separating plane, with respect to two given point sets ${\mathcal A}$ and ${\mathcal B}$ in $R^n$, is a plane that attempts to separate $R^n$ into two halfspaces such that each open halfspace contains points mostly of ${\mathcal A}$ or ${\mathcal B}$. A special case of the Sherman-Morrison-Woodbury (SMW) identity [7] will be utilized:

\begin{displaymath}
\left(\frac{I}{\nu}+HH'\right)^{-1}=\nu\left(I-H\left(\frac{I}{\nu}+H'H\right)^{-1}H'\right), \qquad (1)
\end{displaymath}

where $\nu$ is a positive number and $H$ is an arbitrary $m\times k$ matrix. This identity, easily verifiable by premultiplying both sides by $(\frac{I}{\nu}+HH')$, enables us to invert a large $m\times m$ matrix of the form in (1) by merely inverting a smaller $k\times k$ matrix. It was also used recently in our ASVM paper [18] as well as by [6].
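
As a quick sanity check, (1) is easy to verify numerically for a random tall matrix $H$; the sizes and variable names below are arbitrary placeholders used only for illustration:

% Numerical check of identity (1): the left-hand side needs an m-by-m inverse,
% while the right-hand side only requires solving a k-by-k system.
m = 2000; k = 10; nu = 0.5;                      % illustrative sizes and parameter
H = randn(m, k);
lhs = inv(eye(m)/nu + H*H');                     % direct m-by-m inverse
rhs = nu*(eye(m) - H*((eye(k)/nu + H'*H)\H'));   % only a k-by-k solve via backslash
fprintf('maximum absolute discrepancy: %g\n', max(max(abs(lhs - rhs))));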

