

2. The Linear Support Vector Machine

We consider the problem of classifying $m$ points in the $n$-dimensional real space $R^n$, represented by the $m \times n$ matrix $A$, according to membership of each point $A_i$ in the class $\cal A+$ or $\cal A-$ as specified by a given $m\times m$ diagonal matrix $D$ with plus ones or minus ones along its diagonal. For this problem the standard support vector machine with a linear kernel [28,4] is given by the following quadratic program with parameter $\nu>0$:
\begin{displaymath}
\displaystyle{\min_{(w,\gamma,y) \in R^{n+1+m}}}\nu e'y+\frac{1}{2}w'w\;\;\mbox{s.t.}\; D(Aw-e\gamma)+y\ge e,\;\;y \ge 0.
\end{displaymath} (2)
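To make (2) concrete, here is a minimal sketch that sets up and solves this quadratic program on toy data. It assumes the cvxpy modeling package as a generic QP solver (our choice, not part of this formulation), and the toy data and helper names below are ours:
\begin{verbatim}
# Standard linear SVM (2) on toy data: A is the m x n point matrix,
# d holds the +/-1 labels (the diagonal of D), nu > 0 is the weight.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, nu = 40, 2, 1.0
A = np.vstack([rng.normal(+1.0, 1.0, (m // 2, n)),   # class A+
               rng.normal(-1.0, 1.0, (m // 2, n))])  # class A-
d = np.hstack([np.ones(m // 2), -np.ones(m // 2)])   # diag of D
e = np.ones(m)

w = cp.Variable(n)
gamma = cp.Variable()
y = cp.Variable(m)

# min  nu*e'y + (1/2) w'w   s.t.  D(Aw - e gamma) + y >= e,  y >= 0
objective = cp.Minimize(nu * cp.sum(y) + 0.5 * cp.sum_squares(w))
constraints = [cp.multiply(d, A @ w - gamma * e) + y >= e, y >= 0]
cp.Problem(objective, constraints).solve()
print("w =", w.value, "gamma =", gamma.value)
\end{verbatim}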

Here $w$ is the normal to the bounding planes:
\begin{displaymath}
x'w = \gamma \pm 1
\end{displaymath} (3)

and $\gamma$ determines their location relative to the origin. See Figure 1. The plane $x'w=\gamma+1$ bounds the class $\cal A+$ points, possibly with some error, and the plane $x'w=\gamma-1$ bounds the class $\cal A-$ points, also possibly with some error. The linear separating surface is the plane:
\begin{displaymath}
x'w=\gamma,
\end{displaymath} (4)

midway between the bounding planes (3). The quadratic term in (2) is twice the reciprocal of the square of the 2-norm distance $2/\Vert w\Vert$ between the two bounding planes of (3) (see Figure 1). This term enforces maximization of this distance, which is often called the ``margin''. If the classes are linearly inseparable, as depicted in Figure 1, then the two planes bound the two classes with a ``soft margin''. That is, they bound each set approximately with some error determined by the nonnegative error variable $y$:
\begin{displaymath}
\begin{array}{rcccl}
A_iw&+&y_i &\ge &\gamma+1,\;\;\mbox{for}\; D_{ii}=+1,\\
A_iw&-&y_i &\le &\gamma-1,\;\;\mbox{for}\; D_{ii}=-1.
\end{array}\end{displaymath} (5)

Traditionally the constant $\nu$ in (2) multiplies the 1-norm of the error variable $y$ and acts as a weighting factor. A nonzero $y$ results in an approximate separation as depicted in Figure 1.

Figure 1: The bounding planes of a linear SVM with a soft margin (i.e. with some errors), and the separating plane approximately separating $\cal A+$ from $\cal A-$.
\includegraphics[width=2.2in,angle=0]{margin2c.eps}
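The relations (5) also give a quick numerical check: for any candidate plane $(w,\gamma)$, the smallest nonnegative error satisfying (5) is $y_i=\max(0,\,1-D_{ii}(A_iw-\gamma))$, and the margin is $2/\Vert w\Vert$. A small numpy sketch (the helper names are ours, with A, d, w, gamma as in the previous sketch):
\begin{verbatim}
import numpy as np

def soft_margin_errors(A, d, w, gamma):
    # Smallest slack satisfying (5): y_i = max(0, 1 - d_i (A_i w - gamma)).
    return np.maximum(0.0, 1.0 - d * (A @ w - gamma))

def margin(w):
    # 2-norm distance 2/||w|| between the bounding planes (3).
    return 2.0 / np.linalg.norm(w)
\end{verbatim}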

The dual to the standard quadratic linear SVM (2) [15,25,16,5] is the following:
\begin{displaymath}
\displaystyle{\min_{u \in R^m}}\frac{1}{2}u'DAA'Du-e'u\;\;\mbox{s.t.}
\;e'Du=0,\;\;0\le u \le \nu e.
\end{displaymath} (6)
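As an illustration, the dual (6) can also be posed directly in a modeling language; note that $u'DAA'Du=\Vert A'Du\Vert^2$, which keeps the objective in an explicitly convex form even though $DAA'D$ is only positive semidefinite. This is a sketch under the same toy setup and cvxpy assumption as before, with solve_dual_svm a name of our choosing:
\begin{verbatim}
import numpy as np
import cvxpy as cp

def solve_dual_svm(A, d, nu):
    # Dual (6): min (1/2) u'DAA'Du - e'u  s.t.  e'Du = 0,  0 <= u <= nu e.
    m = A.shape[0]
    u = cp.Variable(m)
    objective = cp.Minimize(
        0.5 * cp.sum_squares(A.T @ cp.multiply(d, u)) - cp.sum(u))
    constraints = [d @ u == 0, u >= 0, u <= nu]   # e'Du = sum_i d_i u_i
    cp.Problem(objective, constraints).solve()
    return u.value
\end{verbatim}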

The variables $(w,\gamma)$ of the primal problem, which determine the separating surface (4), can be obtained from the solution of the dual problem above [17, Eqns. 5 and 7]. We note immediately that the matrix $DAA'D$ appearing in the dual objective function (6) is not positive definite in general because typically $m\gg n$. Also, an equality constraint is present in addition to the bound constraints, which for large problems necessitates special computational procedures such as SMO [24] or SVM$^{light}$ [11]. Furthermore, a one-dimensional optimization problem [17] must be solved in order to determine the locator $\gamma$ of the separating surface (4). In order to overcome all these difficulties, as well as the necessity of essentially inverting a very large matrix of order $m\times m$, we propose the following simple but critical modifications to the standard SVM formulation (2). We change the 1-norm of $y$ to a 2-norm squared, which makes the constraint $y\ge 0$ redundant. We also append the term $\gamma^2$ to $w'w$. This in effect maximizes the margin between the parallel bounding planes (3) by optimizing with respect to both $w$ and $\gamma$ [17], that is, with respect to both the orientation and the location of the planes, rather than just with respect to $w$, which determines the orientation alone. This leads to the following reformulation of the SVM:
\begin{displaymath}
\displaystyle{\min_{(w,\gamma,y) \in R^{n+1+m}}} \nu\frac{y'y}{2}+\frac{1}{2}(w'w+\gamma^2)\;\;\mbox{s.t.}\;\; D(Aw-e\gamma)+y \ge e.
\end{displaymath} (7)

The dual of this problem is [15]:
\begin{displaymath}
\displaystyle{\min_{0\le u\in R^m}}\frac{1}{2}u'(\frac{I}{\nu}+D(AA'+ee')D)u-e'u.
\end{displaymath} (8)

The variables $(w,\gamma)$ of the primal problem, which determine the separating surface (4), are recovered directly from the solution of the dual (8) above by the relations:
\begin{displaymath}
w=A'Du,\;\; y=\frac{u}{\nu},\;\;\gamma=-e'Du.
\end{displaymath} (9)
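Putting (8) and (9) together, a short sketch (again assuming cvxpy as the solver; the function name is ours) solves the modified dual and recovers the primal variables. Here $H=D[A\;\;-e]$, so that $D(AA'+ee')D=HH'$:
\begin{verbatim}
import numpy as np
import cvxpy as cp

def solve_modified_dual(A, d, nu):
    # Dual (8): min (1/2) u'(I/nu + HH')u - e'u  s.t.  u >= 0,
    # written with u'HH'u = ||H'u||^2 for an explicitly convex objective.
    m = A.shape[0]
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])  # H = D[A  -e]
    u = cp.Variable(m)
    objective = cp.Minimize(0.5 * cp.sum_squares(u) / nu
                            + 0.5 * cp.sum_squares(H.T @ u) - cp.sum(u))
    cp.Problem(objective, [u >= 0]).solve()
    u = u.value
    # Recovery relations (9):
    w = A.T @ (d * u)          # w = A'Du
    gamma = -np.sum(d * u)     # gamma = -e'Du
    y = u / nu
    return w, gamma, y
\end{verbatim}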

We immediately note that the matrix appearing in the dual objective function is positive definite, and that there is no equality constraint and no upper bound on the dual variable $u$; the only constraint present is nonnegativity. These facts lead us to our simple iterative Lagrangian SVM Algorithm, which requires the inversion of a positive definite $(n+1)\times (n+1)$ matrix at the beginning of the algorithm, followed by a straightforward linearly convergent iterative scheme that requires no optimization package.
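One way to realize this small inversion is the Sherman-Morrison-Woodbury identity: with $H=D[A\;\;-e]$, the $m\times m$ matrix $Q=I/\nu+HH'$ of (8) satisfies $Q^{-1}=\nu\left(I-H(I/\nu+H'H)^{-1}H'\right)$, so only the $(n+1)\times(n+1)$ matrix $I/\nu+H'H$ need be inverted. A sketch of applying $Q^{-1}$ this way (helper names ours):
\begin{verbatim}
import numpy as np

def make_Qinv_apply(A, d, nu):
    # Returns a function v |-> Q^{-1} v using only the small inverse.
    m, n = A.shape
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])    # H = D[A  -e]
    S_inv = np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)  # (n+1) x (n+1)
    return lambda v: nu * (v - H @ (S_inv @ (H.T @ v)))
\end{verbatim}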