
3. LSVM (Lagrangian Support Vector Machine) Algorithm

Before stating our algorithm we define two matrices to simplify notation as follows:

\begin{displaymath}
H=D[A\;\;-e],\qquad Q=\frac{I}{\nu}+HH'.
\end{displaymath} (10)

With these definitions the dual problem (8) becomes
\begin{displaymath}
\min_{0\le u\in R^m}f(u):=\frac{1}{2}u'Qu-e'u.
\end{displaymath} (11)

It will be understood that within the LSVM Algorithm, the single time that $Q^{-1}$ is computed at the outset of the algorithm, the SMW identity (1) will be used. Hence only an $(n+1)\times (n+1)$ matrix is inverted.
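In NumPy terms, the SMW-based inversion can be sketched as follows (the function name is ours, not from the paper): $Q^{-1}=\nu\,(I-H(I/\nu+H'H)^{-1}H')$, so only the $(n+1)\times(n+1)$ matrix $I/\nu+H'H$ is factored.

```python
import numpy as np

def q_inverse_smw(H, nu):
    """Invert Q = I/nu + H H' via the SMW identity (1):
    Q^{-1} = nu*(I - H (I/nu + H'H)^{-1} H'),
    so only an (n+1)x(n+1) system is solved, never an m x m one."""
    m, k = H.shape                        # k = n + 1
    small = np.eye(k) / nu + H.T @ H      # (n+1) x (n+1) matrix
    return nu * (np.eye(m) - H @ np.linalg.solve(small, H.T))
```

In practice one stores the small factor and applies $Q^{-1}$ to vectors, rather than forming the $m\times m$ inverse explicitly as done here for illustration.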

The LSVM Algorithm is based directly on the Karush-Kuhn-Tucker necessary and sufficient optimality conditions [15, KTP 7.2.4, page 94] for the dual problem (11):

\begin{displaymath}
0\le u\;\perp\;Qu-e\ge 0.
\end{displaymath} (12)

By using the easily established identity between any two real numbers (or vectors) $a$ and $b$:
\begin{displaymath}
0\le a\;\perp\;b\ge 0\;\Longleftrightarrow\;a=(a-\alpha b)_+,\;\alpha>0,
\end{displaymath} (13)

the optimality condition (12) can be written in the following equivalent form for any positive $\alpha$:
\begin{displaymath}
Qu-e=((Qu-e)-\alpha u)_+.
\end{displaymath} (14)
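The identity (13) is easy to confirm numerically. The following sketch (function names are ours) checks complementarity through the fixed-point form $a=(a-\alpha b)_+$, which holds for every $\alpha>0$ exactly when $0\le a\perp b\ge 0$:

```python
import numpy as np

def plus(x):
    """(x)_+ : componentwise max(x, 0)."""
    return np.maximum(x, 0.0)

def complementarity_holds(a, b, alpha=1.0, tol=1e-12):
    """Test 0 <= a  perp  b >= 0 via identity (13): a = (a - alpha*b)_+."""
    return bool(np.all(np.abs(a - plus(a - alpha * b)) < tol))
```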

These optimality conditions lead to the following very simple iterative scheme which constitutes our LSVM Algorithm:
\begin{displaymath}
u^{i+1}=Q^{-1}(e+((Qu^i-e)-\alpha u^i)_+),\;i=0,1,\ldots,
\end{displaymath} (15)

for which we will establish global linear convergence from any starting point under the easily satisfiable condition:
\begin{displaymath}
0<\alpha<\frac{2}{\nu}.
\end{displaymath} (16)

We satisfy this condition by taking $\alpha=1.9/\nu$ in all our experiments, where $\nu$ is the parameter of our SVM formulation (7). It turns out, and this is the way that led us to this iterative scheme, that the optimality condition (14) is also the necessary and sufficient condition for the unconstrained minimum of the implicit Lagrangian [21] associated with the dual problem (11):
\begin{displaymath}
\min_{u\in R^m}L(u,\alpha)=\min_{u\in R^m}\;\frac{1}{2}u'Qu-e'u+\frac{1}{2\alpha}\left(\Vert(-\alpha u+Qu-e)_+\Vert^2-\Vert Qu-e\Vert^2\right).
\end{displaymath} (17)

Setting the gradient with respect to $u$ of this convex and differentiable Lagrangian to zero gives
\begin{displaymath}
(Qu-e)+\frac{1}{\alpha}(Q-\alpha I)((Q-\alpha I)u-e)_+-\frac{1}{\alpha}Q(Qu-e)=0,
\end{displaymath} (18)

or equivalently:
\begin{displaymath}
(\alpha I-Q)\left((Qu-e)-((Q-\alpha I)u-e)_+\right)=0,
\end{displaymath} (19)

which is equivalent to the optimality condition (14) under the assumption that $\alpha$ is positive and not an eigenvalue of $Q$.
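The algebraic step from (18) to (19) (multiply (18) by $\alpha$ and factor out $\alpha I-Q$) can be verified numerically; the sketch below, with our own function names, checks that the two expressions agree at arbitrary points:

```python
import numpy as np

def plus(x):
    """(x)_+ : componentwise max(x, 0)."""
    return np.maximum(x, 0.0)

def grad_implicit_lagrangian(Q, u, alpha):
    """Gradient (18) of the implicit Lagrangian (17) at u."""
    m = Q.shape[0]
    e = np.ones(m)
    QaI = Q - alpha * np.eye(m)
    return (Q @ u - e) + QaI @ plus(QaI @ u - e) / alpha \
           - Q @ (Q @ u - e) / alpha

def factored_form(Q, u, alpha):
    """Form (19) scaled by 1/alpha:
    (alpha*I - Q)((Qu - e) - ((Q - alpha*I)u - e)_+) / alpha."""
    m = Q.shape[0]
    e = np.ones(m)
    r = (Q @ u - e) - plus((Q - alpha * np.eye(m)) @ u - e)
    return (alpha * np.eye(m) - Q) @ r / alpha
```

Since $\alpha I-Q$ is nonsingular when $\alpha$ is not an eigenvalue of $Q$, the factored form vanishes exactly when the optimality residual of (14) does.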

We establish now the global linear convergence of the iteration (15) under condition (16).

Algorithm 1 (LSVM Algorithm & Its Global Convergence)   Let $Q\in R^{m\times m}$ be the symmetric positive definite matrix defined by (10) and let (16) hold. Starting with an arbitrary $u^0\in R^m$, the iterates $u^i$ of (15) converge to the unique solution $\bar u$ of (11) at the linear rate:
\begin{displaymath}
\Vert Qu^{i+1}-Q\bar u\Vert\le\Vert I-\alpha Q^{-1}\Vert\cdot\Vert Qu^i-Q\bar u\Vert.
\end{displaymath} (20)

Proof Because $\bar u$ is the solution of (11), it must satisfy the optimality condition (14) for any $\alpha>0$. Subtracting that equation with $u=\bar u$ from the iteration (15) premultiplied by $Q$ and taking norms gives:
\begin{displaymath}
\Vert Qu^{i+1}-Q\bar u\Vert=\Vert((Qu^i-e)-\alpha u^i)_+-((Q\bar u-e)-\alpha\bar u)_+\Vert.
\end{displaymath} (21)

Using the Projection Theorem [1, Proposition 2.1.3] which states that the distance between any two points in $R^m$ is not less than the distance between their projections on any convex set (the nonnegative orthant here) in $R^m$, the above equation gives:
\begin{displaymath}\begin{array}{rcl}
\Vert Qu^{i+1}-Q\bar u\Vert&\le&\Vert(Q-\alpha I)(u^i-\bar u)\Vert=\Vert(I-\alpha Q^{-1})Q(u^i-\bar u)\Vert\\
&\le&\Vert I-\alpha Q^{-1}\Vert\cdot\Vert Q(u^i-\bar u)\Vert.
\end{array}\end{displaymath} (22)

All we need to show now is that $\Vert I-\alpha Q^{-1}\Vert<1$, which follows from (16). Noting the definition (10) of $Q$ and letting $\lambda_i$, $i=1,\ldots,m$, denote the nonnegative eigenvalues of $HH'$, all we need is:
\begin{displaymath}
\left\vert 1-\frac{\alpha}{\frac{1}{\nu}+\lambda_i}\right\vert<1,\quad i=1,\ldots,m,
\end{displaymath} (23)

or equivalently:
\begin{displaymath}
0<\alpha<\frac{2}{\nu}+2\lambda_i,\quad i=1,\ldots,m,
\end{displaymath} (24)

which is satisfied under the assumption (16).$\Box$

We give now a complete MATLAB [26] code of LSVM which is capable of solving problems with millions of points using only native MATLAB commands. The input parameters, besides $A$, $D$ and $\nu$ of (10), which define the problem, are: itmax, the maximum number of iterations, and tol, the tolerated nonzero error in $\Vert u^{i+1}-u^i\Vert$ at termination. The quantity $\Vert u^{i+1}-u^i\Vert$ bounds from above:

\begin{displaymath}
\Vert Q\Vert^{-1}\cdot\Vert Qu^i-e-((Qu^i-e)-\alpha u^i)_+\Vert,
\end{displaymath} (25)

which measures the violation of the optimality criterion (14). It follows [20] that $\Vert u^{i+1}-u^i\Vert$ also bounds $\Vert u^{i}-\bar u\Vert$, and by (9) it also bounds $\Vert w^i-\bar w\Vert$ and $\vert\gamma^i-\bar \gamma\vert$, where $(\bar w,\bar \gamma, \bar y)$ is the unique solution of the primal SVM (7).
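The bound follows from $u^{i+1}-u^i=-Q^{-1}\!\left(Qu^i-e-((Qu^i-e)-\alpha u^i)_+\right)$ together with $\Vert Q^{-1}x\Vert\ge\Vert x\Vert/\Vert Q\Vert$. A small NumPy sketch (names ours) confirming it for one step of (15):

```python
import numpy as np

def step_bounds_violation(H, nu, u):
    """Check that ||u_next - u|| >= ||Q||^{-1} * ||Qu - e - ((Qu-e) - alpha*u)_+||,
    the scaled optimality violation (25), for one step of iteration (15)."""
    m = H.shape[0]
    alpha = 1.9 / nu
    e = np.ones(m)
    Q = np.eye(m) / nu + H @ H.T
    p = np.maximum((Q @ u - e) - alpha * u, 0.0)
    u_next = np.linalg.solve(Q, e + p)       # one step of (15)
    violation = np.linalg.norm(Q @ u - e - p)
    return bool(np.linalg.norm(u_next - u)
                >= violation / np.linalg.norm(Q, 2) - 1e-12)
```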

Code 2   LSVM MATLAB M-File

function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
  [m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
  S=H*inv((speye(n+1)/nu+H'*H));   % SMW factor: Q^{-1}v = nu*(v-S*(H'*v))
  u=nu*(1-S*(H'*e));oldu=u+1;      % u = Q^{-1}e; oldu forces loop entry
  while it<itmax & norm(oldu-u)>tol
    z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));  % z = e+((Qu-e)-alpha*u)_+
    oldu=u;
    u=nu*(z-S*(H'*z));             % u = Q^{-1}z, iteration (15)
    it=it+1;
  end;
  opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;

Journal of Machine Learning Research