4. LSVM for Nonlinear Kernels

We shall use the notation of [16]. For $A \in R^{m \times n}$ and $B \in R^{n \times \ell}$, the **kernel** $K(A,B)$ maps $R^{m \times n} \times R^{n \times \ell}$ into $R^{m \times \ell}$. A typical kernel is the Gaussian kernel $\varepsilon^{-\mu\|A_i'-B_{\cdot j}\|^2}$, $i,j=1,\ldots,m$, $\ell=m$, where $\varepsilon$ is the base of natural logarithms, while a linear kernel is $K(A,B)=AB$. For a column vector $x$ in $R^n$, $K(x',A')$ is a row vector in $R^m$, and the linear separating surface (4) is replaced by the nonlinear surface

$$K\left([x' \;\; {-1}],\begin{bmatrix}A'\\-e'\end{bmatrix}\right)Du=0,\qquad(26)$$

where $u$ is the solution of the dual problem (8) with $Q$ re-defined for a general nonlinear kernel as follows:

$$Q=\frac{I}{\nu}+DK(G,G')D,\qquad G=[A\;\;{-e}].\qquad(27)$$

Note that the nonlinear separating surface (26) degenerates to the linear one (4) if we let $K(G,G')=GG'$ and make use of (9).
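To make the kernel notation above concrete, here is a small NumPy sketch (our own illustration, not from the paper; the function names `gaussian_kernel` and `linear_kernel` are hypothetical) of the Gaussian and linear kernels in the matrix convention used here, where $A$ is $m \times n$ and $B$ is $n \times \ell$:

```python
import numpy as np

def gaussian_kernel(A, B, mu):
    """Entrywise Gaussian kernel: K[i, j] = exp(-mu * ||A[i, :] - B[:, j]||^2).

    A is m x n and B is n x l, so K(A, B) maps
    R^{m x n} x R^{n x l} into R^{m x l}, as in the text.
    """
    # Squared distances between rows of A and columns of B.
    sq = ((A[:, None, :] - B.T[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def linear_kernel(A, B):
    """Linear kernel K(A, B) = A B."""
    return A @ B
```

With `B = A.T` these produce the usual $m \times m$ Gram matrices; replacing `gaussian_kernel` by `linear_kernel` recovers the linear case exactly, mirroring the degeneration of (26) to (4) noted above.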

To justify the nonlinear kernel formulation (27)
we refer the reader to [16, Equation (8.9)]
for a similar result and give a brief derivation here.
If we rewrite our dual problem for a linear kernel (8)
in the equivalent form:

$$\min_{0 \le u \in R^m} \frac{1}{2}u'\left(\frac{I}{\nu}+DGG'D\right)u-e'u,\qquad(28)$$

and replace the linear kernel $GG'$ by a general nonlinear positive semidefinite kernel $K(G,G')$, we obtain:

$$\min_{0 \le u \in R^m} \frac{1}{2}u'\left(\frac{I}{\nu}+DK(G,G')D\right)u-e'u.\qquad(29)$$

This is the formulation given above in (27). We note that the Karush-Kuhn-Tucker necessary and sufficient optimality conditions for this problem are:

$$0 \le u \perp \left(\frac{I}{\nu}+DK(G,G')D\right)u-e \ge 0,\qquad(30)$$

which underlie LSVM for a nonlinear positive semidefinite kernel $K(G,G')$. The positive semidefiniteness of the nonlinear kernel $K(G,G')$ is needed in order to ensure the existence of a solution to both (29) and (30).
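The optimality conditions (30) are easy to check numerically. The following helper (our own, hypothetical, not part of LSVM) returns the largest violation of the three requirements — nonnegativity of $u$, nonnegativity of $Qu - e$, and zero complementarity product — and is handy as a correctness check on a candidate dual solution:

```python
import numpy as np

def kkt_violation(u, Q, e):
    """Maximum violation of the conditions 0 <= u, Q u - e >= 0,
    and u'(Q u - e) = 0; all three hold iff the result is <= 0."""
    g = Q @ u - e  # gradient of the dual objective at u
    return max(float(np.max(-u)),   # how negative u gets
               float(np.max(-g)),   # how negative Q u - e gets
               float(abs(u @ g)))   # complementarity residual
```

For instance, with $Q = I$ and $e$ the vector of ones, $u = e$ satisfies (30) exactly, while $u = 0$ violates the gradient condition by 1.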

All the results of the previous section remain valid, with $Q$ re-defined as above for any positive semidefinite kernel $K(G,G')$. This includes the iterative scheme (15) and the convergence result given under Algorithm 1. However, because we do not make use of the Sherman-Morrison-Woodbury identity for a nonlinear kernel, the LSVM MATLAB Code 3 is somewhat different and is as follows:

function [it, opt,u] = svmlk(nu,itmax,tol,D,KM)
% lsvm with nonlinear kernel for min 1/2*u'*Q*u-e'*u s.t. u=>0
% Q=I/nu+DK(G,G')D, G=[A -e]
% Input: nu, itmax, tol, D, KM=K(G,G')
% [it, opt,u] = svmlk(nu,itmax,tol,D,KM);
m=size(KM,1);alpha=1.9/nu;e=ones(m,1);I=speye(m);it=0;
Q=I/nu+D*KM*D;P=inv(Q);
u=P*e;oldu=u+1;
while it<itmax & norm(oldu-u)>tol
   oldu=u;
   u=P*(1+pl(Q*u-1-alpha*u));
   it=it+1;
end;
opt=norm(u-oldu);[it opt]

function pl = pl(x); pl = (abs(x)+x)/2;
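For readers without MATLAB, a rough NumPy transcription of Code 3 might look as follows. This is a sketch, not the authors' code: it assumes `D` and `KM = K(G,G')` are dense NumPy arrays, and mirrors the MATLAB line by line, including the explicit inverse of the full $m \times m$ matrix $Q$:

```python
import numpy as np

def svmlk(nu, itmax, tol, D, KM):
    """LSVM with a nonlinear kernel: min 1/2 u'Qu - e'u s.t. u >= 0,
    where Q = I/nu + D*KM*D and KM = K(G, G'), G = [A -e]."""
    m = KM.shape[0]
    alpha = 1.9 / nu                  # step parameter, 0 < alpha < 2/nu
    e = np.ones(m)
    Q = np.eye(m) / nu + D @ KM @ D
    P = np.linalg.inv(Q)              # no Sherman-Morrison-Woodbury: Q is m x m
    u = P @ e
    oldu = u + 1                      # force at least one iteration
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        oldu = u
        # np.maximum(., 0) is the plus function pl() of Code 3
        u = P @ (e + np.maximum(Q @ u - e - alpha * u, 0))
        it += 1
    return it, np.linalg.norm(u - oldu), u
```

As in the MATLAB version, the returned `opt` is the norm of the last iterate change, and the dominant costs are forming and inverting the dense $m \times m$ matrix $Q$.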

Since we cannot use the Sherman-Morrison-Woodbury identity in the general nonlinear case, this code for a nonlinear kernel is effective only for moderately sized problems. The sizes of the matrices $Q$ and $P = Q^{-1}$ in Code 3 scale quadratically with the number of data points.