Experimental Results

We compared our SVM implementation for regression problems (*SVMTorch*)
to the one from Flake and Lawrence [3] using
their publicly available software *Nodelib*. This is interesting
because *Nodelib* is based on *SMO*, where the
variables α and α^{*} are selected simultaneously,
which is not the case for *SVMTorch*. Note also that
*Nodelib* includes some enhancements compared to *SMO* which are
different from those proposed by Shevade *et al.* [13].

Both these algorithms use an internal cache in order to be able to
solve large-scale problems.
All the experiments presented here were run on a
Linux Pentium III 750 MHz machine, with the *gcc* compiler.
The parameters of the algorithms were not chosen to obtain the best
generalization performance, since the goal was to *compare the
speed* of the algorithms. However, we chose them so as to obtain
reasonable results.
Both programs used the same parameters with regard to cache, precision,
etc. For *Nodelib*, the other parameters were set using the
default values proposed
by the authors.^{6}
All the programs were compiled using *double*
precision.
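The internal cache mentioned above typically stores recently used kernel rows and evicts the least recently used row when full. A minimal sketch of such an LRU kernel-row cache (class and function names are hypothetical, not SVMTorch's or Nodelib's actual API):

```python
from collections import OrderedDict

class KernelCache:
    """Fixed-size LRU cache for rows of the kernel matrix (hypothetical sketch)."""

    def __init__(self, max_rows, kernel_row_fn):
        self.max_rows = max_rows
        self.kernel_row_fn = kernel_row_fn  # computes one row K[i, :] on demand
        self.rows = OrderedDict()           # insertion order tracks recency

    def get_row(self, i):
        if i in self.rows:
            self.rows.move_to_end(i)        # mark as most recently used
        else:
            if len(self.rows) >= self.max_rows:
                self.rows.popitem(last=False)  # evict least recently used row
            self.rows[i] = self.kernel_row_fn(i)
        return self.rows[i]
```

With such a cache, the cost of repeated kernel evaluations over the active set is paid only once per eviction cycle, which is what makes large-scale problems tractable.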
We compared the programs on five different tasks:

**Kin** - This dataset^{7} represents a realistic simulation of the forward dynamics of an 8-link all-revolute robot arm. The task is to predict the distance of the end-effector from a target, given features such as joint positions and twist angles.

**Sunspots** - Using a series representing the number of sunspots per day, we created one input/output pair for each day: the yearly average of the year starting the next day had to be predicted using the 12 previous yearly averages.

**Artificial** - This is an easy artificial dataset based on **Sunspots**: we created a daily series where each value is the yearly average centered on that day. The task is to predict a value given the 100 previous ones.

**Forest** - This dataset^{8} is a classification task with 7 classes, of which only the first 35000 examples were used here. We transformed it into a regression task where the goal was to predict +1 for examples of class 2 and -1 for the other examples. Note that since class 2 was over-represented in the dataset, this transformation leads to a more balanced problem.

**MNIST** - This dataset^{9} is a classification task with 10 classes (representing handwritten digits). Again, we transformed it into a regression problem where the goal was to predict +1 for examples of classes 0 to 4 and -1 for the other examples.
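The input/output pairs for the time-series tasks above are built by a standard sliding-window construction: each target value is predicted from a fixed number of preceding values (12 for **Sunspots**, 100 for **Artificial**). A small sketch of that construction (function name is ours, for illustration only):

```python
def make_windows(series, n_inputs):
    """Build (input, target) pairs from a series: each target is
    predicted from the n_inputs values that precede it."""
    pairs = []
    for t in range(n_inputs, len(series)):
        pairs.append((series[t - n_inputs:t], series[t]))
    return pairs
```

For a series of length N this yields N - n_inputs training pairs, each of dimension n_inputs.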

Note also that *Forest* and *MNIST* are both sparse, i.e., a large
fraction of the values in their input matrices are null.
Since *SVMTorch* can handle sparse data (as can *SVM-Light*), we
tested this option in the experiments described in
TABLES 3 and 4.
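Exploiting sparsity means evaluating the kernel while touching only the nonzero input values. A hedged sketch of how a Gaussian kernel can be computed on sparse vectors stored as index-to-value dictionaries (this is an illustration of the idea, not SVMTorch's actual implementation, and the kernel parameterization below is one common choice):

```python
import math

def sparse_sq_norm(x):
    """Squared Euclidean norm of a sparse vector {index: value}."""
    return sum(v * v for v in x.values())

def sparse_dot(x, y):
    """Dot product of two sparse vectors; iterate over the shorter one."""
    if len(x) > len(y):
        x, y = y, x
    return sum(v * y[k] for k, v in x.items() if k in y)

def gaussian_kernel_sparse(x, y, sigma):
    """Gaussian kernel via ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>,
    so only nonzero entries are ever visited."""
    sq_dist = sparse_sq_norm(x) + sparse_sq_norm(y) - 2.0 * sparse_dot(x, y)
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

On datasets where most input values are null, this cuts each kernel evaluation roughly in proportion to the sparsity.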
The parameters used to train the datasets can be found in
TABLE 1.
Note that all experiments used a Gaussian kernel^{10}
and a value of *C* = 1000, and the termination criterion was
verification of the KKT conditions to a precision of 0.01.
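This stopping rule can be sketched as follows for ε-SVR. Writing β_i = α_i - α_i^{*} ∈ [-C, C] and e_i = f(x_i) - y_i, the solver stops once the largest KKT violation falls below the precision (here 0.01). This is a hedged illustration of the criterion, not SVMTorch's actual code:

```python
def max_kkt_violation(beta, err, C, eps):
    """Largest KKT violation for epsilon-SVR, with beta_i = alpha_i - alpha_i*
    and err_i = f(x_i) - y_i (illustrative sketch)."""
    worst = 0.0
    for b, e in zip(beta, err):
        if b == 0.0:
            # non-support vector: point must lie inside the eps-tube
            worst = max(worst, abs(e) - eps)
        elif 0.0 < b < C:
            # unbounded SV with alpha > 0: y - f(x) = eps, i.e. e = -eps
            worst = max(worst, abs(-e - eps))
        elif b == C:
            # bounded SV: y - f(x) >= eps, i.e. e <= -eps
            worst = max(worst, e + eps)
        elif -C < b < 0.0:
            # unbounded SV with alpha* > 0: f(x) - y = eps
            worst = max(worst, abs(e - eps))
        else:  # b == -C: f(x) - y >= eps
            worst = max(worst, eps - e)
    return worst
```

Optimization terminates when this quantity drops below the chosen precision (0.01 in these experiments).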

For the experiments involving *SVMTorch*, we tested a version
with shrinking but without verifying, at the end of the optimization,
whether all the
suppressed variables satisfy the KKT conditions (*SVMTorch*),
a version with no shrinking (*SVMTorch*N), and a version with shrinking
and verification at the end of the optimization, as done in *SVM-Light*
(*SVMTorch*U).
As will be seen in the results, the first method has a large speed
advantage, with
*in general* only a small negative impact on generalization
performance. However, the default value of 100 iterations
before a variable is removed by shrinking must sometimes be changed to
obtain the correct solution.

- Working Set Size
- Small Datasets
- Large Datasets
- Size of the Cache
- Scaling with Respect to the Size of the Training Set