Experimental Results

We compared our SVM implementation for regression problems (SVMTorch) to the one from Flake and Lawrence [3], using their publicly available software Nodelib. This comparison is interesting because Nodelib is based on SMO, where the variables $\alpha_i$ and $\alpha_i^{\star}$ are selected simultaneously, which is not the case for SVMTorch. Note also that Nodelib includes some enhancements over SMO which differ from those proposed by Shevade et al. [13].

Both algorithms use an internal cache in order to solve large-scale problems. All the experiments presented here were run on a Linux Pentium III 750 MHz machine, with the gcc compiler. The parameters of the algorithms were not chosen to obtain the best generalization performance, since the goal was to compare the speed of the algorithms; however, they were chosen so as to obtain reasonable results. Both programs used the same parameters with regard to cache size, precision, etc. For Nodelib, the other parameters were set to the default values proposed by the authors. All the programs were compiled using double precision. We compared the programs on five different tasks:

Kin
This dataset represents a realistic simulation of the forward dynamics of an 8-link all-revolute robot arm. The task is to predict the distance of the end-effector from a target, given features such as joint positions, twist angles, etc.

Sunspots
Using a series representing the number of sunspots per day, we created one input/output pair for each day: the yearly average of the year starting the next day had to be predicted using the 12 previous yearly averages.
Artificial
This is an easy artificial dataset based on Sunspots: we created a daily series where each value is the yearly average centered on that day. The task is to predict a value given the 100 previous ones.
Forest
This dataset is a classification task with 7 classes, of which only the first 35000 examples were used here. We transformed it into a regression task where the goal was to predict +1 for examples of class 2 and -1 for the other examples. Note that since class 2 was over-represented in the dataset, this transformation leads to a more balanced problem.
MNIST
This dataset is a classification task with 10 classes (representing handwritten digits). Again, we transformed it into a regression problem where the goal was to predict +1 for examples of classes 0 to 4 and -1 for the other examples (a sketch of this relabeling, and of the Sunspots windowing, is given after this list).
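To make the data constructions above concrete, here is a minimal sketch (in Python, assuming the raw data is available as NumPy arrays) of the Sunspots windowing and of the class-to-regression relabeling used for Forest and MNIST. The function names and the fixed 365-day year are our own simplifications for illustration, not part of SVMTorch.

    # Hypothetical sketch of the data construction described above (not SVMTorch code).
    import numpy as np

    def make_sunspot_pairs(daily_series, window=12, year=365):
        """For each day t, use the 12 previous yearly averages as input and
        the average of the year starting the next day as the target."""
        inputs, targets = [], []
        for t in range(window * year, len(daily_series) - year):
            past = [daily_series[t - (k + 1) * year : t - k * year].mean()
                    for k in reversed(range(window))]   # oldest year first
            inputs.append(past)
            targets.append(daily_series[t + 1 : t + 1 + year].mean())
        return np.array(inputs), np.array(targets)

    def classes_to_regression(labels, positive_classes):
        """Map multi-class labels to +1 / -1 regression targets, as done for
        Forest (class 2) and MNIST (classes 0 to 4)."""
        return np.where(np.isin(labels, positive_classes), 1.0, -1.0)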

Note also that Forest and MNIST are respectively $78\%$ and $81\%$ sparse (i.e., respectively $78\%$ and $81\%$ of the values in their input matrices are null). Since SVMTorch can handle sparse data (as can SVM-Light), we tested this option in the experiments described in Tables 3 and 4. The parameters used to train on each dataset can be found in Table 1. Note that all experiments used a Gaussian kernel and a value of C = 1000, and the termination criterion was verification of the KKT conditions with a precision of 0.01.
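For illustration, the following sketch shows how a Gaussian kernel can be evaluated on sparse inputs stored as index/value pairs, together with the $\epsilon$-insensitive loss whose $\epsilon$ values appear in Table 1. The dict-based sparse format, the function names, and the exact exp(-||x-y||^2/sigma^2) parameterization of $\sigma$ are assumptions made for this example and do not reproduce SVMTorch's internal code.

    # Illustrative sketch only: Gaussian kernel on sparse vectors and the
    # epsilon-insensitive loss; defaults taken from Table 1 (Kin dataset).
    import math

    def sparse_sq_dist(x, y):
        """Squared Euclidean distance between two sparse vectors stored as
        {index: value} dicts; zero entries are simply absent."""
        d = 0.0
        for i, v in x.items():
            d += (v - y.get(i, 0.0)) ** 2
        for i, v in y.items():
            if i not in x:
                d += v ** 2
        return d

    def gaussian_kernel(x, y, sigma=100.0):
        """Assumed form k(x, y) = exp(-||x - y||^2 / sigma^2)."""
        return math.exp(-sparse_sq_dist(x, y) / (sigma ** 2))

    def eps_insensitive_loss(y_true, y_pred, eps=0.5):
        """Zero inside the eps-tube around the target, linear outside."""
        return max(0.0, abs(y_true - y_pred) - eps)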


 
Table 1: Parameters used for each dataset: # Train = maximum number of training examples, taken at the beginning of the dataset; # Test = number of test examples, taken just following the training set examples; Dim = input dimension; $\sigma $ = parameter of the Gaussian kernel; $\epsilon $ = value used in the $\epsilon $-insensitive loss function.

               # Train   # Test   Dim   $\sigma $   $\epsilon $
  Kin             6192     2000    32       100        0.5
  Artificial     20000     2000   100       100        0.5
  Forest         25000    10000    54       400        0.7
  Sunspots       40000     2500    12       900       20
  MNIST          60000    10000   784      1650        0.5
 

For the experiments involving SVMTorch, we tested a version with shrinking but without verifying at the end of the optimization whether all the suppressed variables satisfied the KKT conditions (SVMTorch), a version with no shrinking (SVMTorchN), and a version with shrinking and verification at the end of the optimization, as done in SVM-Light (SVMTorchU). As will be seen in the results, the first method has a large speed advantage and, in general, only a small negative impact on the generalization performance. However, the default value of 100 iterations before a variable is removed by shrinking must sometimes be changed in order to obtain the correct solution.
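The following schematic sketch illustrates the shrinking bookkeeping and the final verification step that distinguishes SVMTorchU. It only mimics the idea described above; the counters, the `at_bound` and `kkt_ok` predicates, and the function names are placeholders, not SVMTorch's actual routines.

    # Schematic sketch of shrinking with an optional final KKT re-check
    # (SVMTorchU-style); not SVMTorch's actual implementation.

    SHRINK_PATIENCE = 100   # default: iterations at a bound before a variable is shrunk

    def shrink_step(active, stalled, at_bound, kkt_ok):
        """One bookkeeping pass: variables that sit at a bound and satisfy the KKT
        conditions for SHRINK_PATIENCE consecutive iterations are removed from the
        active set, i.e. no longer optimized."""
        for j in list(active):
            if at_bound(j) and kkt_ok(j):
                stalled[j] += 1
                if stalled[j] >= SHRINK_PATIENCE:
                    active.discard(j)
            else:
                stalled[j] = 0

    def final_check(n, active, kkt_ok):
        """SVMTorchU-style verification: after convergence on the shrunk problem,
        reactivate any removed variable that now violates the KKT conditions."""
        violators = [j for j in range(n) if j not in active and not kkt_ok(j)]
        active.update(violators)
        return violators   # non-empty => optimization must resume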



 