
Small Datasets

Let us now compare SVMTorch and Nodelib on small datasets. In the experiments reported in Table 3, only the first 5000 training examples were used. The cache size was set to 300 MB, so that the whole kernel matrix could be kept in memory. For each problem, we also computed the median of the training outputs (the constant predictor that minimizes the mean absolute error, MAE, when no information about the inputs is used) and measured the performance of this median predictor on the training and test sets, in order to provide a baseline against which to judge the various algorithms.
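To make this baseline concrete, the following minimal sketch (our own illustration, not taken from the SVMTorch or Nodelib code; the function name and array arguments are hypothetical) computes the training-output median and its MAE on both sets, as reported in the "Median Train/Test" columns of Table 3:

    import numpy as np

    def median_baseline_mae(y_train, y_test):
        # Among all constant predictors, the median of the training
        # outputs minimizes the mean absolute error, so it serves as
        # an input-free baseline.
        med = np.median(y_train)                    # best constant under MAE
        train_mae = np.mean(np.abs(y_train - med))  # MAE on the training set
        test_mae = np.mean(np.abs(y_test - med))    # MAE on the test set
        return train_mae, test_mae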


 
Table 3: Experiments on small training sets. SVMTorchN = SVMTorch without shrinking, SVMTorchU = SVMTorch with shrinking and unshrinking, Time NSP = time (in seconds) for non-sparse data format, Time SP = time (in seconds) for sparse data format, # SV = number of support vectors, Objective Function = value of (3) at the end of the optimization, Model Train = mean absolute error (MAE) over the training set, Model Test = MAE over the test set, Median Train = MAE over the training set with the median as predictor, Median Test = MAE over the test set with the median as predictor. The median columns are given once per dataset, since they do not depend on the model.

Dataset      Model       Time NSP   Time SP    # SV   Objective Function   Model Train   Model Test   Median Train   Median Test
Kin          SVMTorch           7         -     936           -173977.85          0.30         0.31           0.37          0.38
             SVMTorchU         15         -     936           -173977.85          0.30         0.31
             SVMTorchN         45         -     941           -173982.65          0.30         0.31
             Nodelib          157         -     932           -174019.67          0.30         0.31
Artificial   SVMTorch          31         -     342            -13594.01          0.25         0.52          25.16         27.95
             SVMTorchU        166         -     367            -13703.07          0.24         0.51
             SVMTorchN        448         -     370            -13701.41          0.24         0.51
             Nodelib          231         -     342            -13707.29          0.24         0.51
Forest       SVMTorch          21        21     993             -1012.46          0.51         0.80           0.38          1.59
             SVMTorchU         65        43    1051             -1030.78          0.41         0.80
             SVMTorchN        110       115    1058             -1030.43          0.41         0.80
             Nodelib          542         -    1032             -1031.54          0.41         0.80
Sunspots     SVMTorch           2         -     420          -3489571.13          9.65        10.12          33.02         52.58
             SVMTorchU          9         -     422          -3489630.53          9.65        10.11
             SVMTorchN         38         -     422          -3489628.27          9.64        10.11
             Nodelib          327         -     422          -3489630.65          9.64        10.11
MNIST        SVMTorch         118        79    1861              -190.77          0.39         0.48           0.98          0.97
             SVMTorchU        147        98    1861              -190.77          0.39         0.48
             SVMTorchN       152       104    1861              -190.77          0.39         0.48
             Nodelib         2216         -    1878              -190.80          0.39         0.48

As can be seen, SVMTorch is many times faster than Nodelib on all datasets, the only exception being Artificial when shrinking is turned off. Since the whole matrix of the quadratic problem was held in memory, cache management had no effect on the timings. One can therefore conclude that one of the main differences between these algorithms lies in the selection of the subproblem, and that selecting the $\alpha_i$ independently of the $\alpha_i^{\star}$ is very efficient. Nodelib did reach slightly better values of the objective function (probably because its termination criterion is stricter than ours; see [1] for a comparison of termination criteria), but this did not translate into lower test error.

Note that for problems of this size, shrinking degrades the solution only marginally (compare the objective function values as well as the training and test errors) while speeding up the algorithm considerably. Note also that the sparse format was counterproductive on Forest but beneficial on MNIST, which has a much higher input dimension: the overhead induced by the sparse representation is outweighed by the gain in kernel computation only when the input dimension is large enough.
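The sparse-format trade-off can be illustrated with a small sketch (our own example, not SVMTorch code; the function names, the sorted (index, value) representation, and the RBF kernel choice are assumptions made for illustration). The dense evaluation always does $O(d)$ stride-1 work, while the sparse evaluation does work proportional to the number of nonzeros but pays a branch on every step:

    import numpy as np

    def rbf_dense(x, y, gamma):
        # Dense evaluation: O(d) work, even if most entries are zero.
        diff = x - y
        return np.exp(-gamma * np.dot(diff, diff))

    def sparse_dot(ix, vx, iy, vy):
        # Dot product of two vectors stored as sorted (index, value)
        # pairs; O(nnz(x) + nnz(y)) steps, with a branch per step.
        s, a, b = 0.0, 0, 0
        while a < len(ix) and b < len(iy):
            if ix[a] == iy[b]:
                s += vx[a] * vy[b]
                a += 1
                b += 1
            elif ix[a] < iy[b]:
                a += 1
            else:
                b += 1
        return s

    def rbf_sparse(ix, vx, iy, vy, gamma, sqnorm_x, sqnorm_y):
        # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>, where the
        # squared norms can be precomputed once per example.
        d2 = sqnorm_x + sqnorm_y - 2.0 * sparse_dot(ix, vx, iy, vy)
        return np.exp(-gamma * d2)

The per-step branching of the merge is more expensive than the dense stride-1 arithmetic, so the sparse path wins only when the number of nonzeros is small compared to the input dimension $d$, which is consistent with the observation above that the sparse format pays off for the higher-dimensional MNIST but not for Forest.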

