

Accuracy

In this section, we briefly present an experimental evaluation of round robin binarization in a rule learning context. We chose RIPPER (Cohen, 1995) as the base classifier, which is, in our view, the most advanced algorithm in the family of separate-and-conquer (or covering) rule learning algorithms (Fürnkranz, 1997; Fürnkranz, 1999b).

The unordered and ordered binarization procedures were used as implemented within RIPPER. Ordered binarization is RIPPER's default mode, and unordered binarization can be invoked using the option -a unordered. The round robin binarization was implemented as a wrapper around RIPPER, which provided it with the appropriate training sets. The wrapper was implemented in Perl and communicated with RIPPER through the file system, writing the training sets to disk and reading RIPPER's results back from it. This implementation is referred to as R3 (round robin RIPPER).
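
For illustration, the following is a minimal sketch of such a wrapper, written in Python rather than Perl. The name of the ripper executable, the exact C4.5-style .names/.data file layout, and the function write_c45_files are assumptions made for this sketch; only the -a given option (discussed below) is taken from the text.

    import itertools
    import subprocess

    def write_c45_files(stem, subset, class_order, attributes):
        """Write simplified C4.5-style <stem>.names / <stem>.data files
        (all attributes treated as continuous; illustrative only)."""
        with open(stem + ".names", "w") as f:
            f.write(", ".join(class_order) + ".\n")   # class order <i,j>
            for a in attributes:
                f.write(a + ": continuous.\n")
        with open(stem + ".data", "w") as f:
            for x, y in subset:
                f.write(", ".join(map(str, x)) + ", " + y + "\n")

    def train_double_round_robin(examples, classes, attributes, stem="pair"):
        """Learn one binary rule set per ordered class pair <i,j>.

        `examples` is a list of (attribute_vector, class_label) pairs.
        The `ripper` binary name is a placeholder for however RIPPER
        is invoked on the local system.
        """
        theories = {}
        for ci, cj in itertools.permutations(classes, 2):
            # Restrict the training set to the two classes of this problem.
            subset = [(x, y) for x, y in examples if y in (ci, cj)]
            write_c45_files(stem, subset, [ci, cj], attributes)
            # '-a given' makes RIPPER respect the class order in the
            # names file, so <i,j> and <j,i> are two distinct problems.
            out = subprocess.run(["ripper", "-a", "given", stem],
                                 capture_output=True, text=True, check=True)
            theories[(ci, cj)] = out.stdout   # rules parsed from this output
        return theories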

In principle, RIPPER is a class-symmetric learner. It will treat the larger class of a two-class problem as the default class and learn rules for the smaller class. Although this procedure is class-symmetric (problem <i,j> is converted to <j,i> if $|c_i| > |c_j|$), we felt that it would not be fair. For example, the largest class in the multi-class problem would be used as the default class in all round robin problems, which may be an unfair advantage (or disadvantage) for this class. For this reason, R3 implements a double round robin binarization, which calls RIPPER with the option -a given on each binary problem <i,j> ($i,j = 1 \ldots c$, $i \neq j$). This instructs RIPPER to use the classes in the order specified by the user (i.e., the order in which they appear in the names file). Hence, <i,j> and <j,i> are two different problems, which ensures that each class is the default class in exactly half of its binary classification problems. Note that this procedure is basically identical to the one employed by RIPPER in unordered mode on a two-class problem, except that RIPPER would tie-break immediately between the theories learned for <i,j> and <j,i>, while we first collect all votes from all c(c-1) binary problems.
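
At classification time, R3 thus collects one vote from each of the c(c-1) binary theories and predicts the class with the most votes. A minimal sketch of this voting step follows; the callable `classify`, which applies one learned binary rule set to an example and returns one of its two classes, is a hypothetical stand-in for parsing and applying RIPPER's rules.

    from collections import Counter

    def r3_vote(example, theories, classify):
        """Predict by simple voting over all c(c-1) ordered pairwise theories.

        `theories` maps ordered class pairs (ci, cj) to learned binary
        rule sets; `classify(theory, example, ci, cj)` (hypothetical)
        returns either ci or cj. Because <i,j> and <j,i> are separate
        problems, each class is the default class in exactly half of
        its binary problems, and the two votes need not agree.
        """
        votes = Counter()
        for (ci, cj), theory in theories.items():
            votes[classify(theory, example, ci, cj)] += 1
        return votes.most_common(1)[0][0]   # ties broken arbitrarily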


Table: Error rates: The first two column pairs show the results of RIPPER (in unordered and in default, ordered mode). For each, we show the absolute error rate and the ratio of R3's error rate to that of the respective algorithm (values below 1 indicate an improvement by R3). R3's own error rates are shown in the following column. The last column shows whether the difference between R3 and default (ordered) RIPPER is significant according to a McNemar test (++ if p > 0.99, + if p > 0.95, = if not significant). The first part of the table shows the results for cross-validation (17 sets). To allow a comparison with previous works, the last 6 lines show hold-out estimates for the six datasets with a designated test set (3 datasets occur in both blocks). The line in the middle shows the geometric averages of the improvement ratios.
                        RIPPER                              R3    McNemar
    dataset             unord.  ratio   ordered  ratio   error    test
    ---------------------------------------------------------------------
    abalone             81.64   0.911    81.18   0.916   74.34    ++
    car                  5.79   0.390    12.15   0.186    2.26    ++
    glass               35.51   0.724    34.58   0.743   25.70    ++
    image                4.15   0.823     4.29   0.808    3.46    +
    lr spectrometer     64.22   0.827    61.39   0.865   53.11    ++
    optical              7.79   0.479     9.48   0.394    3.74    ++
    page-blocks          2.85   0.968     3.38   0.816    2.76    ++
    sat                 13.18   0.785    13.04   0.794   10.35    ++
    solar flares (c)    15.91   0.991    15.91   0.991   15.77    =
    solar flares (m)     4.90   1.029     5.47   0.921    5.04    =
    soybean              8.79   0.717     8.79   0.717    6.30    ++
    thyroid (hyper)      1.25   0.893     1.49   0.749    1.11    +
    thyroid (hypo)       0.64   0.833     0.56   0.955    0.53    =
    thyroid (repl.)      1.17   0.863     0.98   1.026    1.01    =
    vehicle             28.25   1.029    30.38   0.957   29.08    =
    vowel               30.50   0.633    27.07   0.690   18.69    ++
    yeast               44.00   0.949    42.39   0.986   41.78    =
    ---------------------------------------------------------------------
    average                     0.787            0.747
    ---------------------------------------------------------------------
    abalone             81.03   0.901    82.18   0.888   72.99    ++
    covertype           35.37   0.939    38.50   0.862   33.20    ++
    letter              15.22   0.516    15.75   0.498    7.85    ++
    sat                 14.25   0.782    17.05   0.654   11.15    ++
    shuttle              0.03   0.667     0.06   0.375    0.02    =
    vowel               64.94   0.823    53.25   1.004   53.46    =
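
For reference, the average line can be read as follows (a restatement of the caption, with $e_i(\cdot)$ denoting the error rate on the $i$-th of the $n = 17$ cross-validated sets; the ratio columns are consistent with this definition):

    $\displaystyle r_i = \frac{e_i(\mathrm{R3})}{e_i(\mathrm{RIPPER})}, \qquad \bar{r} = \Bigl(\prod_{i=1}^{n} r_i\Bigr)^{1/n}$

Plugging in the ratio columns yields the geometric averages 0.787 (unordered) and 0.747 (ordered) shown in the table.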

Table 1 shows the 20 datasets we used in this study. They were chosen arbitrarily among the datasets with $\geq 4$ classes available at the UCI repository (Blake and Merz, 1998). The implementation of the algorithm was developed independently and not tuned on these datasets. On the six sets with a dedicated test set, we report the error rate on that test set. On the other 14 sets, we estimated the error rate using paired, stratified 10-fold cross-validations. For abalone, sat, and vowel we performed both a test-set evaluation and a cross-validation.
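
Here, "paired" means that all algorithms are trained and tested on exactly the same folds, so that their predictions can be compared example by example. A minimal sketch of such a protocol (using scikit-learn's StratifiedKFold purely for illustration; the original experiments predate and do not use this library):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def paired_stratified_cv(X, y, learners, n_splits=10, seed=0):
        """Estimate error rates with paired, stratified k-fold CV.

        `learners` maps names to objects with fit/predict; every learner
        sees exactly the same train/test splits, so per-example outcomes
        remain directly comparable (e.g., for a McNemar test).
        """
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True,
                              random_state=seed)
        errors = {name: 0 for name in learners}
        for train_idx, test_idx in skf.split(X, y):
            for name, learner in learners.items():
                learner.fit(X[train_idx], y[train_idx])
                errors[name] += np.sum(learner.predict(X[test_idx])
                                       != y[test_idx])
        return {name: e / len(y) for name, e in errors.items()}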

Table 2 shows the error rates of RIPPER (unordered and ordered) and R3 on the selected datasets. On half of the 20 experiments (not counting the cross-validated trials of the three sets that re-appear at the bottom), R3 is significantly better (p > 0.99 on a McNemar test) than RIPPER's default mode (ordered binarization). There were only two experiments (thyroid (repl.) and the test-set version of vowel) where R3 is worse than RIPPER, both differences being insignificant. On the 17 cross-validated problems, R3 reduces the average error to about 75% of RIPPER's error. The comparison to unordered RIPPER is similar (the significance levels for this case are not shown).
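
The text does not spell out which variant of the McNemar test was applied; a common form, with continuity correction, compares the counts of test examples on which exactly one of the two classifiers errs:

    def mcnemar_statistic(n01, n10):
        """McNemar chi-square statistic with continuity correction (a sketch).

        n01: test examples the first classifier misclassifies but the
             second gets right; n10: the converse. Under the null
        hypothesis of equal error rates the statistic is chi-square
        distributed with 1 degree of freedom: values above 3.841
        correspond to p > 0.95 ('+' in the table), values above
        6.635 to p > 0.99 ('++').
        """
        if n01 + n10 == 0:
            return 0.0
        return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)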

We can safely conclude that round robin binarization may result in significant improvements over ordered or unordered binarization, with little risk of decreasing performance.

