So far we have been concerned only with single and double round robins. A natural extension of this procedure is to consider cases where more than two classifiers are trained for each binary problem. For algorithms with random components (such as RIPPER's internal split of the training examples or the random initialization of back-propagation neural networks) this can be achieved simply by running the algorithm on the same dataset with different random seeds. For other algorithms there are two options: randomness can be injected into the algorithm's behavior (Dietterich, 2000b), or random subsets of the available data can be used for training the algorithm. The latter procedure is more or less equivalent to bagging (Breiman, 1996). We evaluate this option in this section.
Table 5 (excerpt): error rates and error ratios relative to RIPPER.

| dataset | RIPPER | round robin | ratio | bagging | ratio | bagged round robin | ratio |
|---|---|---|---|---|---|---|---|
| solar flares (c) | 15.91 | 15.77 | 0.991 | 15.91 | 1.000 | 15.69 | 0.986 |
| solar flares (m) | 5.47 | 5.04 | 0.921 | 5.26 | 0.961 | 5.18 | 0.947 |
Table 5 shows the results of a comparison of round robin learning, bagging, and a combination of both. Bagging was implemented by drawing 10 samples with replacement from the available data. Ties were broken in the same way as for the round robin binarization, i.e., by simple voting using the a priori class probability as a tie breaker. Similarly, bagging was integrated with round robin binarization by drawing 10 independent samples of each pairwise classification problem. Thus we obtained a total of 10c(c-1) predictions for each c-class problem, which were again combined by simple voting. The number of iterations (10) was chosen arbitrarily (to conform to C5.0-BOOST's default number of iterations) and is certainly not optimal in either case.
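The bagged round robin procedure described above can be sketched as follows. This is only an illustration: the paper's base learner was RIPPER, whereas here a toy nearest-class-mean learner on a one-dimensional feature stands in for it, and all function names (`bagged_round_robin`, `nearest_mean_learner`) are hypothetical.

```python
import random
from collections import Counter

def bagged_round_robin(train, classes, learn, n_bags=10):
    """Sketch of bagged round robin: for each ordered class pair, train
    n_bags classifiers on bootstrap samples of that pair's examples,
    giving 10 * c * (c-1) voters for n_bags=10, as in the text."""
    models = []
    for ci in classes:
        for cj in classes:
            if ci == cj:
                continue
            # round robin binarization: keep only examples of the two classes
            pair = [ex for ex in train if ex[1] in (ci, cj)]
            for _ in range(n_bags):
                # bootstrap: draw |pair| examples with replacement
                boot = [random.choice(pair) for _ in pair]
                models.append(learn(boot, ci, cj))
    # a priori class distribution, used only to break voting ties
    prior = Counter(y for _, y in train)

    def predict(x):
        votes = Counter(m(x) for m in models)
        best = max(votes.values())
        tied = [c for c, v in votes.items() if v == best]
        return max(tied, key=lambda c: prior[c])  # tie break by prior

    return predict

def nearest_mean_learner(boot, ci, cj):
    """Toy stand-in for the base learner (the paper used RIPPER):
    predict whichever class mean is closer on a 1-D feature."""
    mi = [x for x, y in boot if y == ci] or [float('inf')]
    mj = [x for x, y in boot if y == cj] or [float('inf')]
    mu_i, mu_j = sum(mi) / len(mi), sum(mj) / len(mj)
    return lambda x: ci if abs(x - mu_i) <= abs(x - mu_j) else cj

random.seed(0)  # make the bootstrap draws reproducible
train = ([(i / 10, 'a') for i in range(10)]
         + [(10 + i / 10, 'b') for i in range(10)]
         + [(20 + i / 10, 'c') for i in range(10)])
predict = bagged_round_robin(train, ['a', 'b', 'c'], nearest_mean_learner)
```

Note that each pairwise problem gets its own bootstrap samples, so the 10c(c-1) voters differ both in the class pair they separate and in the data they saw.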
The results show clearly that the performance of the simple round robin (second column) can be improved considerably by integrating it with bagging (last column). The bagged round robin procedure reduces RIPPER's error on the datasets to about 68.5% of the original error (third line from the bottom). For comparison, we also show the results of bagging only, which gives the least improvement. The results of bagging are included not to compare it to the round robin, but to show that the bagged round robin outperforms both of its constituents.
We also repeated these experiments using C5.0 and C5.0-BOOST as the base learners. We only show the average performance for these cases. Again, the advantage of round robin learning is less pronounced for the multi-class learner C5.0 (it is even below the improvement given by our simple bagging procedure), and the combination of C5.0-BOOST and round robin learning does not produce an additional gain. It is worth mentioning that the combination of boosting and bagging outperforms boosting alone, which confirms previous good results with such algorithms (Krieger et al., 2001; Pfahringer, 2000).
In order to compare the absolute performances of the algorithms we can normalize all results relative to the performance of one algorithm (say RIPPER). C5.0's performance was better than RIPPER's by a factor of about 0.891. Multiplying this by the improvement of 0.735 from boosting (Table 4) and an additional 0.977 from adding bagging (Table 5) shows that bagged C5.0-BOOST has about 64% of the error rate of basic RIPPER, making it the best-performing combination. In comparison, the combination of round robin and bagging for RIPPER (68.5%) is relatively close behind, in particular if we consider RIPPER's poor performance in comparison to C5.0. An evaluation of a boosting variant of RIPPER (such as SLIPPER; Cohen and Singer, 1999) would be of interest.
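The chained ratios above can be verified directly; the three factors are the ones quoted in the text, and the variable names are only for illustration.

```python
# Error ratios relative to basic RIPPER, as quoted in the text:
c50_vs_ripper = 0.891  # C5.0's error relative to RIPPER's
boost_gain = 0.735     # improvement factor from boosting (Table 4)
bag_gain = 0.977       # additional improvement from bagging (Table 5)

# Chaining the ratios gives bagged C5.0-BOOST's error relative to RIPPER.
bagged_boost_vs_ripper = c50_vs_ripper * boost_gain * bag_gain
print(round(bagged_boost_vs_ripper, 2))  # prints 0.64, i.e. ~64% of RIPPER's error
```

Multiplying ratios this way is valid because each factor is measured against the preceding configuration, so the intermediate baselines cancel.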