Bagging

So far we have only been concerned with single and double round robins. A natural extension of this procedure is to train more than two classifiers for each binary problem. For algorithms with random components (such as RIPPER's internal split of the training examples or the random initialization of back-propagation neural networks), this can simply be done by running the algorithm on the same dataset with different random seeds. For other algorithms there are two options: randomness can be injected into the algorithm's behavior (Dietterich, 2000b), or random subsets of the available data can be used for training. The latter procedure is more or less equivalent to bagging (Breiman, 1996), and it is this option that we evaluate in this section.
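The random-subset option, in its bagging form, draws a bootstrap sample: n examples drawn uniformly with replacement from the n available ones. Each such sample contains on average about 63.2% (1 - 1/e) of the distinct training examples, which is where the diversity between ensemble members comes from. The following is a minimal sketch, assuming the examples are held in a Python list; `bootstrap_sample` is our own hypothetical name, not a function from the paper.

```python
import random


def bootstrap_sample(examples, rng):
    """Draw len(examples) items uniformly at random, with replacement."""
    return rng.choices(examples, k=len(examples))


rng = random.Random(42)          # fixed seed so the sample is reproducible
data = list(range(1000))         # stand-in for 1000 training examples
sample = bootstrap_sample(data, rng)

# Fraction of distinct original examples that made it into the sample;
# for large n this is close to 1 - 1/e, i.e. about 0.632.
distinct = len(set(sample)) / len(data)
print(len(sample), round(distinct, 3))
```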

**Table 5:** *Bagging:* A comparison of round robin learning, bagging, and the combination of both, using RIPPER as the base classifier. For each ensemble method, the error rate and its ratio to the base learner's error rate are shown. The bottom rows show the average error ratios of the ensemble techniques relative to the respective base learner for RIPPER, C5.0, and C5.0-BOOST (detailed results for the latter two are omitted). Note that these ratios are relative to each base learner, i.e., they are comparable only within a row, not between rows.

| Dataset | RIPPER error (%) | Round robin error (%) | Ratio | Bagging error (%) | Ratio | Both error (%) | Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| abalone | 81.18 | 74.34 | 0.916 | 78.36 | 0.965 | 73.14 | 0.901 |
| car | 12.15 | 2.26 | 0.186 | 9.38 | 0.771 | 1.79 | 0.148 |
| glass | 34.58 | 25.70 | 0.743 | 29.44 | 0.851 | 25.70 | 0.743 |
| image | 4.29 | 3.46 | 0.808 | 2.51 | 0.586 | 2.99 | 0.697 |
| lr spectrometer | 61.39 | 53.11 | 0.865 | 57.82 | 0.942 | 52.92 | 0.862 |
| optical | 9.48 | 3.74 | 0.394 | 2.86 | 0.302 | 2.81 | 0.296 |
| page-blocks | 3.38 | 2.76 | 0.816 | 2.65 | 0.784 | 2.54 | 0.751 |
| sat | 13.04 | 10.35 | 0.794 | 10.18 | 0.781 | 8.92 | 0.684 |
| solar flares (c) | 15.91 | 15.77 | 0.991 | 15.91 | 1.000 | 15.69 | 0.986 |
| solar flares (m) | 5.47 | 5.04 | 0.921 | 5.26 | 0.961 | 5.18 | 0.947 |
| soybean | 8.79 | 6.30 | 0.717 | 8.20 | 0.933 | 6.00 | 0.683 |
| thyroid (hyper) | 1.49 | 1.11 | 0.749 | 1.41 | 0.945 | 1.09 | 0.731 |
| thyroid (hypo) | 0.56 | 0.53 | 0.955 | 0.58 | 1.050 | 0.42 | 0.764 |
| thyroid (repl.) | 0.98 | 1.01 | 1.026 | 0.98 | 0.999 | 0.85 | 0.864 |
| vehicle | 30.38 | 29.08 | 0.957 | 26.12 | 0.860 | 26.83 | 0.883 |
| vowel | 27.07 | 18.69 | 0.690 | 16.26 | 0.601 | 18.79 | 0.694 |
| yeast | 42.39 | 41.78 | 0.986 | 40.63 | 0.959 | 39.89 | 0.941 |
| average (RIPPER) | | | 0.747 | | 0.811 | | 0.685 |
| average (C5.0) | | | 0.909 | | 0.864 | | 0.838 |
| average (C5.0-BOOST) | | | 1.029 | | 0.977 | | 1.019 |

Table 5 shows the results of a comparison of round robin learning, bagging, and a combination of both. Bagging was implemented by drawing 10 samples with replacement from the available data. The predictions of the resulting classifiers were combined in the same way as for the round robin binarization, i.e., by simple voting, with the a priori class probability used as a tie breaker. Similarly, bagging was integrated with round robin binarization by drawing 10 independent samples from each pairwise classification problem. Thus we obtained a total of 10c(c-1)/2 predictions for each c-class problem (10 for each of the c(c-1)/2 class pairs), which again were simply voted. The number of iterations (10) was chosen arbitrarily, to match C5.0-BOOST's default number of iterations, and is certainly not optimal for either method.
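Both procedures can be sketched in a few lines. This is a minimal illustration under several assumptions, not the implementation used in the experiments: examples are `(feature, label)` pairs, `train_centroid` is a hypothetical 1-D nearest-centroid learner standing in for RIPPER, class priors are estimated from training-set class frequencies, and all function names are ours. The round robin variant is a single round robin, so it trains 10·c(c-1)/2 models; a double round robin, which trains two classifiers per class pair, would give twice as many.

```python
import random
from collections import Counter
from itertools import combinations


def train_centroid(examples):
    """Hypothetical stand-in for RIPPER: a 1-D nearest-centroid classifier."""
    sums, counts = Counter(), Counter()
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def class_prior(examples):
    """A priori class probabilities, estimated from class frequencies."""
    counts = Counter(y for _, y in examples)
    return {y: c / len(examples) for y, c in counts.items()}


def vote(predictions, prior):
    """Plurality vote; ties go to the class with the higher a priori probability."""
    counts = Counter(predictions)
    return max(counts, key=lambda y: (counts[y], prior[y]))


def bagging(examples, train, n_bags, rng):
    """Train one model per sample drawn with replacement (same size as the data)."""
    return [train(rng.choices(examples, k=len(examples))) for _ in range(n_bags)]


def bagged_round_robin(examples, train, n_bags=10, seed=0):
    """Single round robin: n_bags bagged models for every unordered class pair."""
    rng = random.Random(seed)
    classes = sorted({y for _, y in examples})
    models = []
    for i, j in combinations(classes, 2):
        pair = [(x, y) for x, y in examples if y in (i, j)]
        models.extend(bagging(pair, train, n_bags, rng))
    return models


data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (1.0, "b"), (1.1, "b"),
        (2.0, "c"), (2.1, "c"), (2.2, "c")]
prior = class_prior(data)

bagged = bagging(data, train_centroid, 10, random.Random(0))
rr_models = bagged_round_robin(data, train_centroid, n_bags=10)
print(len(bagged), len(rr_models))  # 10 bagged models; 10 * 3 * 2 / 2 = 30 pairwise models
print(vote([m(0.15) for m in rr_models], prior))
```

All models' predictions go into one flat vote, mirroring the "simply voted" combination described above; a weighted or per-pair vote would be a different design.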

The results clearly show that the performance of the simple round robin (column "round robin") can be improved considerably by integrating it with bagging (column "both"). On average, the bagged round robin procedure reduces RIPPER's error to about 68.5% of the original error (row "average (RIPPER)"). For comparison, we also show the results of bagging alone, which gives the smallest average improvement of the three. These results are not included to compare bagging with round robin learning, but to show that, on average, the bagged round robin outperforms both of its constituents (although not on every dataset; on vowel, for example, it is slightly worse than either).
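The "average" rows can be checked against the per-dataset ratios. The sketch below, which copies the RIPPER ratio columns from Table 5, computes both the geometric mean (the exponential of the mean log ratio) and the ordinary arithmetic mean of each column. The geometric means reproduce the reported averages of 0.747, 0.811, and 0.685, while the arithmetic means come out noticeably higher, which suggests that the table reports geometric averages. The helper names are ours.

```python
import math

# Per-dataset error ratios of the three RIPPER ensembles (17 datasets, table order).
ratios = {
    "round robin": [0.916, 0.186, 0.743, 0.808, 0.865, 0.394, 0.816, 0.794, 0.991,
                    0.921, 0.717, 0.749, 0.955, 1.026, 0.957, 0.690, 0.986],
    "bagging": [0.965, 0.771, 0.851, 0.586, 0.942, 0.302, 0.784, 0.781, 1.000,
                0.961, 0.933, 0.945, 1.050, 0.999, 0.860, 0.601, 0.959],
    "both": [0.901, 0.148, 0.743, 0.697, 0.862, 0.296, 0.751, 0.684, 0.986,
             0.947, 0.683, 0.731, 0.764, 0.864, 0.883, 0.694, 0.941],
}


def geometric_mean(xs):
    """exp of the mean log ratio: a ratio of 0.5 and one of 2.0 average to 1.0."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


for name, xs in ratios.items():
    print(f"{name:12s} geometric {geometric_mean(xs):.3f}  "
          f"arithmetic {arithmetic_mean(xs):.3f}")
# round robin  geometric 0.747  arithmetic 0.795
# bagging      geometric 0.811  arithmetic 0.841
# both         geometric 0.685  arithmetic 0.740
```

The geometric mean is the natural average for ratios because it treats a halving and a doubling of the error symmetrically, and it is multiplicative, which matters when ratios are chained.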

We also repeated these experiments using C5.0 and C5.0-BOOST as the base learners, but show only the average performance for these cases. Again, the advantage of round robin learning is less pronounced for the multi-class learner C5.0; its average improvement (0.909) is even smaller than that of our simple bagging procedure (0.864). Combining C5.0-BOOST with round robin learning produces no additional gain (1.029). It is worth mentioning that the combination of boosting and bagging outperforms boosting alone (0.977), which confirms previous good results with such algorithms (Krieger et al., 2001; Pfahringer, 2000).

In order to compare the absolute performance of the algorithms, we can normalize all relative results with respect to the performance of one algorithm (say, RIPPER). C5.0's error rate was about 0.891 times that of RIPPER. Multiplying this by the improvement factor of 0.735 for boosting (Table 4) and the additional factor of 0.977 for adding bagging (Table 5) shows that bagged C5.0-BOOST has about 64% of the error rate of basic RIPPER (0.891 × 0.735 × 0.977 ≈ 0.640), which makes it the best-performing combination. In comparison, the combination of round robin and bagging for RIPPER (68.5%) is relatively close behind, in particular if we consider the relatively poor performance of RIPPER compared to C5.0. An evaluation of a boosting variant of RIPPER (such as SLIPPER; Cohen and Singer, 1999) would be of interest.
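The normalization is a chain of multiplications, reproduced below with the factors quoted in the text. Chaining is exact, rather than approximate, when every factor is a geometric mean of per-dataset ratios over the same datasets, because the geometric mean of products equals the product of geometric means. That the C5.0 and Table 4 factors were computed this way over the same datasets is an assumption of this sketch; the second part only illustrates the identity on hypothetical ratios. Variable names are ours.

```python
import math

# Average error ratios quoted in the text, each relative to the configuration before it.
c50_vs_ripper = 0.891          # C5.0 relative to RIPPER
boosting_vs_c50 = 0.735        # C5.0-BOOST relative to C5.0 (Table 4)
bagging_vs_boosting = 0.977    # bagged C5.0-BOOST relative to C5.0-BOOST (Table 5)
bagged_rr_vs_ripper = 0.685    # bagged round robin relative to RIPPER (Table 5)

bagged_boost_vs_ripper = c50_vs_ripper * boosting_vs_c50 * bagging_vs_boosting
print(f"bagged C5.0-BOOST:         {bagged_boost_vs_ripper:.3f}")  # 0.640
print(f"bagged round robin RIPPER: {bagged_rr_vs_ripper:.3f}")


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical per-dataset ratios, for illustrating the identity only:
# GM(a_i * b_i) == GM(a_i) * GM(b_i) over the same datasets.
a = [0.5, 0.8, 1.1]
b = [0.9, 0.7, 1.2]
chained_ok = math.isclose(geometric_mean([x * y for x, y in zip(a, b)]),
                          geometric_mean(a) * geometric_mean(b))
print(chained_ok)  # True
```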