Classification Efficiency:

Next: Comprehensibility: Up: Other Properties and Open Previous: Decoding: Contents

Classification Efficiency:

Our efficiency analysis is only concerned with training time. At testing time, one has to test a quadratic number of classifiers in order to make the final prediction. Even though the constituent classifiers are quite likely to be simpler (which often means that they can make faster predictions) it can be expected that classification takes considerably longer for a round robin ensemble than for unordered binarization. This situation is particularly bad for lazy learners, which defer most of their training effort to the classification phase.

Next to the straightforward solution of testing all theories in parallel (see below), a solution for this problem could be found in the proposal of Platt et al. (2000) who suggest organizing pairwise classifiers in a decision graph where each node represents a binary classifier. They show that this structure allows obtainment of a prediction for a c-class problem by consulting only c-1 pairwise classifiers without loss of accuracy on three benchmarks problems. In some sense, this technique may be viewed as the ``ordered'' version of round robin classification.

Finally, we could once more take a look at sports and game tournaments, where elaborate pairing schemes allow determination of a reliable ranking in cases where a round robin tournament is infeasible due to the high number of players. The frequently used knock-out tournament (where players are paired randomly and only the winner advances into the next round) is probably too brittle. An interesting alternative might be provided by swiss system tournaments, which are frequently used in competitive chess. In this type of tournament, all players play a fixed number of rounds, typically of the order $\log(c)$ . The trick is that in each round players of about equal strength (i.e., players that are ranked close to each other in the current standings of the tournament) are paired against each other. Such schemes could improve classification time for problems with very high numbers of classes, in particular if classification is very expensive (e.g., in the case of lazy learners).

Next: Comprehensibility: Up: Other Properties and Open Previous: Decoding: Contents

Johannes Fürnkranz 2002-03-11