Considering Cost Asymmetry in Learning Classifiers
Francis R. Bach, David Heckerman, Eric Horvitz; 7(Aug):1713--1741, 2006.
AbstractReceiver Operating Characteristic (ROC) curves are a standard way to display the performance of a set of binary classifiers for all feasible ratios of the costs associated with false positives and false negatives. For linear classifiers, the set of classifiers is typically obtained by training once, holding constant the estimated slope and then varying the intercept to obtain a parameterized set of classifiers whose performances can be plotted in the ROC plane. We consider the alternative of varying the asymmetry of the cost function used for training. We show that the ROC curve obtained by varying both the intercept and the asymmetry, and hence the slope, always outperforms the ROC curve obtained by varying only the intercept. In addition, we present a path-following algorithm for the support vector machine (SVM) that can compute efficiently the entire ROC curve, and that has the same computational complexity as training a single classifier. Finally, we provide a theoretical analysis of the relationship between the asymmetric cost model assumed when training a classifier and the cost model assumed in applying the classifier. In particular, we show that the mismatch between the step function used for testing and its convex upper bounds, usually used for training, leads to a provable and quantifiable difference around extreme asymmetries.