## Boosted Classification Trees and Class Probability/Quantile Estimation

**David Mease, Abraham J. Wyner, Andreas Buja**; 8(Mar):409--439, 2007.

### Abstract

The standard by which binary classifiers are usually judged,
misclassification error, assumes equal costs of misclassifying the
two classes or, equivalently, classifying at the 1/2 quantile of the
conditional class probability function *P*[*y*=1|*x*]. Boosted
classification trees are known to perform quite well for such
problems. In this article we consider the use of standard,
off-the-shelf boosting for two more general
problems: 1) classification with unequal costs or, equivalently,
classification at quantiles other than 1/2, and 2) estimation of the
conditional class probability function *P*[*y*=1|*x*].
We first examine
whether the latter problem, estimation of *P*[*y*=1|*x*], can be solved
with LogitBoost, and with AdaBoost when combined with a natural link
function. The answer is negative: both approaches are often
ineffective because they overfit *P*[*y*=1|*x*] even though they perform
well as classifiers. A major negative point of the present article
is the disconnect between class probability estimation and
classification.
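The two ideas in this paragraph, the equivalence between unequal costs and quantile thresholds and the mapping from a boosting score to a probability, can be illustrated with a minimal Python sketch. The function names and the cost parameters `cost_fp` and `cost_fn` are hypothetical. The abstract does not spell out the "natural link function"; the sketch assumes the logistic link implied by the population minimizer of exponential loss, *F*(*x*) = ½ log(*p*/(1−*p*)).

```python
import math


def cost_to_quantile(cost_fp, cost_fn):
    """Threshold q on P[y=1|x] that minimizes expected misclassification cost.

    Predicting class 1 is cheaper when p * cost_fn > (1 - p) * cost_fp,
    which is the same as p > cost_fp / (cost_fp + cost_fn).
    Equal costs recover the usual 1/2 quantile.
    """
    return cost_fp / (cost_fp + cost_fn)


def adaboost_link(score):
    """Map a boosting score F(x) to a probability via p = 1 / (1 + exp(-2F)).

    Assumed link (not stated in the abstract). Written in a form that
    avoids overflow for large negative scores.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * score))
    e = math.exp(2.0 * score)
    return e / (1.0 + e)


if __name__ == "__main__":
    # A false negative three times as costly as a false positive
    # corresponds to classifying at the 0.25 quantile.
    print(cost_to_quantile(cost_fp=1.0, cost_fn=3.0))
    # The sign of the score fixes the class; its magnitude fixes the probability.
    for score in (0.5, 2.0, 10.0):
        print(score, adaboost_link(score))
```

Under this link the sign of the score alone determines the predicted class at the 1/2 quantile, while the probability estimate also depends on the magnitude. Scores that keep growing in magnitude leave the classification unchanged but push the estimate toward 0 or 1, which is one way to read the disconnect described above.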

Next we consider the practice of over/under-sampling of the two
classes. We present an algorithm that uses AdaBoost in
conjunction with **O**ver/**U**nder-**S**ampling and
**J**ittering of the data ("JOUS-Boost"). This algorithm is
simple, yet successful, and it preserves the advantage of relative
protection against overfitting, but for arbitrary
misclassification costs and, equivalently, arbitrary quantile
boundaries. We then use collections of classifiers obtained from
a grid of quantiles to form estimators of class probabilities. The
estimates of the class probabilities compare favorably to those
obtained by a variety of methods across both simulated and real
data sets.
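The resampling step and the grid-based probability estimate can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' implementation: the function names, the `jitter` scale parameter, sampling with replacement, and the midpoint rule for combining classifiers are all hypothetical choices, and the AdaBoost fit on the resampled data is omitted. The class weights follow from the threshold argument: if each class-1 point is weighted by 1−*q* and each class-0 point by *q*, the 1/2 quantile on the resampled data corresponds to the *q* quantile on the original data.

```python
import random
import statistics


def jous_resample(X, y, q, jitter=0.0, seed=0):
    """Resample (X, y) so that classifying at 1/2 targets quantile q.

    X: list of feature rows, y: list of 0/1 labels (both classes present).
    Repeated copies of a point get uniform noise scaled by `jitter` times
    the per-feature standard deviation (an assumed jitter scheme).
    """
    rng = random.Random(seed)
    n = len(y)
    idx1 = [i for i in range(n) if y[i] == 1]
    idx0 = [i for i in range(n) if y[i] == 0]
    # Class totals proportional to (1 - q) * N1 and q * N0, keeping size n.
    w1, w0 = (1.0 - q) * len(idx1), q * len(idx0)
    n1 = round(n * w1 / (w1 + w0))
    drawn = rng.choices(idx1, k=n1) + rng.choices(idx0, k=n - n1)
    sds = [statistics.pstdev(col) for col in zip(*X)]
    seen, X_new, y_new = set(), [], []
    for i in drawn:
        row = list(X[i])
        if i in seen:  # jitter only the repeated copies
            row = [v + rng.uniform(-jitter * s, jitter * s)
                   for v, s in zip(row, sds)]
        seen.add(i)
        X_new.append(row)
        y_new.append(y[i])
    return X_new, y_new


def prob_from_quantile_votes(quantiles, votes):
    """Combine classifiers trained at ascending quantiles into one estimate.

    votes[k] is 1 if the classifier for quantiles[k] predicts class 1,
    read as "P[y=1|x] exceeds quantiles[k]". The estimate is the midpoint
    of the grid cell selected by the number of positive votes
    (an assumed rule, chosen for simplicity).
    """
    k = sum(votes)
    edges = [0.0] + list(quantiles) + [1.0]
    return 0.5 * (edges[k] + edges[k + 1])


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)]
    y = [1] * 5 + [0] * 5
    X_new, y_new = jous_resample(X, y, q=0.8, jitter=0.1)
    print(sum(y_new), len(y_new))
    print(prob_from_quantile_votes([0.25, 0.5, 0.75], [1, 1, 0]))
```

In a full pipeline, a classifier would be fit to each resampled data set, one per quantile in the grid, and their predictions at a point *x* would supply the `votes` argument.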
