## Confidence Sets with Expected Sizes for Multiclass Classification

*Christophe Denis, Mohamed Hebiri*; 18(102):1−28, 2017.

### Abstract

Multiclass classification problems such as image annotation can
involve a large number of classes. In this context, confusion
between classes can occur, and single label classification may
be misleading. We provide in the present paper a general device
that, given an unlabeled dataset and a score function defined as
the minimizer of some empirical and convex risk, outputs a set
of class labels, instead of a single one. Interestingly, this
procedure does not require the unlabeled dataset to cover all
classes. Moreover, the method is calibrated to control the
expected size of the output set while minimizing the
classification risk. We show the statistical optimality of the
procedure and establish rates of convergence under the Tsybakov
margin condition. It turns out that these rates are linear in
the number of labels. We apply our methodology to convex
aggregation of confidence sets based on the $V$-fold cross
validation principle, also known as the superlearning principle
(van der Laan et al., 2007). We illustrate the numerical
performance of the procedure on real data and demonstrate in
particular that, with a moderate expected size relative to the
number of labels, the procedure significantly improves the
classification risk.
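The calibration idea described in the abstract can be sketched as follows: given class scores on an unlabeled sample, choose a single threshold so that the average number of retained labels matches a target expected size. This is a minimal illustrative sketch under our own assumptions (the quantile-based thresholding and the function name are ours), not the authors' exact estimator.

```python
import numpy as np

def calibrated_confidence_sets(scores, target_size):
    """Return one label set per example, thresholding a score matrix so
    that the *average* set size is close to target_size.

    scores: (n, K) array of class scores on an unlabeled sample.
    target_size: desired expected number of labels per example.
    """
    n, K = scores.shape
    # Choose the threshold as the empirical quantile of all n*K scores
    # at level 1 - target_size/K, so that on average target_size scores
    # per example exceed it.
    t = np.quantile(scores, 1.0 - target_size / K)
    return [set(np.nonzero(row >= t)[0]) for row in scores]
```

The `target_size` knob trades the size of the output set against classification risk: larger sets are more likely to contain the true label, mirroring the expected-size calibration the paper formalizes.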
