Fast Binary Feature Selection with Conditional Mutual Information

François Fleuret; 5(Nov):1531--1555, 2004.


We propose in this paper a very fast feature selection technique based on conditional mutual information. By picking features which maximize their mutual information with the class to predict, conditional on any feature already picked, the method ensures the selection of features which are both individually informative and pairwise weakly dependent. We show that this feature selection method outperforms other classical algorithms, and that a naive Bayesian classifier built with features selected that way achieves error rates similar to those of state-of-the-art methods such as boosting or SVMs. The implementation we propose selects 50 features among 40,000, based on a training set of 500 examples, in a tenth of a second on a standard 1 GHz PC.
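The selection criterion described above can be illustrated with a minimal sketch in Python with NumPy. This is not the fast implementation the abstract refers to: it is a naive version that recomputes scores for every candidate at each step, and the function names (`entropy`, `mutual_info`, `cond_mutual_info`, `cmim_select`) and the toy data are assumptions made for illustration. Each candidate keeps a score initialized with its mutual information with the class; after each pick, the score is lowered to the smallest conditional mutual information seen so far given any picked feature.

```python
import numpy as np

def entropy(*columns):
    """Empirical joint entropy, in bits, of one or more binary columns."""
    codes = np.zeros(len(columns[0]), dtype=np.int64)
    for col in columns:
        codes = codes * 2 + col          # encode each joint outcome as an integer
    counts = np.bincount(codes)
    p = counts[counts > 0] / len(codes)  # drop empty cells to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def mutual_info(y, x):
    """I(Y; X) = H(Y) + H(X) - H(X, Y)."""
    return entropy(y) + entropy(x) - entropy(x, y)

def cond_mutual_info(y, x, z):
    """I(Y; X | Z) = H(Y, Z) + H(X, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(y, z) + entropy(x, z) - entropy(x, y, z) - entropy(z)

def cmim_select(X, y, k):
    """Greedily pick k columns of the binary matrix X (naive sketch)."""
    n_features = X.shape[1]
    # Initial score: how informative each feature is on its own.
    score = np.array([mutual_info(y, X[:, n]) for n in range(n_features)])
    selected = []
    for _ in range(k):
        best = int(np.argmax(score))
        selected.append(best)
        score[best] = -np.inf            # never pick the same feature twice
        for n in range(n_features):
            if n in selected:
                continue
            # Keep the worst case over all features picked so far.
            cmi = cond_mutual_info(y, X[:, n], X[:, best])
            score[n] = min(score[n], cmi)
    return selected

# Hypothetical toy data: 12 examples, 4 binary features.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])  # informative, one error
f1 = f0.copy()                                        # exact duplicate of f0
f2 = np.array([1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0])  # weaker, but complementary
f3 = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # unrelated to the class
X = np.stack([f0, f1, f2, f3], axis=1)

selected = cmim_select(X, y, 2)
print(selected)
```

On this toy data the duplicate column `f1` is individually as informative as `f0`, yet it carries no additional information about the class once `f0` is picked, so the second pick goes to the weaker but complementary `f2`. This is the behavior the abstract describes as selecting features that are both individually informative and weakly dependent on one another.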