Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination

Eugene Tuv, Alexander Borisov, George Runger, Kari Torkkola; 10(Jul):1341--1366, 2009.

Abstract

Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach.

[abs][pdf]




Home Page

Papers

Submissions

News

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Statistics

Login

Contact Us



RSS Feed